OPEN_SOURCE
REDDIT // 8d ago // MODEL RELEASE
Gemma 4 loops in LM Studio
A Reddit user reports Gemma 4-26B-A4B collapsing into recursive junk output in LM Studio on dual MI50s with Vulkan, Q4_K_M, and Q8_0 KV cache. The repeated `</think>` and `<|im_end|>` tokens suggest a template or backend mismatch more than a simple “bad model” complaint.
// ANALYSIS
This looks like an integration bug disguised as a model failure. Gemma 4 is meant to run locally, but if the runtime is feeding it the wrong chat format or stop tokens, the model can spiral into exactly this kind of self-referential loop.
- The output tokens shown here come from non-Gemma chat schemas, which points to a prompt/template mismatch or incorrect stop-sequence handling.
- Vulkan plus a quantized KV cache plus a MoE model is a brittle stack; any backend edge case can turn into repeated garbage generation.
- Google positions Gemma 4 as a local-first, agentic open model family, so a failure like this is a support-gap issue that matters for real-world adoption.
- The first things to try are disabling KV-cache quantization, verifying the Gemma 4 chat template, and testing a different backend or build.
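The template-mismatch hypothesis can be illustrated in isolation. Gemma-family models delimit turns with `<start_of_turn>`/`<end_of_turn>`, while `<|im_end|>` belongs to the ChatML family; a runtime watching for the wrong stop string will never see a turn end and keeps sampling. The sketch below is a hypothetical illustration, not LM Studio's actual code:

```python
# Stop strings for two different chat-template families.
GEMMA_STOPS = ["<end_of_turn>"]   # Gemma-style turn delimiter
CHATML_STOPS = ["<|im_end|>"]     # ChatML delimiter (wrong family for Gemma)

def gemma_prompt(messages):
    """Render messages with a Gemma-style chat template (sketch)."""
    parts = []
    for m in messages:
        role = "model" if m["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model's reply
    return "".join(parts)

def hits_stop(generated, stops):
    """Would the runtime terminate generation on this output?"""
    return any(s in generated for s in stops)

reply = "Hello!<end_of_turn>"
print(hits_stop(reply, GEMMA_STOPS))   # True: generation terminates cleanly
print(hits_stop(reply, CHATML_STOPS))  # False: runtime keeps sampling, loops
```

With mismatched stops the model's own end-of-turn marker is ignored, which is consistent with the runaway `</think>`/`<|im_end|>` spam in the report.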
// TAGS
gemma-4 · llm · inference · gpu · open-weights · reasoning · multimodal
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
RELEVANCE
9/10
AUTHOR
Savantskie1