Gemma 4 loops in LM Studio
OPEN_SOURCE
REDDIT // 8d ago · MODEL RELEASE

A Reddit user reports Gemma 4-26B-A4B collapsing into recursive junk output in LM Studio on dual MI50s with Vulkan, Q4_K_M weights, and a Q8_0-quantized KV cache. The repeated `</think>` and `<|im_end|>` tokens suggest a chat-template or backend mismatch rather than a simple "bad model" complaint.

// ANALYSIS

This looks like an integration bug disguised as a model failure. Gemma 4 is meant to run locally, but if the runtime is feeding it the wrong chat format or stop tokens, the model can spiral into exactly this kind of self-referential loop.
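To make the mismatch concrete, here is a minimal sketch of the same conversation rendered under two chat templates. The Gemma family (through Gemma 3, per Google's published formatting guide) is trained on `<start_of_turn>`/`<end_of_turn>` markers, while ChatML-style runtimes emit `<|im_start|>`/`<|im_end|>` instead; the exact Gemma 4 template is an assumption here. A Gemma checkpoint handed the ChatML form sees delimiters it was never trained to stop on, which is one way generation blows past end-of-turn and loops:

```python
# Sketch: one conversation, two chat templates. The renderers below are
# illustrative helpers, not LM Studio's or any runtime's actual API.

def render_gemma(messages):
    """Gemma-style template: <start_of_turn>role\\n...<end_of_turn>."""
    out = []
    for m in messages:
        out.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # cue the model's reply
    return "".join(out)

def render_chatml(messages):
    """ChatML template: <|im_start|>role\\n...<|im_end|>."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    out.append("<|im_start|>assistant\n")
    return "".join(out)

msgs = [{"role": "user", "content": "Hello"}]
print(render_gemma(msgs))   # contains <end_of_turn>, the token Gemma stops on
print(render_chatml(msgs))  # contains <|im_end|>, which Gemma was not trained on
```

If the runtime also registers only `<|im_end|>` as a stop sequence, a Gemma model that never emits that token has no way to terminate cleanly.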

  • The repeated `</think>` and `<|im_end|>` tokens belong to non-Gemma chat schemas, which points to a prompt-template mismatch or incorrect stop-sequence handling.
  • Vulkan plus quantized KV cache plus a MoE model is a brittle stack; any backend edge case can turn into repeated garbage generation.
  • Google positions Gemma 4 as a local-first, agentic open model family, so a failure like this is a support-gap issue that matters for real-world adoption.
  • The first things to try are disabling KV-cache quantization, verifying the Gemma 4 chat template, and testing a different backend or build.

// TAGS

gemma-4 · llm · inference · gpu · open-weights · reasoning · multimodal

DISCOVERED

8d ago

2026-04-04

PUBLISHED

8d ago

2026-04-04

RELEVANCE

9/10

AUTHOR

Savantskie1