Qwen 3.6 and Gemma 4 hit runaway loops under llama.cpp
A LocalLLaMA user reports both Qwen 3.6 and Gemma 4 spiraling into runaway "thinking" loops under local serving, emitting slashes or repeated words until the max-token budget is exhausted. The thread points more toward an inference/runtime problem than a single bad model.
This looks like a serving-stack failure, not a single broken checkpoint. The same runaway pattern across two unrelated model families implicates chat-template, KV-cache, or reasoning-mode handling rather than any particular special characters in the output. Qwen 3.6's thinking-preservation behavior, combined with llama.cpp's reasoning-budget plumbing, leaves a fragile boundary between deliberate reasoning and infinite repetition, and the heavy configuration involved (dual-GPU offload, quantized KV cache, multimodal projection, long context) likely exposes edge cases. For local-agent operators, the real fix is probably upstream runtime or model-template work rather than prompt tweaks or restart scripts.
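To make the configuration in question concrete, here is a hypothetical reproduction sketch: launching llama-server with the kind of heavy setup the thread describes. The model paths and numeric values are placeholders, and exact flag spellings can vary across llama.cpp builds; the flags shown (`--mmproj`, `--tensor-split`, `--cache-type-k`, `-c`, `--jinja`) exist in recent releases.

```python
# Hypothetical reproduction sketch: launch llama-server with the heavy
# configuration the thread describes. Paths and values are placeholders.
import subprocess

cmd = [
    "llama-server",
    "-m", "models/model.gguf",          # placeholder model path
    "--mmproj", "models/mmproj.gguf",   # multimodal projector
    "-ngl", "99",                       # offload all layers to GPU
    "--tensor-split", "0.5,0.5",        # split weights across two GPUs
    "--cache-type-k", "q8_0",           # quantize the K cache
    # quantizing the V cache as well usually requires flash attention,
    # whose flag spelling varies by build, so it is omitted here
    "-c", "131072",                     # long context window
    "--jinja",                          # use the model's own chat template
]
subprocess.run(cmd, check=True)
```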
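Since the failure mode burns the entire token budget, one client-side stopgap is to watch the stream and abort on degeneration. Below is a minimal sketch against llama-server's OpenAI-compatible `/v1/chat/completions` endpoint; the port, thresholds, and the crude repetition heuristic are all assumptions for illustration, and this is a mitigation, not the upstream fix the thread calls for.

```python
# Stopgap sketch: stream from a local llama-server and cut the request
# off when the tail of the output collapses into a repeating pattern.
import json
import requests

SERVER = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server's default port

def looks_degenerate(text: str, window: int = 200, max_unique: int = 6) -> bool:
    """Crude heuristic: if the last `window` chars contain almost no distinct
    words (endless slashes, one repeated token), treat the stream as looping."""
    tail = text[-window:]
    if len(tail) < window:
        return False  # not enough output yet to judge
    words = tail.split() or [tail]  # no whitespace at all counts as one "word"
    return len(set(words)) <= max_unique

def stream_with_guard(prompt: str) -> str:
    chunks = []
    resp = requests.post(
        SERVER,
        json={
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
            "max_tokens": 4096,
        },
        stream=True,
        timeout=300,
    )
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip keep-alives and non-data SSE lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        chunks.append(delta.get("content") or "")
        if looks_degenerate("".join(chunks)):
            resp.close()  # abort the stream instead of burning max_tokens
            break
    return "".join(chunks)
```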
DISCOVERED
2026-05-01 (6h ago)
PUBLISHED
2026-04-30 (9h ago)
AUTHOR
sid351