Qwen 3.6 and Gemma 4 hit runaway loops under llama.cpp
A LocalLLaMA user reports both Qwen 3.6 and Gemma 4 spiraling into runaway "thinking" loops under local serving, emitting slashes or repeated words until the max-token budget is exhausted. The thread points more toward an inference/runtime problem than a single bad model.
This looks like a serving-stack failure, not a single broken checkpoint. The same runaway pattern across two unrelated model families implicates chat-template, KV-cache, or reasoning-mode handling rather than any particular special characters in the output. Qwen 3.6's thinking-preservation behavior, combined with llama.cpp's reasoning-budget plumbing, leaves a fragile boundary between deliberate reasoning and infinite repetition, and the heavy configuration involved (dual-GPU offload, quantized KV cache, multimodal projection, long context) likely exposes edge cases. For local-agent operators, the real fix is probably upstream runtime or model-template work rather than prompt tweaks or restart scripts.
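To make the configuration in question concrete, here is a hypothetical reproduction sketch: launching llama-server with the kind of heavy setup the thread describes. The model paths and numeric values are placeholders, and exact flag spellings can vary across llama.cpp builds; the flags shown (`--mmproj`, `--tensor-split`, `--cache-type-k`, `-c`, `--jinja`) exist in recent releases.

```python
# Hypothetical reproduction sketch: launch llama-server with the heavy
# configuration the thread describes. Paths and values are placeholders.
import subprocess

cmd = [
    "llama-server",
    "-m", "models/model.gguf",          # placeholder model path
    "--mmproj", "models/mmproj.gguf",   # multimodal projector
    "-ngl", "99",                       # offload all layers to GPU
    "--tensor-split", "0.5,0.5",        # split weights across two GPUs
    "--cache-type-k", "q8_0",           # quantize the K cache
    # quantizing the V cache as well usually requires flash attention,
    # whose flag spelling varies by build, so it is omitted here
    "-c", "131072",                     # long context window
    "--jinja",                          # use the model's own chat template
]
subprocess.run(cmd, check=True)
```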
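Since the failure mode burns the entire token budget, one client-side stopgap is to watch the stream and abort on degeneration. Below is a minimal sketch against llama-server's OpenAI-compatible `/v1/chat/completions` endpoint; the port, thresholds, and the crude repetition heuristic are all assumptions for illustration, and this is a mitigation, not the upstream fix the thread calls for.

```python
# Stopgap sketch: stream from a local llama-server and cut the request
# off when the tail of the output collapses into a repeating pattern.
import json
import requests

SERVER = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server's default port

def looks_degenerate(text: str, window: int = 200, max_unique: int = 6) -> bool:
    """Crude heuristic: if the last `window` chars contain almost no distinct
    words (endless slashes, one repeated token), treat the stream as looping."""
    tail = text[-window:]
    if len(tail) < window:
        return False  # not enough output yet to judge
    words = tail.split() or [tail]  # no whitespace at all counts as one "word"
    return len(set(words)) <= max_unique

def stream_with_guard(prompt: str) -> str:
    chunks = []
    resp = requests.post(
        SERVER,
        json={
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
            "max_tokens": 4096,
        },
        stream=True,
        timeout=300,
    )
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip keep-alives and non-data SSE lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        chunks.append(delta.get("content") or "")
        if looks_degenerate("".join(chunks)):
            resp.close()  # abort the stream instead of burning max_tokens
            break
    return "".join(chunks)
```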
DISCOVERED
2026-05-01 (6h ago)
PUBLISHED
2026-04-30 (9h ago)
AUTHOR
sid351