OPEN_SOURCE
REDDIT // 3h ago · INFRASTRUCTURE
Qwen, Gemma quantization loops frustrate local runners
Local users are reporting that Qwen3.6-35B-A3B and Gemma 4 26B A4B can slip into repetitive “thinking” loops, especially at Q3/Q4 quantization. The thread points to a deployment-side issue in long reasoning traces, not a simple sampling mistake.
// ANALYSIS
This looks less like a one-off bug and more like the cost of pushing sparse reasoning models through aggressive quantization in extended think mode. If the pattern holds, local inference stacks will need loop detection and safer defaults, not just more tuning.
- The symptom is a repetition attractor: the model keeps rephrasing its intent instead of advancing the reasoning chain.
- Multiple users see it on both Qwen and Gemma, which suggests an architecture-plus-quantization interaction rather than a single-model regression.
- Tweaking temperature or top_p probably will not fix the root cause if the cache or attention state is drifting over long hidden traces.
- Practical mitigations are shorter contexts, tighter thinking budgets, a full-precision KV cache, and watchdog logic that aborts on repetition (see the sketch after this list).
- This is a reminder that “works at Q4” is not the same as “stable in agentic reasoning workloads.”
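
The watchdog idea from the mitigation list can be made concrete with a small streaming check. Below is a minimal Python sketch, assuming a token-streaming generation loop; the class name `RepetitionWatchdog`, the `stream_tokens` helper, and all thresholds are hypothetical illustrations, not part of any specific inference stack.

```python
# Minimal repetition watchdog, assuming a token-streaming loop.
# RepetitionWatchdog, stream_tokens, and all thresholds are
# hypothetical illustrations, not from any specific runner.
from collections import Counter, deque


class RepetitionWatchdog:
    """Trips when one token n-gram dominates the recent output window."""

    def __init__(self, n: int = 8, window: int = 512,
                 max_repeats: int = 4, stride: int = 64):
        self.n = n                      # n-gram length to fingerprint
        self.window = window            # how much recent output to inspect
        self.max_repeats = max_repeats  # repeats that count as a loop
        self.stride = stride            # scan only every `stride` tokens
        self.tokens: deque[int] = deque(maxlen=window)
        self.seen = 0

    def push(self, token_id: int) -> bool:
        """Feed one generated token; return True if a loop is detected."""
        self.tokens.append(token_id)
        self.seen += 1
        if len(self.tokens) < self.window or self.seen % self.stride:
            return False
        toks = list(self.tokens)
        grams = Counter(tuple(toks[i:i + self.n])
                        for i in range(len(toks) - self.n + 1))
        # A healthy trace has mostly unique n-grams; a repetition
        # attractor makes a single fingerprint recur many times.
        return grams.most_common(1)[0][1] >= self.max_repeats


# Sketch of how it would sit in a streaming loop:
# watchdog = RepetitionWatchdog()
# for token_id in stream_tokens(prompt):   # stream_tokens is assumed
#     if watchdog.push(token_id):
#         break                            # abort, retry, or fall back
```

The other “safer defaults” in the list are configuration rather than code: in llama.cpp-style runners that roughly means leaving the KV cache at f16 (e.g. via --cache-type-k/--cache-type-v) instead of quantizing it, and lowering the context size; exact flag names vary by stack.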
// TAGS
qwen3.6-35b-a3b · gemma-4-26b-a4b · llm · reasoning · inference · self-hosted · open-source
DISCOVERED
3h ago
2026-05-01
PUBLISHED
4h ago
2026-05-01
RELEVANCE
7/10
AUTHOR
BitGreen1270