OPEN_SOURCE ↗
REDDIT // 23d ago // NEWS
Qwen3.5 Small Peers Hit Repetition Loops
The thread asks why 0.8B-1B open models keep spiraling into repeated phrases at low temperatures while their 3B+ siblings stay stable. The likely culprit is a mix of tiny-model capacity limits and decoding settings, with Qwen’s own docs warning that Qwen3.5-0.8B is more prone to loops.
// ANALYSIS
Yes, this looks like a real small-model tax, not just a bad prompt. Low temperature can lock these models onto a high-probability phrase loop, and once they start echoing themselves, there often isn’t enough capacity left to recover cleanly.
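A toy sketch of the mechanism: sampling divides the logits by the temperature before the softmax, so a low temperature turns a modest preference for the "loop" token into near-certainty. The logit values below are illustrative, not taken from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T before softmax; low T sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits: the repeated phrase's token is only slightly favored.
logits = [2.0, 1.5, 1.0]

p_warm = softmax_with_temperature(logits, 1.0)  # modest preference
p_cold = softmax_with_temperature(logits, 0.1)  # near-deterministic pick
```

At `T=1.0` the favored token gets roughly half the mass; at `T=0.1` it takes essentially all of it, which is how a small model gets stuck echoing the same phrase.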
- Qwen’s official sampling guidance for text runs much hotter than `0.1-0.3` and leans on `presence_penalty`, which is a strong hint that ultra-low temperature is the wrong default for this family.
- The Qwen3.5-0.8B model card explicitly notes it is more prone to thinking loops, so the behavior is not surprising even by the vendor’s own admission.
- Meta positions Llama 3.2 1B as lightweight, multilingual, and on-device friendly, but that is a very different promise from “robust long-form generation under aggressive deterministic decoding.”
- Gemma 3 1B is also framed as a portable model for constrained hardware, so the 1B tier is where you should expect brittleness first, especially in open-ended chat.
- Practical fixes are usually boring but effective: raise temperature modestly, keep `top_p` reasonable, use `presence_penalty` or `frequency_penalty`, cap output length, add stop sequences, and stream so you can cut off runaways early.
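The fixes above can be sketched as a single set of sampling settings. The helper below shows the OpenAI-style penalty arithmetic (flat presence penalty plus a per-occurrence frequency penalty); the penalty values, model id, and stop sequence are illustrative placeholders, not Qwen's documented defaults.

```python
def apply_penalties(logits, generated_token_ids,
                    presence_penalty=0.6, frequency_penalty=0.3):
    """Subtract a flat presence penalty plus a per-occurrence frequency
    penalty from the logit of every token already generated."""
    counts = {}
    for t in generated_token_ids:
        counts[t] = counts.get(t, 0) + 1
    adjusted = list(logits)
    for t, c in counts.items():
        adjusted[t] -= presence_penalty + frequency_penalty * c
    return adjusted

# Hypothetical OpenAI-compatible request body tying the fixes together.
params = {
    "model": "qwen3.5-0.8b",   # placeholder model id
    "temperature": 0.7,        # warmer than 0.1-0.3 to avoid deterministic loops
    "top_p": 0.9,              # keep nucleus sampling reasonable
    "presence_penalty": 0.6,   # discourage any token that already appeared
    "frequency_penalty": 0.3,  # penalty grows with repetition count
    "max_tokens": 512,         # cap output length
    "stop": ["\n\n\n"],        # placeholder stop sequence for runaway padding
    "stream": True,            # stream so loops can be cut off early
}
```

With these settings a token that has appeared twice loses `0.6 + 0.3 * 2 = 1.2` from its logit, which is usually enough to break a phrase loop in a small model.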
// TAGS
qwen3-5-small · llama-3.2 · gemma-3 · llm · inference · open-weights · reasoning
DISCOVERED
23d ago
2026-03-20
PUBLISHED
23d ago
2026-03-20
RELEVANCE
8/10
AUTHOR
lionellee77