Qwen3.5-0.8B stumbles in long-think mode
A Reddit post shows Qwen3.5-0.8B taking 1609.4 seconds on “1+1” in Ollama, sparking a config-vs-capability debate. Community replies point to likely misconfiguration, and the official model card explicitly notes that 0.8B is default non-thinking and can enter thinking loops if settings are off.
This looks less like a “Qwen is broken” moment and more like a classic tiny-model + wrong inference settings failure mode.
- –The thread itself highlights missing generation context (tokens, sampling, template), which makes the result hard to interpret as a fair model test.
- –Qwen’s official Hugging Face docs warn Qwen3.5-0.8B can get stuck in thinking loops and may fail to terminate under some sampling setups.
- –Qwen3.5-0.8B is intended for lightweight prototyping, not robust long-chain reasoning under aggressive think settings.
- –For local runs, template correctness, thinking-mode controls, and stop/stream safeguards matter as much as raw model quality.
DISCOVERED
71d ago
2026-03-17
PUBLISHED
71d ago
2026-03-17
RELEVANCE
AUTHOR
doggo_legend