OPEN_SOURCE
REDDIT // 23d ago · MODEL RELEASE
Qwen3.5 4B overthinks simple hello
Qwen3.5 4B on Ollama is doing something that looks bizarre on first run: a simple “hello” triggers a long reasoning dump before the final reply. That’s likely a thinking-mode or prompt-template effect, not the model genuinely getting “stuck” on a greeting.
// ANALYSIS
The weird part here isn’t the answer, it’s that the local stack is exposing the model’s scratchpad for a throwaway prompt. That’s normal for some Qwen/Ollama setups, but it’s still a UX trap if you expected a clean, chatty one-liner.
- Ollama’s Qwen3.5 page explicitly tags the family with `thinking`, so verbose reasoning output is part of the intended behavior on thinking-capable variants.
- For a 4B local model, decoding defaults matter a lot; if you leave generation too unconstrained, the model can ramble even on trivial prompts.
- If you want terse chat, use a non-thinking/instruct variant or tighten your generation settings instead of assuming the model is broken.
- This is a good reminder that local LLMs often fail first at presentation, not capability: the model may be fine, but the wrapper is too permissive.
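The settings above can be sketched as a request payload for Ollama's `/api/chat` endpoint. This is a minimal sketch, not a verified recipe: the model tag `qwen3.5:4b` is assumed from the post, and the top-level `think` flag exists only in newer Ollama versions, so check both against your local install.

```python
import json

# Sketch of a terse-chat request for a thinking-capable model on Ollama.
# Assumptions: model tag "qwen3.5:4b" (from the post) and the "think"
# field (supported in recent Ollama releases for thinking models).
payload = {
    "model": "qwen3.5:4b",
    "messages": [{"role": "user", "content": "hello"}],
    "think": False,            # suppress the reasoning scratchpad entirely
    "stream": False,
    "options": {
        "temperature": 0.3,    # conservative sampling reduces rambling
        "num_predict": 128,    # hard cap on generated tokens
    },
}

body = json.dumps(payload)
print(body)
# Send it with any HTTP client, e.g.:
#   requests.post("http://localhost:11434/api/chat", data=body)
```

The CLI equivalents, where supported, are `/set nothink` inside an `ollama run` session, or simply pulling a non-thinking instruct variant instead.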
// TAGS
qwen3-5-4b · ollama · llm · reasoning · chatbot · self-hosted · open-source
DISCOVERED
2026-03-19
PUBLISHED
2026-03-19
RELEVANCE
8/10
AUTHOR
Snoo_what