OPEN_SOURCE
REDDIT // 36d ago // NEWS
Qwen 3.5 users push back on verbosity
A LocalLLaMA thread argues Qwen 3.5 often over-explains simple prompts and makes “thinking” hard to disable reliably, especially when compared with Gemini 2.5 Flash’s terse answers. The complaint is practical rather than academic: extra reasoning is less useful when it inflates latency and token cost for routine questions.
// ANALYSIS
This is a UX complaint about model defaults, not just a matter of taste in writing style.
- The post frames Qwen 3.5 as capable but inefficient for everyday chat because its answers feel benchmark-shaped instead of user-shaped.
- Qwen’s own model docs emphasize separate thinking and non-thinking modes, which makes the thread notable: it highlights how wrappers and serving setups can still produce verbose behavior in practice.
- For AI developers, this is a reminder that inference UX now matters almost as much as raw model quality: concise answers, controllable reasoning, and predictable output length are product features.
- The comparison to Gemini 2.5 Flash shows why “short by default, detailed on request” is becoming the preferred interaction pattern for fast consumer and developer assistants.
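When a serving stack cannot reliably disable thinking at the template level (where supported, Qwen's chat template accepts an `enable_thinking` flag), a common fallback is to post-process the output. Below is a minimal sketch, assuming the model emits its reasoning inside `<think>…</think>` tags as Qwen's thinking mode does; the function name is illustrative, not from any library.

```python
import re

# Matches a reasoning block plus any trailing whitespace.
# Assumption: the model wraps chain-of-thought in <think>...</think>,
# which is Qwen's convention for thinking mode.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(raw_output: str) -> str:
    """Return only the user-facing answer, dropping reasoning blocks."""
    return THINK_BLOCK.sub("", raw_output).strip()

raw = "<think>\nThe user asked 2+2. Trivial.\n</think>\n4"
print(strip_thinking(raw))  # -> 4
```

Note this only trims the visible text; the reasoning tokens were still generated, so it fixes verbosity but not the latency or token-cost complaint.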
// TAGS
qwen-3.5 · llm · reasoning · open-source · prompt-engineering
DISCOVERED
2026-03-06
PUBLISHED
2026-03-06
RELEVANCE
6/10
AUTHOR
ashirviskas