OPEN_SOURCE
REDDIT · 19d ago · MODEL RELEASE
Qwen3.5 models spit gibberish on long prompts
Reddit users say Qwen3.5 4B/9B/27B/122B models in LM Studio start turning 50K+ token prompts into non-grammatical word salad, and the breakdown shows up even in the model's thinking trace. The same input reportedly stays coherent on GPT-OSS-120B, which makes this look more like a long-context serving or cache-handling issue than a pure prompt problem.
// ANALYSIS
Hot take: this smells like a runtime/config bug first and a model-quality problem second. Qwen3.5's own docs advertise a 262K-token default context and recommend keeping at least 128K for thinking, so collapsing around 50K is well below the published envelope.
- Local stacks like LM Studio, GGUF builds, and llama.cpp derivatives can diverge from upstream serving recipes on chat templates, reasoning parsers, and context management.
- The fact that the gibberish starts in the thinking trace points toward cache/state corruption or context-window handling, not just a bad sampling preset.
- GPT-OSS-120B handling the same prompt cleanly suggests the input itself is not inherently pathological.
- If others can reproduce this on official Qwen presets, it deserves a backend bug report with exact quantization, context length, and template settings.
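The mismatch the analysis points at, a prompt that is well inside the model's published envelope but outside what the local runtime was actually launched with, can be sanity-checked before filing a bug report. A minimal sketch (the helper name, the 2048-token output reserve, and the example window sizes are illustrative assumptions, not LM Studio or llama.cpp APIs):

```python
# Sketch: classify a long prompt against the context window the local
# runtime was actually configured with. Qwen3.5's docs advertise a 262K
# default, but a local GGUF/llama.cpp stack may be launched far lower,
# which is one plausible way a ~50K-token prompt degrades into gibberish.

def context_check(prompt_tokens: int, serving_ctx: int,
                  reserve_for_output: int = 2048) -> str:
    """Classify a prompt against the runtime's effective context budget."""
    budget = serving_ctx - reserve_for_output
    if prompt_tokens <= budget:
        return "fits"
    if prompt_tokens <= serving_ctx:
        return "fits, but little room left for generation"
    return "exceeds serving context: runtime may truncate or shift the cache"

# A 50K-token prompt against a 32K serving window -- the kind of silent
# mismatch that stays coherent early and falls apart deep into the prompt:
print(context_check(50_000, 32_768))
# The same prompt against a 128K window, per Qwen's recommended minimum
# for thinking mode:
print(context_check(50_000, 131_072))
```

If the runtime's configured window really is below the prompt length, the fix is a launch setting, not a model swap; if the prompt fits and the gibberish persists, that strengthens the cache-corruption hypothesis in the bug report.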
// TAGS
qwen3-5 · llm · reasoning · open-weights · inference · self-hosted
DISCOVERED
2026-03-24 (19d ago)
PUBLISHED
2026-03-24 (19d ago)
RELEVANCE
9/10
AUTHOR
custodiam99