Qwen3.5 models spit gibberish on long prompts
OPEN_SOURCE · REDDIT // 19d ago · MODEL RELEASE


Reddit users say Qwen3.5 4B/9B/27B/122B models in LM Studio start turning 50K+ token prompts into non-grammatical word salad, and the breakdown shows up even in the model's thinking trace. The same input reportedly stays coherent on GPT-OSS-120B, which makes this look more like a long-context serving or cache-handling issue than a pure prompt problem.

// ANALYSIS

Hot take: this smells like a runtime/config bug first and a model-quality problem second. Qwen3.5's own docs advertise a 262K-token default context and recommend keeping at least 128K for thinking, so collapsing around 50K is well below the published envelope.
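The arithmetic alone supports that read. A minimal sketch (the helper and constant names are illustrative, not part of any Qwen or LM Studio tooling) of checking a prompt against the published envelope:

```python
# Hypothetical sanity check: does a prompt fit the model's published
# context window while leaving the recommended thinking budget free?
QWEN35_DEFAULT_CTX = 262_144           # 262K-token default context (per Qwen3.5 docs)
RECOMMENDED_THINKING_BUDGET = 131_072  # docs recommend keeping at least 128K for thinking

def fits_context(prompt_tokens: int,
                 ctx: int = QWEN35_DEFAULT_CTX,
                 thinking_budget: int = RECOMMENDED_THINKING_BUDGET) -> bool:
    """True if the prompt plus the reserved thinking budget fits in the window."""
    return prompt_tokens + thinking_budget <= ctx

# A 50K-token prompt is comfortably inside the advertised envelope...
print(fits_context(50_000))              # True
# ...but would not be inside a runtime silently configured for a 64K window,
# a common local-serving default that could explain breakdown near 50K.
print(fits_context(50_000, ctx=65_536))  # False
```

If the local runtime caps the window below what the docs advertise, overflow or truncation lands exactly in this gap, which is consistent with coherent output on short prompts and collapse on long ones.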

  • Local stacks like LM Studio, GGUF builds, and llama.cpp derivatives can diverge from upstream serving recipes on chat templates, reasoning parsers, and context management.
  • The fact that the gibberish starts in the thinking trace points toward cache/state corruption or context-window handling, not just a bad sampling preset.
  • GPT-OSS-120B handling the same prompt cleanly suggests the input itself is not inherently pathological.
  • If others can reproduce this on official Qwen presets, it deserves a backend bug report with exact quantization, context length, and template settings.
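A report with the exact serving settings attached is far easier to act on than "it outputs gibberish." A minimal sketch of the fields worth capturing (every field name and value here is an assumption for illustration, not an LM Studio or llama.cpp schema):

```python
import json

# Illustrative repro report for a backend bug filing; field names and
# values are hypothetical, not drawn from any real tool's config format.
report = {
    "model": "Qwen3.5-27B",
    "quantization": "Q4_K_M",       # exact GGUF quant used
    "context_length": 131_072,      # window the runtime was actually configured with
    "prompt_tokens": 52_000,        # approximate point where gibberish begins
    "chat_template": "built-in",    # built-in vs custom template
    "sampling": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
    "kv_cache_quant": None,         # e.g. "q8_0" if KV-cache quantization is enabled
    "symptom": "gibberish appears in the thinking trace first",
}
print(json.dumps(report, indent=2))
```

Capturing the runtime-configured context length and any KV-cache quantization matters most here, since both are prime suspects for long-context corruption that the model's own weights would never show.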
// TAGS
qwen3-5 · llm · reasoning · open-weights · inference · self-hosted

DISCOVERED

19d ago

2026-03-24

PUBLISHED

19d ago

2026-03-24

RELEVANCE

9/10

AUTHOR

custodiam99