Qwen 3.5 hybrid attention triggers long-context hallucinations
Developers are reporting severe hallucination issues with Qwen 3.5's 27B and 35B models at extended context lengths. Community discussion points to the models' new hybrid linear/global attention architecture as the likely culprit, and users report that prompt engineering does not mitigate the degradation.
The transition to hybrid attention for inference efficiency is exposing a painful trade-off between raw context length and generation stability.
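One way to make this trade-off concrete is a toy model. Full (softmax) attention keeps every key and value in its KV cache, so memory grows with context length, while a linear-attention layer folds the whole history into a fixed-size state. The Python sketch below is an illustration under stated assumptions, not Qwen 3.5's architecture: it uses the simplest ungated, unnormalized linear attention (state S = sum of k v^T), and all helper names are hypothetical. It shows that a d x d state recalls up to d orthogonal keys exactly but blurs stored values together once far more than d items are written. That capacity limit is a standard way to reason about lossy recall in linear layers; whether it explains the reported Qwen 3.5 behavior is unconfirmed.

```python
import math
import random

# Toy illustration only: a plain (unnormalized, ungated) linear-attention
# state used as an associative memory. This is NOT Qwen 3.5's actual layer.

def kv_cache_floats(n_tokens: int, d: int) -> int:
    """Full (softmax) attention keeps every key and value: grows with n."""
    return 2 * n_tokens * d

def linear_state_floats(d: int) -> int:
    """Linear attention folds history into one d x d matrix: constant size."""
    return d * d

def write_state(keys, values, d):
    """Accumulate S = sum_i k_i v_i^T one token at a time."""
    S = [[0.0] * d for _ in range(d)]
    for k, v in zip(keys, values):
        for a in range(d):
            for b in range(d):
                S[a][b] += k[a] * v[b]
    return S

def read_state(S, q, d):
    """Readout S^T q = sum_i (k_i . q) v_i."""
    return [sum(q[a] * S[a][b] for a in range(d)) for b in range(d)]

def unit(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def mean_recall_error(keys, values, d):
    """Average relative error when querying the state with each stored key."""
    S = write_state(keys, values, d)
    total = 0.0
    for k, v in zip(keys, values):
        out = read_state(S, k, d)
        err = math.sqrt(sum((o - t) ** 2 for o, t in zip(out, v)))
        total += err / math.sqrt(sum(t * t for t in v))
    return total / len(keys)

d = 8
rng = random.Random(0)

# Case 1: n == d orthonormal keys, so the fixed state stores them exactly.
basis = [[1.0 if a == i else 0.0 for a in range(d)] for i in range(d)]
vals_small = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
small_err = mean_recall_error(basis, vals_small, d)

# Case 2: n = 16 * d random unit keys cannot all be orthogonal in d dims,
# so every readout is polluted by the other stored values.
n = 16 * d
keys_big = [unit([rng.gauss(0, 1) for _ in range(d)]) for _ in range(n)]
vals_big = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
big_err = mean_recall_error(keys_big, vals_big, d)

print(f"n={d}: mean relative recall error {small_err:.3f}")
print(f"n={n}: mean relative recall error {big_err:.3f}")
print("KV cache floats:", kv_cache_floats(n, d),
      "linear state floats:", linear_state_floats(d))
```

In a hybrid stack the interleaved full-attention layers still keep an exact KV cache, so this toy captures only the linear layers' side of the trade-off.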
- Qwen 3.5's mix of linear and full attention enables 256k-token context windows on paper, but users report that the model struggles to stay coherent on complex, real-world prompts
- The issue highlights a broader industry challenge: success on synthetic "needle in a haystack" benchmarks doesn't always translate into reliable agentic workflows
- Developers are finding that standard mitigations such as prompt engineering are ineffective against these underlying architectural quirks
- This may push users who prioritize factual integrity in long documents back to computationally heavier, full-attention models
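For context on the benchmark gap, a minimal needle-in-a-haystack sweep looks roughly like the Python sketch below. `generate` is a hypothetical stand-in for whatever inference client you use, and the function names, filler sentences, and needle are illustrative assumptions. The sweep measures only retrieval of one planted fact at varying depths, which is why a clean pass says little about complex prompts that require combining many facts.

```python
from typing import Callable, Dict, List

def build_haystack_prompt(needle: str, question: str, filler: List[str],
                          n_sentences: int, depth: float) -> str:
    """Place `needle` at fractional `depth` (0.0 = start, 1.0 = end) in filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    body = [filler[i % len(filler)] for i in range(n_sentences)]
    body.insert(round(depth * n_sentences), needle)
    return " ".join(body) + "\n\nQuestion: " + question + "\nAnswer:"

def needle_sweep(generate: Callable[[str], str], needle: str, question: str,
                 expected: str, filler: List[str], n_sentences: int,
                 depths: List[float]) -> Dict[float, bool]:
    """Return pass/fail per depth; `generate` is any prompt -> text callable."""
    results = {}
    for depth in depths:
        prompt = build_haystack_prompt(needle, question, filler, n_sentences, depth)
        results[depth] = expected.lower() in generate(prompt).lower()
    return results

NEEDLE = "The vault access code is 7413."  # hypothetical planted fact

def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: "retrieves" by string search.
    return "7413" if NEEDLE in prompt else "I don't know"

FILLER = ["The sky was grey that morning.", "Traffic on the bridge was light."]
results = needle_sweep(fake_model, NEEDLE, "What is the vault access code?",
                       "7413", FILLER, 1000, [0.0, 0.25, 0.5, 0.75, 1.0])
print(results)
```

Swapping `fake_model` for a real client and raising `n_sentences` toward the model's context limit turns this into a basic depth-by-length retrieval check.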
Discovered: 2026-04-01
Published: 2026-04-01
Author: appakaradi