Qwen3.5 Q3 Hits Long-Context Wall

// 45d ago · NEWS

A LocalLLaMA user reports Qwen3.5-122B-A10B in Q3_K_XL stays strong for coding until roughly 75-80K tokens, then degrades abruptly with hallucinations and confusion. The model itself supports 262K native context, so this looks more like a quantization-and-serving stability issue than a hard context-limit problem.

// ANALYSIS

This reads like a real long-context cliff, not just normal “more tokens, slightly worse answers” drift. The model is still well below its advertised context ceiling, which points the finger at low-bit weights, prompt accumulation, and session management rather than raw window size alone.

– Qwen3.5-122B-A10B is a MoE model with a 262,144-token native context, and the official guidance is to keep at least 128K tokens of context to preserve thinking quality, so 75-80K should not be inherently dangerous
– The abrupt failure pattern is consistent with quantization stress under long-context retrieval, and replies in the thread echo that lower quants can diverge from higher-precision runs over long sessions
– A BF16 KV cache keeps cached keys and values at higher precision, but it does not fix weight-quantization loss in attention, routing, and token selection
– The current sampling stack is fairly sharp already; more aggressive penalties can make a model feel more erratic once context quality starts slipping
– The practical fix is the one the poster already found: compact early, keep a running summary, and, if possible, move to a sturdier quant or a denser model for very long coding sessions
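
The "compact early, keep a running summary" approach can be sketched in a few lines. This is a minimal illustration, not the poster's actual setup: the `Session` class, the 60,000-token threshold, the 4-characters-per-token estimate, and the `naive_summarize` placeholder are all assumptions made for the example. In practice the summarizer would be a call to the model itself and the token count would come from the real tokenizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Signature assumed for illustration: (previous summary, old turns) -> new summary.
Summarizer = Callable[[str, List[str]], str]


def estimate_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); replace with a real tokenizer."""
    return len(text) // 4


def naive_summarize(prev_summary: str, old_turns: List[str]) -> str:
    """Placeholder summarizer: keeps the first 80 characters of each old turn.
    A real setup would ask the model to write the summary instead."""
    heads = [t.strip().splitlines()[0][:80] for t in old_turns if t.strip()]
    return "\n".join(part for part in [prev_summary, *heads] if part)


@dataclass
class Session:
    # Hypothetical budget, set well below the 75-80K range reported in the thread.
    compact_at: int = 60_000
    keep_recent: int = 6
    summary: str = ""
    turns: List[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return estimate_tokens(self.summary) + sum(estimate_tokens(t) for t in self.turns)

    def add_turn(self, text: str, summarize: Summarizer = naive_summarize) -> None:
        """Append a turn and compact once the estimated context exceeds the budget."""
        self.turns.append(text)
        if self.total_tokens() > self.compact_at:
            self.compact(summarize)

    def compact(self, summarize: Summarizer) -> None:
        """Fold everything except the most recent turns into the running summary."""
        old = self.turns[:-self.keep_recent]
        if not old:
            return  # nothing older than the recent window to fold in
        self.summary = summarize(self.summary, old)
        self.turns = self.turns[-self.keep_recent:]

    def prompt(self) -> str:
        """Context to send to the model: running summary first, then recent turns."""
        header = f"Summary of earlier work:\n{self.summary}\n\n" if self.summary else ""
        return header + "\n\n".join(self.turns)
```

The threshold is deliberately a tunable field: the point is to trigger compaction before the context reaches the range where the degradation was observed, and the right value will depend on the quant and the workload.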

// TAGS

llm, open-weights, long-context, quantization, moe, ai-coding, qwen3.5-122b-a10b

DISCOVERED

45d ago

2026-05-26

PUBLISHED

45d ago

2026-05-26

RELEVANCE

8/10

AUTHOR

_TheWolfOfWalmart_
