OPEN_SOURCE
REDDIT // 23d ago · NEWS
Qwen3.5 122B stumbles at 100K
A Reddit user reports Qwen3.5-122B-A10B losing instruction-following around the 100K-token mark when served in vLLM with an olka-fi MXFP4 quant. That’s notable because Qwen’s official docs advertise 262,144-token native context, so the failure looks more like a serving or quantization edge case than a hard model limit.
// ANALYSIS
Hot take: this smells like a runtime or quantization problem, not the base model suddenly running out of context headroom.
- The official model card says Qwen3.5-122B-A10B supports 262,144 native tokens and can be stretched further with RoPE scaling, so 100K should still be well inside its design envelope.
- The olka-fi MXFP4 pack is a third-party quant; its own card shows conservative vLLM guidance and only quantizes the expert MLP weights, so calibration or inference behavior is the likely weak point.
- The Reddit thread already has contradictory reports, including users saying NVFP4 or other setups do not reproduce the collapse, which points to stack-specific behavior.
- For anyone evaluating Qwen3.5 locally, this is a good reminder to test the exact model, quant, and serving engine combination, not just the base checkpoint.
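Testing that exact combination can be as simple as a needle-style instruction probe swept across context lengths. Below is a minimal sketch, assuming a locally served model behind vLLM's OpenAI-compatible endpoint; the `base_url` and `model` values are placeholders, not confirmed settings from the thread.

```python
# Hypothetical long-context instruction-following probe against a local vLLM
# server (OpenAI-compatible API). Endpoint and model name are assumptions.
import json
import urllib.request

MARKER = "VERIFY-7391"  # arbitrary string the model must echo back verbatim

def build_probe(approx_tokens: int) -> str:
    """Build a prompt of roughly approx_tokens tokens (crude 4-chars-per-token
    heuristic) with a single instruction buried at the very end."""
    filler = ("lorem ipsum dolor sit amet " * 2000).strip()
    chunks, total = [], 0
    while total < approx_tokens * 4:
        chunks.append(filler)
        total += len(filler)
    chunks.append(f"Instruction: reply with exactly the string {MARKER} and nothing else.")
    return "\n".join(chunks)

def followed_instruction(reply: str) -> bool:
    """Pass criterion: the reply is the marker, modulo surrounding whitespace."""
    return reply.strip() == MARKER

def run_probe(base_url: str, model: str, approx_tokens: int) -> bool:
    """POST the probe to /v1/chat/completions and score the reply."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": build_probe(approx_tokens)}],
        "max_tokens": 16,
        "temperature": 0,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return followed_instruction(reply)
```

Sweeping `run_probe` at, say, 50K, 90K, 100K, and 110K tokens per quant/engine combination would show whether compliance degrades gradually or falls off a cliff near 100K, which is the distinction between a context-window limit and a serving-stack bug.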
// TAGS
qwen3.5-122b-a10b · llm · inference · agent · benchmark · open-source
DISCOVERED
23d ago
2026-03-19
PUBLISHED
23d ago
2026-03-19
RELEVANCE
8 / 10
AUTHOR
TokenRingAI