Qwen3.5 122B stumbles at 100K
OPEN_SOURCE
REDDIT · NEWS · 23d ago


A Reddit user reports Qwen3.5-122B-A10B losing instruction-following around the 100K-token mark when served in vLLM with an olka-fi MXFP4 quant. That’s notable because Qwen’s official docs advertise a 262,144-token native context, so the failure looks more like a serving or quantization edge case than a hard model limit.

// ANALYSIS

Hot take: this smells like a runtime or quantization problem, not the base model suddenly running out of context headroom.

  • The official model card says Qwen3.5-122B-A10B supports 262,144 native tokens and can be stretched further with RoPE scaling, so 100K should still be inside its design envelope.
  • The olka-fi MXFP4 pack is a third-party quant; its own card shows conservative vLLM guidance and only quantizes the expert MLP weights, so calibration or inference behavior is the likely weak point.
  • The Reddit thread already has contradictory reports, including users saying NVFP4 or other setups do not reproduce the collapse, which points to stack-specific behavior.
  • For anyone evaluating Qwen3.5 locally, this is a good reminder to test the exact model, quant, and serving engine combination, not just the base checkpoint.
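A quick way to act on that last point is a needle-style instruction-following probe: pad a prompt to roughly the target token count, append a trivially checkable instruction, and see whether the served model still obeys it. A minimal sketch (the char-per-token ratio is a rough heuristic, and the vLLM endpoint/model names in the usage note are placeholders, not taken from the thread):

```python
# Long-context instruction-following probe for a served model.
# Padding is estimated at ~4 chars/token; swap in a real tokenizer
# (e.g. the model's own) for exact counts.

def build_probe(filler_paragraph: str, target_tokens: int,
                chars_per_token: int = 4) -> str:
    """Pad with filler to roughly `target_tokens`, then append an
    instruction whose compliance is trivial to verify."""
    instruction = "Reply with exactly the single word: PINEAPPLE"
    filler_chars = target_tokens * chars_per_token
    reps = filler_chars // len(filler_paragraph) + 1
    filler = (filler_paragraph + "\n") * reps
    return filler[:filler_chars] + "\n\n" + instruction

def followed_instruction(reply: str) -> bool:
    """True if the model obeyed the probe instruction exactly."""
    return reply.strip().upper() == "PINEAPPLE"
```

To reproduce the reported setup you would send `build_probe(some_text, 100_000)` to the vLLM OpenAI-compatible endpoint (e.g. `POST http://localhost:8000/v1/chat/completions` with the quantized checkpoint as the model name) and sweep `target_tokens` from 50K to 120K, once per quant/engine combination you care about.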
// TAGS
qwen3.5-122b-a10b · llm · inference · agent · benchmark · open-source

DISCOVERED

23d ago

2026-03-19

PUBLISHED

23d ago

2026-03-19

RELEVANCE

8/10

AUTHOR

TokenRingAI