OPEN_SOURCE
REDDIT // 23d ago · NEWS
Qwen3.5 122B stumbles at 100K
A Reddit user reports Qwen3.5-122B-A10B losing instruction-following around the 100K-token mark when served in vLLM with an olka-fi MXFP4 quant. That’s notable because Qwen’s official docs advertise 262,144-token native context, so the failure looks more like a serving or quantization edge case than a hard model limit.
// ANALYSIS
Hot take: this smells like a runtime or quantization problem, not the base model suddenly running out of context headroom.
- The official model card says Qwen3.5-122B-A10B supports 262,144 native tokens and can be stretched further with RoPE scaling, so 100K should still be well inside its design envelope.
- The olka-fi MXFP4 pack is a third-party quant; its own card shows conservative vLLM guidance and only quantizes the expert MLP weights, so calibration or inference behavior is the likely weak point.
- The Reddit thread already has contradictory reports, including users saying NVFP4 or other setups do not reproduce the collapse, which points to stack-specific behavior.
- For anyone evaluating Qwen3.5 locally, this is a good reminder to test the exact model, quant, and serving engine combination, not just the base checkpoint.
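Testing that exact combination can be as simple as a needle-style instruction probe swept across context lengths. Below is a minimal sketch, assuming a locally served model behind vLLM's OpenAI-compatible endpoint; the `base_url` and `model` values are placeholders, not confirmed settings from the thread.

```python
# Hypothetical long-context instruction-following probe against a local vLLM
# server (OpenAI-compatible API). Endpoint and model name are assumptions.
import json
import urllib.request

MARKER = "VERIFY-7391"  # arbitrary string the model must echo back verbatim

def build_probe(approx_tokens: int) -> str:
    """Build a prompt of roughly approx_tokens tokens (crude 4-chars-per-token
    heuristic) with a single instruction buried at the very end."""
    filler = ("lorem ipsum dolor sit amet " * 2000).strip()
    chunks, total = [], 0
    while total < approx_tokens * 4:
        chunks.append(filler)
        total += len(filler)
    chunks.append(f"Instruction: reply with exactly the string {MARKER} and nothing else.")
    return "\n".join(chunks)

def followed_instruction(reply: str) -> bool:
    """Pass criterion: the reply is the marker, modulo surrounding whitespace."""
    return reply.strip() == MARKER

def run_probe(base_url: str, model: str, approx_tokens: int) -> bool:
    """POST the probe to /v1/chat/completions and score the reply."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": build_probe(approx_tokens)}],
        "max_tokens": 16,
        "temperature": 0,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return followed_instruction(reply)
```

Sweeping `run_probe` at, say, 50K, 90K, 100K, and 110K tokens per quant/engine combination would show whether compliance degrades gradually or falls off a cliff near 100K, which is the distinction between a context-window limit and a serving-stack bug.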
// TAGS
qwen3.5-122b-a10b · llm · inference · agent · benchmark · open-source
DISCOVERED
23d ago
2026-03-19
PUBLISHED
23d ago
2026-03-19
RELEVANCE
8 / 10
AUTHOR
TokenRingAI