OPEN_SOURCE
REDDIT // 5h ago · BENCHMARK RESULT
Qwen3.6-27B falters near 90K context
A LocalLLaMA user reports that Qwen3.6-27B stays sharp below about 64K tokens on an RX 7900 XTX, then starts failing tool calls around 90K in a 128K-context setup. The model card advertises a 262K default context and recommends keeping at least 128K to preserve thinking output, so this is a reminder that long-context quality can decay well before the nominal max.
// ANALYSIS
This looks less like a surprise bug than a familiar long-context cliff: the window exists, but agentic reliability still drops as prompts get crowded.
- The failure mode matters: tool calling breaks before plain code generation does, which suggests structured output and planning are the first things to degrade.
- Q4_K_XL weights plus a q8_0 KV cache on AMD via llama.cpp is a real-world stress test, so the hardware/runtime stack may be amplifying the context cliff.
- Qwen's own docs recommend tool-use parsers and newer serving stacks like vLLM or SGLang for this model, which hints the benchmark-friendly path is not always the best path for local agent loops (see the tool-call sketch after this list).
- For devops-style work at 90K+, retrieval, shorter working sets, or a stronger long-context serving setup will likely beat brute-forcing a larger nominal window (see the context-budget sketch below).
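On the serving-stack point: a minimal sketch of what the recommended path might look like, calling an OpenAI-compatible endpoint (as vLLM and SGLang expose) with a declared tool, instead of regexing tool calls out of raw completions. The endpoint URL, served model name, and the `run_shell` tool are illustrative assumptions, not details from the original post.

```python
# Hedged sketch: tool calling through an OpenAI-compatible endpoint
# (e.g. vLLM's /v1 API with a tool-call parser enabled). The URL,
# model name, and tool definition below are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical devops-style tool
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3.6-27B",  # assumed served model name
    messages=[{"role": "user", "content": "Check disk usage on /var."}],
    tools=tools,
)

# With a server-side tool-call parser, a degraded call surfaces here
# as a missing/None tool_calls field rather than malformed text.
print(resp.choices[0].message.tool_calls)
```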
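And on shorter working sets: given the report that quality holds below roughly 64K tokens, one mitigation is to cap the agent's transcript rather than fill the nominal window. A rough sketch follows; the 60K budget, the caller-supplied `count_tokens` function, and the evict-oldest-tool-output-first policy are assumptions derived from the reported threshold, not a tested recipe.

```python
# Hedged sketch: keep an agent transcript under a token budget by
# evicting the oldest tool outputs first. The 60K default is an
# assumption based on the reported ~64K "stays sharp" threshold.
def trim_history(messages, count_tokens, budget=60_000):
    """Drop oldest non-system messages until the transcript fits.

    messages: list of {"role": ..., "content": ...} dicts.
    count_tokens: caller-supplied tokenizer function (str -> int).
    """
    trimmed = list(messages)
    while sum(count_tokens(m["content"] or "") for m in trimmed) > budget:
        # Prefer evicting the oldest tool output; they are usually
        # the bulkiest and least useful parts of a long agent loop.
        for i, m in enumerate(trimmed):
            if m["role"] == "tool":
                del trimmed[i]
                break
        else:
            # No tool outputs left: evict the oldest non-system turn.
            for i, m in enumerate(trimmed):
                if m["role"] != "system":
                    del trimmed[i]
                    break
            else:
                break  # only the system prompt remains; stop trimming
    return trimmed
```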
// TAGS
qwen3.6-27b · llm · inference · gpu · agent · cli · benchmark
DISCOVERED
5h ago
2026-04-30
PUBLISHED
7h ago
2026-04-30
RELEVANCE
8/10
AUTHOR
dodistyo