Qwen3.5 benchmarks hit 400K context on an RTX 4090
A LocalLLaMA user benchmarked multiple Qwen3.5 variants at context lengths from 2K to 400K on an RTX 4090 and published the results on Reddit and GitHub. The smaller models reached the full 400K test window, while the larger 27B and 35B variants ran well into six-figure context lengths before hitting memory or stability limits.
Community benchmarks like this are more useful than flashy context-window claims because they show what actually works on prosumer hardware.
- Qwen3.5 0.8B, 2B, 4B, and 9B variants completed 400K-context runs on a single 4090, though cold-start latency became very high at the top end
- The standout practical result is the quantized 9B model, which still delivered roughly 50 tokens/sec at 400K context in the published table
- Bigger 27B and 35B variants looked strong up to around 196K context, then started failing or getting skipped beyond 262K because of OOM or server-busy issues
- The repo’s warm-cache numbers matter: once KV cache is loaded, repeat interactions look much more usable than the scary first-token times suggest
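To make the OOM pattern above concrete, here is a rough back-of-the-envelope sketch of how KV cache memory grows with context length. It assumes a plain transformer with grouped-query attention and an unquantized 16-bit cache; the layer counts, head counts, and the `HYPOTHETICAL_CONFIGS` names are illustrative placeholders, not the real Qwen3.5 architectures or anything taken from the benchmark repo, and the actual models may use a different attention design or a quantized cache.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size for a standard attention stack.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes a 16-bit cache (an assumption, not a measured setting).
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


# Hypothetical configurations for illustration only; NOT real Qwen3.5 specs.
HYPOTHETICAL_CONFIGS = {
    "small-model": {"layers": 32, "kv_heads": 4, "head_dim": 128},
    "large-model": {"layers": 64, "kv_heads": 8, "head_dim": 128},
}

if __name__ == "__main__":
    for name, cfg in HYPOTHETICAL_CONFIGS.items():
        for ctx in (2_000, 196_000, 400_000):
            size = to_gib(kv_cache_bytes(ctx, **cfg))
            print(f"{name:12s} ctx={ctx:>7,d}  kv_cache~{size:6.2f} GiB")
```

The estimate is linear in context length, so whatever headroom remains after loading the weights sets a hard ceiling on usable context. That is consistent with larger variants running out of memory earlier than smaller ones, though the exact thresholds depend on details this sketch does not model.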
DISCOVERED
2026-03-07
PUBLISHED
2026-03-07
AUTHOR
AlwaysTiredButItsOk