OPEN_SOURCE
REDDIT // 36d ago // BENCHMARK RESULT
Qwen3.5 benchmarks hit 400K context on a 4090
A LocalLLaMA user benchmarked multiple Qwen3.5 variants from 2K to 400K context on an RTX 4090 and published the results on Reddit and GitHub. The smaller models reached the full 400K test window, while the larger 27B and 35B variants ran well into six-figure context lengths before hitting memory or stability limits.
// ANALYSIS
Community benchmarks like this are more useful than flashy context-window claims because they show what actually works on prosumer hardware.
- Qwen3.5 0.8B, 2B, 4B, and 9B variants completed 400K-context runs on a single 4090, though cold-start latency became very high at the top end
- The standout practical result is the quantized 9B model, which still delivered roughly 50 tokens/sec at 400K context in the published table
- The larger 27B and 35B variants looked strong up to around 196K context, then started failing or being skipped beyond 262K due to OOM or server-busy errors
- The repo’s warm-cache numbers matter: once the KV cache is loaded, repeat interactions look far more usable than the scary first-token times suggest
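The KV-cache footprint is what makes the larger variants run out of memory well before 400K. A back-of-envelope sketch (the layer count, KV-head count, and head dimension below are hypothetical GQA-style numbers for illustration, not Qwen3.5's actual config):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: two tensors (K and V) per layer, each shaped
    [n_kv_heads, context_len, head_dim], at bytes_per_elem precision."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

# Hypothetical 35B-class config: 64 layers, 8 KV heads, head dim 128, fp16.
# At the 262K context where the big models reportedly started failing:
size = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128,
                      context_len=262_144)
print(f"{size / 2**30:.0f} GiB")  # 64 GiB -- far beyond a 24 GB 4090
```

Under these assumed numbers the cache alone would need 64 GiB before counting weights, so OOM failures in the 262K range are unsurprising; the small variants fit because both their weights and per-token cache cost are a fraction of this.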
// TAGS
qwen3-5 · llm · benchmark · inference · gpu · open-weights
DISCOVERED
2026-03-07 (36d ago)
PUBLISHED
2026-03-07 (36d ago)
RELEVANCE
8/10
AUTHOR
AlwaysTiredButItsOk