Qwen3.5 benchmarks hit 400k on 4090
OPEN_SOURCE ↗
REDDIT // 36d ago · BENCHMARK RESULT

A LocalLLaMA user benchmarked multiple Qwen3.5 variants from 2K to 400K context on an RTX 4090 and published the results on Reddit and GitHub. The smaller models completed the full 400K test window, while the larger 27B and 35B variants ran well into six-figure context lengths before hitting memory or stability limits.

// ANALYSIS

Community benchmarks like this are more useful than flashy context-window claims because they show what actually works on prosumer hardware.

  • Qwen3.5 0.8B, 2B, 4B, and 9B variants completed 400K-context runs on a single 4090, though cold-start latency became very high at the top end
  • The standout practical result is the quantized 9B model, which still delivered roughly 50 tokens/sec at 400K context in the published table
  • Bigger 27B and 35B variants looked strong up to around 196K context, then started failing or getting skipped beyond 262K because of OOM or server-busy issues
  • The repo’s warm-cache numbers matter: once KV cache is loaded, repeat interactions look much more usable than the scary first-token times suggest
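The OOM failures on the larger variants at long context are consistent with KV-cache growth, which scales linearly with both layer count and context length. A rough back-of-envelope sketch below illustrates the mechanism; all architecture numbers (layer count, KV heads, head dim) are illustrative assumptions for a 27B-class model, not published Qwen3.5 configs:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Rough KV-cache footprint: keys + values (factor of 2) for every layer,
    each of shape context_len x n_kv_heads x head_dim, at bytes_per_elem
    (fp16/bf16 = 2, 8-bit KV quantization = 1)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 27B-class config with grouped-query attention, at the 262K
# context where the benchmark's larger models began to OOM:
gib = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                     context_len=262_144) / 2**30
print(f"{gib:.1f} GiB")  # → 48.0 GiB — KV cache alone dwarfs a 4090's 24 GB
```

Under these assumed numbers the cache alone is roughly double the 4090's VRAM, which is why the bigger variants fail near 262K while small models with fewer layers and KV heads (and quantized caches) can stretch to 400K.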
// TAGS
qwen3-5 · llm · benchmark · inference · gpu · open-weights

DISCOVERED

2026-03-07 (36d ago)

PUBLISHED

2026-03-07 (36d ago)

RELEVANCE

8/10

AUTHOR

AlwaysTiredButItsOk