YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Qwen3.5 benchmarks hit 400k on 4090

// 81d ago · BENCHMARK RESULT

A LocalLLaMA user benchmarked multiple Qwen3.5 variants from 2K to 400K context on an RTX 4090 and published the results on Reddit and GitHub. The smaller models reached the full 400K test window, while the larger 27B and 35B variants ran well into six-figure context lengths before hitting memory or stability limits.

// ANALYSIS

Community benchmarks like this are more useful than flashy context-window claims because they show what actually works on prosumer hardware.

  • Qwen3.5 0.8B, 2B, 4B, and 9B variants completed 400K-context runs on a single 4090, though cold-start latency became very high at the top end
  • The standout practical result is the quantized 9B model, which still delivered roughly 50 tokens/sec at 400K context in the published table
  • Bigger 27B and 35B variants looked strong up to around 196K context, then started failing or getting skipped beyond 262K because of OOM or server-busy issues
  • The repo’s warm-cache numbers matter: once KV cache is loaded, repeat interactions look much more usable than the scary first-token times suggest
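
To see why long contexts run into OOM on a 24 GB card, here is a rough back-of-the-envelope sketch of KV-cache size versus context length. It assumes a standard full-attention transformer where every layer caches keys and values; models with hybrid or linear-attention layers would need less, and the benchmark's actual cache settings are not stated here. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not figures from any Qwen3.5 model card.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV-cache size in bytes for one sequence.

    Assumes every layer stores one key and one value vector per KV head
    per token (standard attention with optional grouped-query heads).
    """
    # Factor of 2: one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


# Hypothetical architecture numbers, chosen only for illustration.
EXAMPLE_ARCH = dict(n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for ctx in (2_000, 196_000, 262_000, 400_000):
        fp16 = kv_cache_bytes(context_len=ctx, **EXAMPLE_ARCH)
        q8 = kv_cache_bytes(context_len=ctx, bytes_per_elem=1, **EXAMPLE_ARCH)
        print(f"{ctx:>7} tokens: {to_gib(fp16):6.2f} GiB at 16-bit, "
              f"{to_gib(q8):6.2f} GiB at 8-bit")
```

The cache grows linearly with context length, so under these assumptions the cache alone can outgrow VRAM well before the 400K mark unless the model is small, the cache is quantized, or fewer layers keep a full cache. Model weights and activations come on top of this estimate.
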
// TAGS
qwen3-5, llm, benchmark, inference, gpu, open-weights

DISCOVERED

81d ago

2026-03-07

PUBLISHED

81d ago

2026-03-07

RELEVANCE

8/10

AUTHOR

AlwaysTiredButItsOk