Qwen3.5 benchmarks hit 400k on 4090

// BENCHMARK RESULT (81d ago)

A LocalLLaMA user benchmarked multiple Qwen3.5 variants from 2K to 400K context on an RTX 4090 and published the results on Reddit plus GitHub. The smaller models reached the full 400K test window, while larger 27B and 35B variants ran well into six-figure context lengths before hitting memory or stability limits.

// ANALYSIS

Community benchmarks like this are more useful than flashy context-window claims because they show what actually works on prosumer hardware.

– Qwen3.5 0.8B, 2B, 4B, and 9B variants completed 400K-context runs on a single 4090, though cold-start latency became very high at the top end
– The standout practical result is the quantized 9B model, which still delivered roughly 50 tokens/sec at 400K context in the published table
– The larger 27B and 35B variants looked strong up to around 196K context, then started failing or being skipped beyond 262K because of OOM or server-busy errors
– The repo's warm-cache numbers matter: once the KV cache is loaded, repeat interactions look much more usable than the long first-token times suggest
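The OOM failures and the weight given to KV cache above both follow from how cache memory scales with context. The following is a rough sketch of that arithmetic, not the benchmark author's method. It assumes a plain transformer that stores keys and values at every layer, and the layer count, KV-head count, and head size are hypothetical placeholders rather than Qwen3.5's actual configuration. Models that quantize the cache or use other attention variants would need less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    """Estimate bytes needed to hold keys and values for a given context length.

    Assumes every layer caches one key and one value vector per KV head per token.
    bytes_per_elem=2 corresponds to 16-bit cache entries.
    """
    # The leading factor of 2 covers the key tensor plus the value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


# Hypothetical model shape, chosen only for illustration.
EXAMPLE_SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for ctx in (2_000, 196_000, 400_000):
    size = kv_cache_bytes(context_tokens=ctx, **EXAMPLE_SHAPE)
    print(f"{ctx:>7} tokens: {to_gib(size):.2f} GiB of KV cache")
```

The estimate is linear in context length, so doubling the context doubles the cache. With this made-up shape the cache costs 128 KiB per token, which is why long-context runs on a single consumer GPU depend so heavily on model size and cache precision.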

// TAGS

qwen3-5, llm, benchmark, inference, gpu, open-weights

DISCOVERED

81d ago

2026-03-07

PUBLISHED

81d ago

2026-03-07

RELEVANCE

8/10

AUTHOR

AlwaysTiredButItsOk
