Qwen3.6-35B-A3B hits 30 t/s at 128K

// 47d ago · BENCHMARK RESULT

On an RTX 5080 16GB, this local llama.cpp setup keeps Qwen3.6-35B-A3B usable at 128K context, reaching about 30 tokens/s with a hybrid KV-cache and expert-offload configuration. The post argues that for coding agents, context depth and memory placement matter more than raw speed at empty context (d=0) or denser quantization.
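
To see why KV-cache layout becomes a deciding factor at 128K on a 16GB card, a rough sizing sketch helps. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not values from the Qwen3.6-35B-A3B model card, and the choice of q4_0 as the compressed baseline with 8 layers promoted to q8_0 is an assumption made for illustration. The bytes-per-value figures follow the ggml block layouts (q8_0 packs 32 values into 34 bytes, q4_0 packs 32 values into 18 bytes).

```python
# Approximate storage cost per cached value for each cache type.
BYTES_PER_VALUE = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(layer_types, n_kv_heads, head_dim, context_len):
    """Total K+V cache size for one sequence, given one cache type per layer."""
    values_per_layer = 2 * n_kv_heads * head_dim * context_len  # K and V
    return sum(values_per_layer * BYTES_PER_VALUE[t] for t in layer_types)


# Hypothetical dimensions, NOT taken from the real model.
N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT = 40, 4, 128, 128 * 1024

uniform_f16 = kv_cache_bytes(["f16"] * N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)
uniform_q4 = kv_cache_bytes(["q4_0"] * N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)
# Assumed hybrid layout: 8 "hot" layers at q8_0, the remaining 32 at q4_0.
hybrid = kv_cache_bytes(["q8_0"] * 8 + ["q4_0"] * 32, N_KV_HEADS, HEAD_DIM, CONTEXT)

for name, size in [("f16", uniform_f16), ("q4_0", uniform_q4), ("hybrid", hybrid)]:
    print(f"{name:>6}: {size / 2**20:.0f} MiB")
```

Under these assumed dimensions, the hybrid layout costs only modestly more memory than compressing every layer, which is the kind of budget trade-off the post describes; the speed effect itself has to be measured, not computed.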

// ANALYSIS

The punchline is that long-context local coding is no longer a “maybe someday” story on a single consumer GPU, but a tuning problem with sharp cliffs. Once the author found the right MoE offload balance and KV layout, Qwen3.6-35B-A3B became fast enough to keep a Claude Code-style workflow practical.

  • The key benchmark is the depth curve: the model stays strong as context grows, instead of collapsing the way the dense 27B setup did.
  • The post makes a convincing case that file size and offload balance beat “better” quant labels if they let more experts stay on GPU.
  • The eval lesson matters: self-written tests overstated quality differences until the author switched to one shared harness and deterministic sampling.
  • The hybrid KV finding is the most interesting systems result here: compressing everything was slower than selectively promoting hot layers to q8_0 at long context.
  • For local agent users, the real threshold is not peak tokens/sec at empty context; it is whether the model stays responsive at the conversation depth you actually work in.
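
The evaluation point in the list above can be sketched as a single harness that feeds every model the same cases with the same sampling settings. This is a minimal illustration, not the author's harness: `run_shared_harness`, the `generate(prompt, temperature, seed)` signature, and the two stand-in models are all hypothetical.

```python
def run_shared_harness(models, cases, seed=0):
    """Score every model on identical cases with greedy, seeded sampling."""
    scores = {}
    for name, generate in models.items():
        passed = 0
        for prompt, check in cases:
            # Same prompt, temperature 0, and seed for every model under test.
            output = generate(prompt, temperature=0.0, seed=seed)
            passed += bool(check(output))
        scores[name] = passed / len(cases)
    return scores


# Stand-in "models" so the sketch runs without any inference backend.
def model_a(prompt, temperature, seed):
    return prompt.upper()


def model_b(prompt, temperature, seed):
    return prompt


# Each case pairs a prompt with a checker shared by all models.
cases = [
    ("abc", lambda out: out == "ABC"),
    ("XYZ", lambda out: out == "XYZ"),
]

scores = run_shared_harness({"a": model_a, "b": model_b}, cases)
print(scores)
```

Because the cases and sampling settings are fixed outside the models, any score gap reflects the models and not differences in how each one was tested.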
// TAGS
qwen3-6-35b-a3b, llm, ai-coding, agent, inference, gpu, open-source

DISCOVERED

47d ago

2026-05-01

PUBLISHED

47d ago

2026-04-30

RELEVANCE

9/10

AUTHOR

craftogrammer