Qwen3.6-35B-A3B crowns 12GB sweet spot
A Reddit benchmark on an RTX 3060 12GB shows Qwen3.6-35B-A3B is surprisingly practical locally, especially with a tuned `-ncmoe` (MoE CPU-offload) setting and a q8 KV cache. The author reports ~46-47 tok/s decode and says 32k context is usable without falling off the VRAM cliff.
The interesting part is not the raw speed; it's that a 35B MoE model crosses from "theoretical" to "daily-driver" territory on 12GB if you tune offload carefully.
- The sweet spot appears to be `-ncmoe 18-20`; dropping to `16` (keeping more expert layers on the GPU) triggers a sharp performance cliff
- The q8 KV cache is effectively free here, so the usual memory-vs-speed tradeoff leans toward quantizing the cache
- Plain decoding already lands around 46 tok/s, which makes MTP only a marginal upgrade in this setup
- The practical win is context: 16k to 32k feels achievable instead of being a benchmark-only configuration
- For local coding, this is a better signal than headline benchmark charts because it reflects the real constraint: VRAM, not just FLOPS
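The tuning described above maps onto llama.cpp's server flags roughly like this; a sketch, where the post's `-ncmoe` is shorthand for llama.cpp's `--n-cpu-moe`, and the model filename and quant level are assumptions not stated in the post:

```shell
# Hypothetical llama.cpp invocation approximating the reported setup.
# --n-cpu-moe keeps N MoE expert layers on the CPU (the post's -ncmoe);
# 18-20 is the reported sweet spot on a 12GB RTX 3060.
# --cache-type-k/v q8_0 enables the q8 KV cache the post calls "free".
llama-server \
  -m Qwen3.6-35B-A3B-Q4_K_M.gguf \
  -ngl 99 \
  --n-cpu-moe 18 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -c 32768
```

Lowering `--n-cpu-moe` moves more expert layers onto the GPU; per the post, going below ~18 overflows 12GB of VRAM and performance falls off a cliff.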
DISCOVERED: 2026-05-09
PUBLISHED: 2026-05-08
AUTHOR: jwestra
