Qwen3.6-35B-A3B hits 130 t/s on consumer GPUs
OPEN_SOURCE
REDDIT · 7h ago · MODEL RELEASE

Alibaba's sparse MoE model, Qwen3.6-35B-A3B, is seeing rapid local adoption as developers optimize it for consumer hardware, reaching inference speeds up to 130 tokens per second on the RTX 3090. The model's efficiency and high coding performance are setting a new standard for open-weight models.
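A rough way to see where the headline number comes from: single-GPU decode is largely memory-bandwidth bound, so each generated token must stream roughly the active expert weights out of VRAM. A minimal back-of-envelope sketch, assuming ~4.25 bits/weight for IQ4-class quantization and the RTX 3090's nominal 936 GB/s memory bandwidth (both figures are assumptions, not taken from the post):

```python
# Back-of-envelope: why a 3B-active MoE decodes fast on an RTX 3090.
# Decode is roughly memory-bandwidth bound: every token must stream the
# active expert weights from VRAM. Both constants below are assumptions.

ACTIVE_PARAMS = 3e9       # active parameters per token (the "A3B" part)
BITS_PER_WEIGHT = 4.25    # approximate IQ4-class quantization density
RTX3090_BW = 936e9        # bytes/s, RTX 3090 nominal memory bandwidth

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8
ceiling_tps = RTX3090_BW / bytes_per_token

print(f"{bytes_per_token / 1e9:.2f} GB of weights read per token")
print(f"~{ceiling_tps:.0f} t/s ideal bandwidth ceiling")
```

The observed 130 t/s sits well below this idealized ceiling, as expected once attention, KV-cache reads, and kernel overhead are counted; the point is that a dense 35B model would have to read roughly 12x more weight bytes per token.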

// ANALYSIS

Qwen3.6-35B-A3B's sparse MoE architecture is a milestone for local AI, delivering strong coding capability at speeds previously reserved for much smaller models. With only 3B parameters active per token, inference stays fast while reasoning quality remains in the 35B class. IQ4 quantization currently offers the best tradeoff between speed and reasoning accuracy on local hardware, and specialized coding presets from Unsloth can add another 10-15 t/s. Its 262K native context window and 73.4% SWE-bench score position it as a serious local competitor to cloud models.
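Even though only 3B parameters fire per token, the full 35B weight set still has to fit in VRAM. A quick sanity check against the RTX 3090's 24 GB, again assuming ~4.25 bits/weight for IQ4-class quantization (an approximation, not a figure from the post):

```python
# Rough VRAM footprint of the full 35B weight set under ~4.25-bit
# (IQ4-class) quantization, versus an RTX 3090's 24 GiB. Assumed figures.

TOTAL_PARAMS = 35e9       # total parameters, including inactive experts
BITS_PER_WEIGHT = 4.25    # approximate IQ4-class quantization density
GiB = 2**30

weights_gib = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / GiB
print(f"~{weights_gib:.1f} GiB of weights")  # ≈ 17.3 GiB
```

That leaves only ~6-7 GiB of a 24 GiB card for KV cache and activations, which constrains how much of the 262K native context window can actually be used at this quantization level.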

// TAGS
qwen3.6-35b-a3b · llm · ai-coding · open-weights · open-source · benchmark

DISCOVERED

7h ago

2026-04-19

PUBLISHED

8h ago

2026-04-19

RELEVANCE

10/10

AUTHOR

cviperr33