Qwen3.6-35B-A3B hits 130 t/s on consumer GPUs
Alibaba's sparse MoE model, Qwen3.6-35B-A3B, is seeing rapid local adoption as developers tune it for consumer hardware, with reported inference speeds of up to 130 tokens per second on an RTX 3090. Its efficiency and strong coding performance are setting a new standard for open-weight models.
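For anyone wanting to sanity-check a throughput figure like that, here is a minimal benchmarking sketch using the llama-cpp-python bindings. The GGUF file name is hypothetical, and it assumes an IQ4 quantization of the model that fits in 24 GB of VRAM:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical file name; whoever publishes the IQ4 GGUF sets the real one.
llm = Llama(
    model_path="Qwen3.6-35B-A3B-IQ4_XS.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU (e.g. an RTX 3090)
    n_ctx=32768,      # the model advertises 262144, but the KV cache
                      # limits how much context fits in 24 GB of VRAM
)

prompt = "Write a Python function that merges two sorted lists."
start = time.perf_counter()
out = llm(prompt, max_tokens=512)
elapsed = time.perf_counter() - start

# Rough figure: this timing includes prompt processing as well as decoding.
n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated / elapsed:.1f} tokens/sec")
```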
Qwen3.6-35B-A3B's sparse MoE architecture activates only 3B parameters per token, delivering 35B-class reasoning at speeds previously reserved for much smaller models. For local hardware, IQ4 quantization currently offers the best tradeoff between speed and reasoning accuracy, and specialized coding presets from Unsloth are reported to add a further 10-15 t/s. A 262K native context window and a 73.4% SWE-bench score position it as a formidable local competitor to cloud models.
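To make the 3B-active-of-35B claim concrete: in a sparse MoE layer, a router scores every expert and runs only the top-k for each token. A toy NumPy sketch of that routing step, with illustrative sizes rather than the model's real configuration:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Route one token through only the top-k experts.

    x:        (d,) token hidden state
    router_w: (n_experts, d) router weights
    experts:  list of n_experts callables, each a small FFN
    """
    logits = router_w @ x          # score every expert
    top = np.argsort(logits)[-k:]  # keep the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()       # softmax over the chosen k only
    # Only k expert FFNs actually execute; the rest of the parameters
    # stay idle, so per-token compute tracks the active-parameter
    # count (~3B) rather than the total (35B).
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 8 experts, d=16, each expert a random linear map.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
y = moe_forward(rng.normal(size=d), router_w, experts)
print(y.shape)  # (16,)
```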
DISCOVERED: 2026-04-19 (7h ago)
PUBLISHED: 2026-04-19 (8h ago)
AUTHOR: cviperr33