OPEN_SOURCE
REDDIT // 4h ago // MODEL RELEASE
Qwen 3.6-35B-A3B hits 140 t/s on RTX 4090
A user reports impressive local performance for Alibaba's latest MoE coding model: 140 tokens/sec on a single RTX 4090. The sparse architecture pairs 35B total parameters with roughly 3B active per token, combining large-model reasoning depth with 3B-class inference speed, and is optimized for agentic coding and multimodal reasoning.
// ANALYSIS
Qwen 3.6-35B-A3B is a category-defining "agentic-first" open model that brings state-of-the-art coding performance to consumer hardware.
- The Mixture-of-Experts (MoE) design activates only ~3B parameters per token, allowing high decode speeds even at high-precision Q8 quantization (see the throughput sketch after this list).
- A native 262k context window and a "thinking preservation" feature reduce redundant computation in long-running agentic tasks (a local-serving sketch follows below).
- It excels at repository-level reasoning and tool calling, directly challenging proprietary models like Claude 3.5 Sonnet for local workflows (see the tool-calling sketch below).
- Multimodal support lets the model reason about UI/UX designs and diagrams alongside code.
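As a plausibility check on the 140 t/s figure: single-stream decoding is usually memory-bandwidth bound, so throughput tracks active parameters, not total. A rough sketch in Python; the bandwidth value is the 4090's published peak, while the efficiency factor is an assumption for illustration, not a measurement:

```python
# Back-of-envelope check on the 140 t/s claim: at batch size 1, decoding is
# memory-bandwidth bound, so token rate ~= bandwidth / bytes read per token.

GB = 1e9

active_params = 3e9          # ~3B active parameters per token (MoE routing)
bytes_per_param = 1.0        # Q8 quantization ~ 1 byte per weight (approx.)
bandwidth_4090 = 1008 * GB   # RTX 4090 peak memory bandwidth (~1008 GB/s)
efficiency = 0.45            # assumed usable fraction of peak in practice

bytes_per_token = active_params * bytes_per_param
ceiling = bandwidth_4090 / bytes_per_token    # theoretical upper bound
realistic = ceiling * efficiency              # with real-world overheads

print(f"bandwidth-bound ceiling: {ceiling:.0f} t/s")              # ~336 t/s
print(f"with {efficiency:.0%} efficiency: {realistic:.0f} t/s")   # ~151 t/s
```

At these assumptions the estimate lands near 150 t/s, so the reported 140 t/s is consistent with a 3B-active model; a dense 35B model reading ~35 GB per token would be roughly an order of magnitude slower on the same card.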
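For readers who want to try it locally, a minimal sketch using llama-cpp-python to load a GGUF quantization. The filename is hypothetical and the context size is a deliberate compromise, since the post names no specific artifact:

```python
# Minimal local-inference sketch with llama-cpp-python (a common way to run
# GGUF quantizations on consumer GPUs). Filename is a hypothetical example.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.6-35b-a3b-q8_0.gguf",  # assumed Q8 GGUF filename
    n_ctx=32768,       # the model claims a 262k native window, but a full
                       # 262k KV cache will not fit in 24 GB of VRAM
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Refactor this function for clarity: ..."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])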
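And a sketch of the tool-calling loop against a local OpenAI-compatible server (e.g. llama-server or vLLM); the endpoint, served model name, and read_file tool are illustrative assumptions:

```python
# Tool-calling sketch via the OpenAI-compatible chat API that most local
# servers expose. Endpoint, model name, and tool schema are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool exposed to the agent
        "description": "Read a file from the repository.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.6-35b-a3b",  # assumed served model name
    messages=[{"role": "user", "content": "Summarize what src/router.py does."}],
    tools=tools,
)

# If the model chose to call a tool, the call arrives as structured JSON.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```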
// TAGS
qwen3.6-35b-a3b · qwen · llm · ai-coding · agent · open-source · moe
DISCOVERED
4h ago · 2026-04-18
PUBLISHED
7h ago · 2026-04-17
RELEVANCE
10/10
AUTHOR
JuniorDeveloper73