Qwen 3.6-35B-A3B hits 140 t/s on RTX 4090
OPEN_SOURCE
REDDIT // 4h ago · MODEL RELEASE

A Reddit user reports impressive local performance for Alibaba's latest MoE coding model: 140 tokens/sec on an RTX 4090. The sparse architecture pairs the reasoning depth of 35B total parameters with 3B-class inference speed, and the model is optimized for agentic coding and multimodal reasoning.
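The reported 140 t/s figure is plausible from first principles: token-by-token decoding is memory-bandwidth bound, so throughput scales with how many weight bytes must be read per token. A rough sketch, using the RTX 4090's ~1008 GB/s memory bandwidth and assuming ~3B active parameters at 1 byte each (Q8) and ~50% effective bandwidth utilization (both assumed, not measured):

```python
def estimate_tokens_per_sec(active_params, bytes_per_param, bandwidth_gbps, efficiency=0.5):
    """Back-of-envelope decode throughput for a memory-bound model.

    Each generated token requires reading all *active* weights once,
    so t/s ~= effective bandwidth / bytes of active weights.
    """
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gbps * 1e9 * efficiency / bytes_per_token

# Assumed: 3e9 active params (per the "A3B" naming), Q8 = 1 byte/param,
# 1008 GB/s (RTX 4090 spec), 50% bandwidth efficiency.
print(round(estimate_tokens_per_sec(3e9, 1.0, 1008)))  # -> 168
```

An estimate in the high-100s for ideal conditions is consistent with an observed 140 t/s once attention/KV-cache traffic and kernel overheads are accounted for; a dense 35B model under the same assumptions would land near 14 t/s, which is the whole point of the sparse design.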

// ANALYSIS

Qwen 3.6-35B-A3B is a category-defining "agentic first" open model that brings state-of-the-art coding performance to consumer hardware.

  • The Mixture-of-Experts (MoE) design uses only 3B active parameters, allowing it to run at high speed even with high-precision Q8 quantization.
  • Native 262k context window and "thinking preservation" feature reduce redundant computation in long-running agentic tasks.
  • It excels at repository-level reasoning and tool calling, directly challenging proprietary models like Claude 3.5 Sonnet for local workflows.
  • Multimodal support allows the model to reason about UI/UX designs and diagrams alongside code.
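The Q8 format mentioned above stores weights as 8-bit integers plus a floating-point scale. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only; llama.cpp's actual Q8_0 format applies a separate scale per 32-weight block):

```python
import numpy as np

def q8_quantize(w):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def q8_dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip error is bounded by half the quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = q8_quantize(w)
err = float(np.abs(w - q8_dequantize(q, s)).max())
```

Because the worst-case error is half a quantization step, Q8 stays very close to the fp16 weights while halving memory traffic, which is why the card can describe it as "high-precision" quantization.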
// TAGS
qwen3.6-35b-a3b · qwen · llm · ai-coding · agent · open-source · moe

DISCOVERED

4h ago

2026-04-18

PUBLISHED

7h ago

2026-04-17

RELEVANCE

10/10

AUTHOR

JuniorDeveloper73