OPEN_SOURCE
REDDIT // 3h ago // MODEL RELEASE

Qwen3.6-35B-A3B hits 187 t/s on RTX 5090

Alibaba's latest sparse mixture-of-experts (MoE) model, Qwen3.6-35B-A3B, reaches 187 tokens/s on consumer hardware. With 35B total parameters but only 3B active per token, it targets local agentic coding and reasoning.
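A quick sanity check on the headline number: if decode speed is bounded by reading the active expert weights once per token, the implied memory traffic is well within an RTX 5090's bandwidth budget. The quantization width below is an assumption (Q5 ≈ 5 bits/weight), not a figure from the post.

```python
# Back-of-envelope: is 187 t/s plausible for a 3B-active MoE?
# Assumption (not from the post): Q5 quantization ~ 5 bits/weight,
# and decode is bounded by streaming the active weights per token.

ACTIVE_PARAMS = 3e9      # active parameters per token (the "A3B" in the name)
BITS_PER_WEIGHT = 5      # assumed Q5 quantization
TOKENS_PER_SEC = 187     # reported throughput

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8
required_bw_gbs = bytes_per_token * TOKENS_PER_SEC / 1e9
print(f"~{required_bw_gbs:.0f} GB/s of weight reads")  # ≈ 351 GB/s
```

At roughly 350 GB/s of required weight traffic, the figure sits comfortably under the 5090's theoretical memory bandwidth, which is why the low active-parameter count, not total size, dominates decode speed.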

// ANALYSIS

The 32GB VRAM of the RTX 5090 is the new frontier for high-performance local LLMs.

  • Sparse MoE architecture (35B/3B) provides a ~30% speedup over previous generations
  • 120K context window at Q5 quantization fits comfortably in 32GB VRAM
  • Strong SWE-bench performance (73.4) indicates high suitability for local "agentic" developer workflows
  • Open-source Apache 2.0 license maintains Qwen's lead in the open-weights ecosystem
  • Hybrid linear attention reduces memory pressure for long-context tasks
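The VRAM-fit and hybrid-attention claims above can be sketched with rough arithmetic. Every architecture number below (layer count, GQA config, fraction of full-attention layers) is an illustrative assumption, not a published Qwen3.6 spec; the point is that a hybrid design shrinks the KV cache enough for 120K context to fit beside ~22 GB of Q5 weights.

```python
# Rough 32 GB VRAM budget: Q5 weights + 120K-token KV cache.
# All architecture values are assumptions for illustration only.

TOTAL_PARAMS = 35e9
BITS_PER_WEIGHT = 5            # assumed Q5 quantization
LAYERS = 48                    # assumed layer count
FULL_ATTN_FRACTION = 0.25      # hybrid: only some layers keep a full KV cache
KV_HEADS, HEAD_DIM = 4, 128    # assumed GQA configuration
KV_BYTES = 2                   # fp16 cache entries
CONTEXT = 120_000

weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
# K and V, per full-attention layer, per token:
kv_per_token = 2 * LAYERS * FULL_ATTN_FRACTION * KV_HEADS * HEAD_DIM * KV_BYTES
kv_gb = kv_per_token * CONTEXT / 1e9
print(f"weights ~{weights_gb:.1f} GB, KV ~{kv_gb:.1f} GB, "
      f"total ~{weights_gb + kv_gb:.1f} GB")
```

Under these assumptions the total lands near 25 GB, leaving headroom for activations and the runtime; with full attention in every layer the same cache would roughly quadruple and overflow 32 GB.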
// TAGS
qwen3-6-35b-a3b · llm · open-weights · moe · ai-coding · benchmark · gpu

DISCOVERED

3h ago

2026-04-17

PUBLISHED

4h ago

2026-04-16

RELEVANCE

9 / 10

AUTHOR

sammyranks