Qwen3.6-35B-A3B hits 187 t/s on RTX 5090
Alibaba's latest sparse Mixture-of-Experts (MoE) model, Qwen3.6-35B-A3B, reaches a reported 187 tokens per second (t/s) on a consumer RTX 5090. With 35B total parameters but only 3B active per token, it targets local agentic coding and reasoning.
The 32GB VRAM of the RTX 5090 is the new frontier for high-performance local LLMs.
- Sparse MoE architecture (35B/3B) provides a ~30% speedup over previous generations
- 120K context window at Q5 quantization fits comfortably in 32GB VRAM
- Strong SWE-bench performance (73.4) indicates high suitability for local "agentic" developer workflows
- Open-source Apache 2.0 license maintains Qwen's lead in the open-weights ecosystem
- Hybrid linear attention reduces memory pressure for long-context tasks
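
The memory and throughput figures above can be sanity-checked with some rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement: the 5.5 bits-per-weight figure for Q5 is an assumption (actual Q5 variants differ slightly), the function names are illustrative, and the KV-cache cost of the 120K context is left out because it depends on architecture details not given here.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the parameters used for each token in a sparse MoE model."""
    return active_params / total_params


TOTAL_PARAMS = 35e9       # from the model name (35B total)
ACTIVE_PARAMS = 3e9       # from the model name (3B active)
VRAM_GB = 32.0            # RTX 5090, treated here as decimal GB
BITS_PER_WEIGHT_Q5 = 5.5  # ASSUMPTION: typical effective size of a Q5 quant
TOKENS_PER_SECOND = 187   # reported throughput

weights_gb = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT_Q5)
headroom_gb = VRAM_GB - weights_gb  # left for KV cache, activations, buffers
ms_per_token = 1000 / TOKENS_PER_SECOND

print(f"Quantized weights: {weights_gb:.1f} GB")
print(f"Headroom in VRAM:  {headroom_gb:.1f} GB")
print(f"Active per token:  {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"Latency per token: {ms_per_token:.2f} ms")
```

Under these assumptions the weights take roughly 24 GB, leaving about 8 GB for the context cache and runtime overhead, which is at least consistent with the claim that a long context fits in 32GB. Only about 9% of the parameters are touched per token, which is where the throughput advantage of the sparse design comes from.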
Discovered: 2026-04-17
Published: 2026-04-16
Author: sammyranks