OPEN_SOURCE
REDDIT · 3h ago · MODEL RELEASE
Qwen3.6-35B-A3B hits 187 t/s on RTX 5090
Alibaba's latest sparse MoE model, Qwen3.6-35B-A3B, reaches 187 t/s on consumer hardware. With 35B total parameters but only 3B active per token, it targets local agentic coding and reasoning.
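As a rough sanity check on the headline number, here is a minimal back-of-envelope sketch. The bandwidth figure (~1792 GB/s for the RTX 5090) and the effective bits per weight at Q5 (~5.5) are approximations assumed for illustration, not taken from the post:

```python
# Back-of-envelope decode-speed ceiling for a sparse MoE model.
# Decode is roughly memory-bandwidth bound: each token requires
# streaming only the ACTIVE parameters, not all 35B.

ACTIVE_PARAMS = 3e9      # active parameters per token (the "A3B")
BITS_PER_WEIGHT = 5.5    # assumed effective bits/weight at Q5
BANDWIDTH_GBS = 1792     # assumed RTX 5090 memory bandwidth, GB/s

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8  # weight bytes read per token
ceiling_tps = BANDWIDTH_GBS * 1e9 / bytes_per_token    # bandwidth-bound ceiling

print(f"weights read per token: {bytes_per_token / 1e9:.2f} GB")
print(f"theoretical ceiling:    {ceiling_tps:.0f} t/s")
print(f"187 t/s is {187 / ceiling_tps:.0%} of that ceiling")
```

Under these assumptions the ceiling comes out near 870 t/s, so the observed 187 t/s is a plausible ~20% of it once routing overhead, attention compute, and kernel inefficiency are accounted for. The key point: only the 3B active parameters are streamed per token, which is why a 35B model can decode this fast.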
// ANALYSIS
The 32GB VRAM of the RTX 5090 is the new frontier for high-performance local LLMs.
- Sparse MoE architecture (35B total / 3B active) provides a ~30% speedup over previous generations
- 120K context window at Q5 quantization fits comfortably in 32GB VRAM (a rough fit estimate follows this list)
- Strong SWE-bench performance (73.4) indicates high suitability for local agentic developer workflows
- Open-source Apache 2.0 license maintains Qwen's lead in the open-weights ecosystem
- Hybrid linear attention reduces memory pressure for long-context tasks
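A rough VRAM budget under the same ~5.5 bits/weight Q5 assumption. The architecture numbers below (full-attention layer count, KV heads, head dimension) are hypothetical placeholders, not published Qwen3.6 specs:

```python
# Rough VRAM budget: Q5 weights for all 35B parameters plus an FP16
# KV cache at 120K context. In a hybrid design, only the
# full-attention layers keep a per-token KV cache.

TOTAL_PARAMS = 35e9
BITS_PER_WEIGHT = 5.5   # assumed effective bits/weight at Q5
CTX = 120_000
FULL_ATTN_LAYERS = 12   # hypothetical count of full-attention layers
KV_HEADS = 4            # hypothetical GQA KV-head count
HEAD_DIM = 128          # hypothetical head dimension
KV_BYTES = 2            # FP16 cache

weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
# K and V per token, stored only on the full-attention layers
kv_gb = 2 * FULL_ATTN_LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES * CTX / 1e9

print(f"weights:  {weights_gb:.1f} GB")
print(f"KV cache: {kv_gb:.1f} GB at {CTX:,} tokens")
print(f"total:    {weights_gb + kv_gb:.1f} GB of 32 GB")
```

With these placeholder numbers the total lands around 27 GB, leaving a few GB of headroom for activations and runtime overhead; a full-attention cache across every layer would be several times larger at this context length, which is the practical payoff of the hybrid linear-attention design.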
// TAGS
qwen3-6-35b-a3b · llm · open-weights · moe · ai-coding · benchmark · gpu
DISCOVERED
3h ago
2026-04-17
PUBLISHED
4h ago
2026-04-16
RELEVANCE
9/10
AUTHOR
sammyranks