Qwen3.6-35B-A3B hits 187 t/s on RTX 5090

// 45d ago · MODEL RELEASE

Alibaba's latest sparse MoE model, Qwen3.6-35B-A3B, reaches 187 tokens per second (t/s) on a consumer RTX 5090. It has 35B total parameters but activates only 3B per token, and it targets local agentic coding and reasoning.
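To put the headline figures in perspective, here is a minimal arithmetic sketch. It uses only the numbers quoted in the summary (35B total, 3B active, 187 t/s); the function names are hypothetical and chosen for illustration.

```python
# Figures quoted in the summary; everything else here is derived arithmetic.
TOTAL_PARAMS = 35e9        # total parameters in the MoE model
ACTIVE_PARAMS = 3e9        # parameters activated per token
TOKENS_PER_SECOND = 187.0  # reported generation speed on an RTX 5090


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that take part in each token."""
    return active / total


def ms_per_token(tokens_per_second: float) -> float:
    """Average wall-clock time spent per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"latency per token: {ms_per_token(TOKENS_PER_SECOND):.2f} ms")
```

Under these figures, fewer than one in ten weights participates in each token, which is the usual reason a sparse MoE model can generate much faster than a dense model of the same total size.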

// ANALYSIS

The 32GB VRAM of the RTX 5090 is the new frontier for high-performance local LLMs.

  • Sparse MoE architecture (35B total, 3B active) provides a ~30% speedup over previous generations
  • A 120K context window at Q5 quantization fits within 32GB of VRAM
  • A strong SWE-bench score (73.4) suggests it is well suited to local agentic developer workflows
  • The permissive Apache 2.0 license maintains Qwen's lead in the open-weights ecosystem
  • Hybrid linear attention reduces memory pressure for long-context tasks
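
The VRAM claim in the list above can be sanity-checked with a rough budget. This is a sketch under stated assumptions, not a measurement: it treats Q5 as a nominal 5 bits per weight (real Q5 formats store extra scale metadata, so actual files are somewhat larger), uses decimal gigabytes, and ignores activation buffers and runtime overhead. The helper names are hypothetical.

```python
def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


def headroom_gb(vram_gb: float, params: float, bits_per_weight: float) -> float:
    """VRAM left for KV cache, activations, and runtime after loading weights."""
    return vram_gb - weight_footprint_gb(params, bits_per_weight)


if __name__ == "__main__":
    # Assumption: nominal 5.0 bits per weight for "Q5".
    weights = weight_footprint_gb(35e9, 5.0)
    spare = headroom_gb(32.0, 35e9, 5.0)
    print(f"weights: ~{weights:.1f} GB, headroom: ~{spare:.1f} GB of 32 GB")
```

Under these assumptions the weights take roughly 22 GB, leaving about 10 GB for the 120K-token context. How much of that the KV cache actually needs depends on architecture details not given here, though the hybrid linear attention noted above is what would keep it small.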
// TAGS
qwen3-6-35b-a3b, llm, open-weights, moe, ai-coding, benchmark, gpu

DISCOVERED

45d ago

2026-04-17

PUBLISHED

45d ago

2026-04-16

RELEVANCE

9/10

AUTHOR

sammyranks