Qwen3.6-35B-A3B hits 187 t/s on RTX 5090

// MODEL RELEASE (45d ago)

Alibaba's latest sparse MoE model, Qwen3.6-35B-A3B, reaches 187 tokens per second (t/s) on a consumer RTX 5090. It has 35B total parameters but activates only 3B per token, and it targets local agentic coding and reasoning.

// ANALYSIS

With 32 GB of VRAM, the RTX 5090 sets the new bar for running high-performance LLMs locally.

– Sparse MoE architecture (35B total, 3B active) provides a ~30% speedup over previous generations
– A 120K-token context window at Q5 quantization fits comfortably in 32 GB of VRAM
– A SWE-bench score of 73.4 suggests the model is well suited to local agentic developer workflows
– The Apache 2.0 license keeps Qwen at the front of the open-weights ecosystem
– Hybrid linear attention reduces memory pressure for long-context tasks
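
The memory claim can be sanity-checked with rough arithmetic. The sketch below is a back-of-envelope estimate, not a measurement: the parameter counts, VRAM size, and throughput come from the post, while the 5.5 bits-per-weight average for a Q5-class quantization is an assumption, and the function name is hypothetical. Context-state memory (KV cache or linear-attention state) is deliberately left out, because the post gives no layer or head dimensions to compute it from.

```python
# Back-of-envelope VRAM budget for a quantized sparse MoE model.
# Figures marked "assumption" are illustrative and not taken from the post.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in bytes of n_params weights at a given average bit width."""
    return n_params * bits_per_weight / 8


TOTAL_PARAMS = 35e9     # from the post: 35B total parameters
ACTIVE_PARAMS = 3e9     # from the post: 3B parameters active per token
VRAM_GIB = 32           # from the post: RTX 5090 VRAM
TOKENS_PER_SEC = 187    # from the post: reported decode speed
BITS_PER_WEIGHT = 5.5   # assumption: rough average for a Q5-class quantization

# All experts must be resident, so the full 35B determines the footprint.
weights_gib = weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT) / GIB

# Whatever is left has to hold context state, activations, and runtime overhead.
headroom_gib = VRAM_GIB - weights_gib

# Only the active parameters are read per generated token.
active_gib_per_token = weight_bytes(ACTIVE_PARAMS, BITS_PER_WEIGHT) / GIB
weight_read_gib_per_sec = active_gib_per_token * TOKENS_PER_SEC

print(f"quantized weights:      {weights_gib:.1f} GiB")
print(f"headroom in {VRAM_GIB} GiB:     {headroom_gib:.1f} GiB")
print(f"weight reads at {TOKENS_PER_SEC} t/s: {weight_read_gib_per_sec:.0f} GiB/s")
```

Under these assumptions the weights take roughly 22 GiB, leaving a little under 10 GiB for a 120K-token context. The estimate also shows why the 3B active count matters for speed: each token reads only the active weights, so the required weight traffic scales with 3B rather than 35B.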

// TAGS

qwen3-6-35b-a3b, llm, open-weights, moe, ai-coding, benchmark, gpu

DISCOVERED

45d ago

2026-04-17

PUBLISHED

45d ago

2026-04-16

RELEVANCE

9/10

AUTHOR

sammyranks
