Qwen 3.6-35B-A3B hits 140 t/s on RTX 4090
OPEN_SOURCE
REDDIT // 4h ago · MODEL RELEASE

A Reddit user reports impressive local performance for Alibaba's latest MoE coding model: 140 tokens/sec on an RTX 4090. The sparse architecture pairs the reasoning depth of 35B total parameters with 3B-class inference speed, and the model is optimized for agentic coding and multimodal reasoning.
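The reported 140 t/s figure is plausible from first principles: token-by-token decoding is memory-bandwidth bound, so throughput scales with how many weight bytes must be read per token. A rough sketch, using the RTX 4090's ~1008 GB/s memory bandwidth and assuming ~3B active parameters at 1 byte each (Q8) and ~50% effective bandwidth utilization (both assumed, not measured):

```python
def estimate_tokens_per_sec(active_params, bytes_per_param, bandwidth_gbps, efficiency=0.5):
    """Back-of-envelope decode throughput for a memory-bound model.

    Each generated token requires reading all *active* weights once,
    so t/s ~= effective bandwidth / bytes of active weights.
    """
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gbps * 1e9 * efficiency / bytes_per_token

# Assumed: 3e9 active params (per the "A3B" naming), Q8 = 1 byte/param,
# 1008 GB/s (RTX 4090 spec), 50% bandwidth efficiency.
print(round(estimate_tokens_per_sec(3e9, 1.0, 1008)))  # -> 168
```

An estimate in the high-100s for ideal conditions is consistent with an observed 140 t/s once attention/KV-cache traffic and kernel overheads are accounted for; a dense 35B model under the same assumptions would land near 14 t/s, which is the whole point of the sparse design.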

// ANALYSIS

Qwen 3.6-35B-A3B is a category-defining "agentic first" open model that brings state-of-the-art coding performance to consumer hardware.

  • The Mixture-of-Experts (MoE) design uses only 3B active parameters, allowing it to run at high speed even with high-precision Q8 quantization.
  • Native 262k context window and "thinking preservation" feature reduce redundant computation in long-running agentic tasks.
  • It excels at repository-level reasoning and tool calling, directly challenging proprietary models like Claude 3.5 Sonnet for local workflows.
  • Multimodal support allows the model to reason about UI/UX designs and diagrams alongside code.
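The Q8 format mentioned above stores weights as 8-bit integers plus a floating-point scale. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only; llama.cpp's actual Q8_0 format applies a separate scale per 32-weight block):

```python
import numpy as np

def q8_quantize(w):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def q8_dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip error is bounded by half the quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = q8_quantize(w)
err = float(np.abs(w - q8_dequantize(q, s)).max())
```

Because the worst-case error is half a quantization step, Q8 stays very close to the fp16 weights while halving memory traffic, which is why the card can describe it as "high-precision" quantization.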
// TAGS
qwen3.6-35b-a3b · qwen · llm · ai-coding · agent · open-source · moe

DISCOVERED

4h ago

2026-04-18

PUBLISHED

7h ago

2026-04-17

RELEVANCE

10/10

AUTHOR

JuniorDeveloper73