OPEN_SOURCE ↗
REDDIT · 3h ago · BENCHMARK RESULT
Qwen3.6-35B-A3B smokes Qwen3.5 locally
A Reddit benchmark on a 9,123-token prompt shows Qwen3.6-35B-A3B-8bit running far faster than Qwen3.5-397B-A17B-MLX-8bit on both LM Studio and oMLX. The smaller sparse model also cuts time-to-first-token from tens of seconds to under four seconds.
// ANALYSIS
The takeaway is blunt: for local MLX users, the newer sparse 35B model looks much more practical than the much larger 397B class model, even before you factor in memory pressure and responsiveness.
- In LM Studio, Qwen3.6-35B-A3B hits 70 t/s versus 25 t/s for Qwen3.5-397B-A17B, with time-to-first-token dropping from 43.41s to 3.8s
- In oMLX, it still holds a wide lead: 55 t/s versus 21 t/s, and 3.74s TTFT versus 25.6s
- Prompt processing is the real standout: roughly 2400 t/s on Qwen3.6 versus 210-356 t/s on the older model
- This is a single-user benchmark, so it is not a controlled eval, but it strongly suggests the newer model is the better local default for long prompts
- For Mac/MLX deployments, speed at prompt ingest and first token matters more than headline parameter count, and this result favors the smaller sparse architecture
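The prompt-processing numbers above explain the TTFT gap almost exactly: at these speeds, time-to-first-token on a 9,123-token prompt is dominated by prompt ingest. A quick sanity check (figures taken from the benchmark; the formula TTFT ≈ prompt_tokens / pp_speed is our simplification, ignoring per-token generation latency):

```python
# Back-of-envelope check: TTFT at long context is roughly prompt ingest time.
PROMPT_TOKENS = 9_123

def ingest_seconds(prompt_tokens: int, pp_tokens_per_sec: float) -> float:
    """Estimated seconds to process the prompt before the first output token."""
    return prompt_tokens / pp_tokens_per_sec

# Qwen3.6-35B-A3B at ~2400 t/s prompt processing
t_new = ingest_seconds(PROMPT_TOKENS, 2400)      # ~3.8 s
# Qwen3.5-397B-A17B at 210-356 t/s prompt processing
t_old_slow = ingest_seconds(PROMPT_TOKENS, 210)  # ~43.4 s
t_old_fast = ingest_seconds(PROMPT_TOKENS, 356)  # ~25.6 s

print(f"new: {t_new:.1f}s, old: {t_old_fast:.1f}-{t_old_slow:.1f}s")
```

These estimates land within a fraction of a second of the reported TTFTs (3.8s, 43.41s in LM Studio, 25.6s in oMLX), which supports the claim that prompt ingest speed, not generation speed, drives responsiveness on long prompts.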
// TAGS
qwen3.6-35b-a3b · qwen3.5-397b-a17b · llm · inference · benchmark · open-weights
DISCOVERED
3h ago
2026-04-18
PUBLISHED
3h ago
2026-04-18
RELEVANCE
8/10
AUTHOR
Turbulent_Pin7635