Qwen3.6-35B-A3B smokes Qwen3.5 locally

// 45d ago · BENCHMARK RESULT

A Reddit benchmark on a 9,123-token prompt shows Qwen3.6-35B-A3B-8bit running far faster than Qwen3.5-397B-A17B-MLX-8bit on both LM Studio and oMLX. The smaller sparse model also cuts time-to-first-token from tens of seconds to under four seconds.

// ANALYSIS

The takeaway is blunt: for local MLX users, the newer sparse 35B model looks far more practical than the 397B-class model, even before you factor in memory pressure and responsiveness.

  • In LM Studio, Qwen3.6-35B-A3B hits 70 t/s versus 25 t/s for Qwen3.5-397B-A17B, with time-to-first-token dropping from 43.41s to 3.8s
  • In oMLX, it still holds a wide lead: 55 t/s versus 21 t/s, and 3.74s TTFT versus 25.6s
  • Prompt processing is the real standout: roughly 2400 t/s on Qwen3.6 versus 210-356 t/s on the older model
  • This is one user's benchmark on a single prompt, not a controlled eval, but it strongly suggests the newer model is the better local default for long prompts
  • For Mac/MLX deployments, prompt-ingest speed and time-to-first-token matter more than headline parameter count, and this result favors the smaller sparse architecture
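
The reported figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes time-to-first-token is dominated by prompt processing (ignoring model load and the first decode step), so that the prompt-processing rate is roughly prompt tokens divided by TTFT. The dictionary layout and function names are hypothetical; the numbers are the ones quoted above.

```python
# Figures as reported in the post; the data layout here is illustrative.
PROMPT_TOKENS = 9123

REPORTED = {
    "LM Studio": {
        "new": {"gen_tps": 70, "ttft_s": 3.8},    # Qwen3.6-35B-A3B
        "old": {"gen_tps": 25, "ttft_s": 43.41},  # Qwen3.5-397B-A17B
    },
    "oMLX": {
        "new": {"gen_tps": 55, "ttft_s": 3.74},
        "old": {"gen_tps": 21, "ttft_s": 25.6},
    },
}


def implied_prefill_tps(prompt_tokens, ttft_s):
    """Prompt-processing rate implied by TTFT.

    Assumption: TTFT is spent entirely on processing the prompt.
    """
    return prompt_tokens / ttft_s


def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


for runtime, r in REPORTED.items():
    new, old = r["new"], r["old"]
    print(runtime)
    print(f"  generation speedup: {ratio(new['gen_tps'], old['gen_tps']):.1f}x")
    print(f"  TTFT reduction:     {ratio(old['ttft_s'], new['ttft_s']):.1f}x")
    print(f"  implied prefill (new): {implied_prefill_tps(PROMPT_TOKENS, new['ttft_s']):.0f} t/s")
    print(f"  implied prefill (old): {implied_prefill_tps(PROMPT_TOKENS, old['ttft_s']):.0f} t/s")
```

Under that assumption the implied prompt-processing rates come out near 2400 t/s for Qwen3.6 and about 210 and 356 t/s for the older model, which lines up with the range quoted in the bullets. That suggests the TTFT gap is almost entirely a prompt-ingest gap.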
// TAGS
qwen3.6-35b-a3b, qwen3.5-397b-a17b, llm, inference, benchmark, open-weights

DISCOVERED

45d ago

2026-04-18

PUBLISHED

45d ago

2026-04-18

RELEVANCE

8/10

AUTHOR

Turbulent_Pin7635