Qwen3.6-35B-A3B smokes Qwen3.5 locally
OPEN_SOURCE · REDDIT · 3h ago · BENCHMARK RESULT

A Reddit benchmark on a 9,123-token prompt shows Qwen3.6-35B-A3B-8bit running far faster than Qwen3.5-397B-A17B-MLX-8bit in both LM Studio and oMLX. The smaller sparse model also cuts time-to-first-token from 25-43 seconds to under four seconds.

// ANALYSIS

The takeaway is blunt: for local MLX users, the newer sparse 35B model looks far more practical than the much larger 397B-class model, even before you factor in memory pressure and responsiveness.

  • In LM Studio, Qwen3.6-35B-A3B hits 70 t/s versus 25 t/s for Qwen3.5-397B-A17B, with time-to-first-token dropping from 43.41s to 3.8s
  • In oMLX, it still holds a wide lead: 55 t/s versus 21 t/s, and 3.74s TTFT versus 25.6s
  • Prompt processing is the real standout: roughly 2400 t/s on Qwen3.6 versus 210-356 t/s on the older model
  • This is a single-user benchmark, so it is not a controlled eval, but it strongly suggests the newer model is the better local default for long prompts
  • For Mac/MLX deployments, speed at prompt ingest and first token matters more than headline parameter count, and this result favors the smaller sparse architecture
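To see why prompt ingest dominates for long prompts, the quoted numbers can be plugged into a rough latency model: total turn time ≈ prefill time (prompt tokens / prompt-processing speed) plus decode time (output tokens / generation speed). This is a minimal sketch using the LM Studio figures from the post; the 500-token response length is a hypothetical assumption, not from the benchmark.

```python
# Rough latency model for one long-prompt chat turn.
# Throughput figures are copied from the benchmark post (LM Studio runs);
# the output length is a hypothetical assumption for illustration.

def turn_latency(prompt_tokens: int, output_tokens: int,
                 prefill_tps: float, decode_tps: float) -> float:
    """Total seconds ~= prompt prefill time + token generation time."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

PROMPT = 9123   # tokens in the benchmarked prompt
OUTPUT = 500    # assumed response length (not from the post)

qwen36 = turn_latency(PROMPT, OUTPUT, prefill_tps=2400, decode_tps=70)
qwen35 = turn_latency(PROMPT, OUTPUT, prefill_tps=210, decode_tps=25)

print(f"Qwen3.6-35B-A3B : {qwen36:.1f}s")
print(f"Qwen3.5-397B    : {qwen35:.1f}s")
```

Under these assumptions the smaller model finishes the whole turn in roughly 11 seconds versus over a minute for the larger one, and its prefill term (9123 / 2400 ≈ 3.8s) matches the reported TTFT almost exactly, which is a sanity check that TTFT here is dominated by prompt processing.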
// TAGS
qwen3.6-35b-a3b · qwen3.5-397b-a17b · llm · inference · benchmark · open-weights

DISCOVERED

2026-04-18

PUBLISHED

2026-04-18

RELEVANCE

8/10

AUTHOR

Turbulent_Pin7635