OPEN_SOURCE ↗
REDDIT · 3h ago · BENCHMARK RESULT
Qwen3.6-35B-A3B smokes Qwen3.5 locally
A Reddit benchmark on a 9,123-token prompt shows Qwen3.6-35B-A3B-8bit running far faster than Qwen3.5-397B-A17B-MLX-8bit on both LM Studio and oMLX. The smaller sparse model also cuts time-to-first-token from tens of seconds to under four seconds.
// ANALYSIS
The takeaway is blunt: for local MLX users, the newer sparse 35B model looks much more practical than the much larger 397B class model, even before you factor in memory pressure and responsiveness.
- In LM Studio, Qwen3.6-35B-A3B hits 70 t/s versus 25 t/s for Qwen3.5-397B-A17B, with time-to-first-token dropping from 43.41s to 3.8s
- In oMLX, it still holds a wide lead: 55 t/s versus 21 t/s, and 3.74s TTFT versus 25.6s
- Prompt processing is the real standout: roughly 2400 t/s on Qwen3.6 versus 210-356 t/s on the older model
- This is a single-user benchmark, so it is not a controlled eval, but it strongly suggests the newer model is the better local default for long prompts
- For Mac/MLX deployments, speed at prompt ingest and first token matters more than headline parameter count, and this result favors the smaller sparse architecture
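The prompt-processing numbers above explain the TTFT gap almost exactly: at these speeds, time-to-first-token on a 9,123-token prompt is dominated by prompt ingest. A quick sanity check (figures taken from the benchmark; the formula TTFT ≈ prompt_tokens / pp_speed is our simplification, ignoring per-token generation latency):

```python
# Back-of-envelope check: TTFT at long context is roughly prompt ingest time.
PROMPT_TOKENS = 9_123

def ingest_seconds(prompt_tokens: int, pp_tokens_per_sec: float) -> float:
    """Estimated seconds to process the prompt before the first output token."""
    return prompt_tokens / pp_tokens_per_sec

# Qwen3.6-35B-A3B at ~2400 t/s prompt processing
t_new = ingest_seconds(PROMPT_TOKENS, 2400)      # ~3.8 s
# Qwen3.5-397B-A17B at 210-356 t/s prompt processing
t_old_slow = ingest_seconds(PROMPT_TOKENS, 210)  # ~43.4 s
t_old_fast = ingest_seconds(PROMPT_TOKENS, 356)  # ~25.6 s

print(f"new: {t_new:.1f}s, old: {t_old_fast:.1f}-{t_old_slow:.1f}s")
```

These estimates land within a fraction of a second of the reported TTFTs (3.8s, 43.41s in LM Studio, 25.6s in oMLX), which supports the claim that prompt ingest speed, not generation speed, drives responsiveness on long prompts.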
// TAGS
qwen3.6-35b-a3b · qwen3.5-397b-a17b · llm · inference · benchmark · open-weights
DISCOVERED
3h ago
2026-04-18
PUBLISHED
3h ago
2026-04-18
RELEVANCE
8/10
AUTHOR
Turbulent_Pin7635