Qwen3.5-35B-A3B tops 63 t/s on M2 Ultra
OPEN_SOURCE ↗
REDDIT // 1h ago · BENCHMARK RESULT

A Reddit benchmark on a Mac Studio M2 Ultra with 64GB shows Qwen3.5-35B-A3B at Q8_K_XL hitting 1,734 t/s prefill at 10,240 tokens, 1,552 t/s prefill at 16,384 tokens, and 63 t/s generation, averaged over three runs. It is a narrow local-inference datapoint, but it suggests the model is comfortably usable on high-memory Apple Silicon.
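As a rough sanity check on what those throughput figures mean for interactivity, the implied time-to-first-token and reply latency follow directly from the reported numbers (a back-of-the-envelope sketch; the 200-token reply length is an assumed example, not part of the benchmark):

```python
# Latency implied by the reported benchmark numbers.
PREFILL_TPS_16K = 1552   # t/s prefill at 16,384-token context (reported)
GEN_TPS = 63             # t/s generation (reported)

prompt_tokens = 16_384
reply_tokens = 200       # assumed reply length, for illustration only

ttft = prompt_tokens / PREFILL_TPS_16K   # time to first token after a full-context prompt
reply_time = reply_tokens / GEN_TPS      # time to stream the reply

print(f"TTFT at 16K context: {ttft:.1f} s")       # ~10.6 s
print(f"200-token reply:     {reply_time:.1f} s") # ~3.2 s
```

In other words, even a maxed-out 16K prompt starts answering in about ten seconds, which is where the high prefill numbers matter more than the raw generation rate.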

// ANALYSIS

This is a strong showing for a 35B-class MoE model on consumer-ish hardware, especially if you care about interactive local use more than leaderboard bragging rights.

  • The active-parameter design lets a large model fit and run fast enough on 64GB unified memory without immediately collapsing into tiny quants.
  • Prefill stays high even at 16K context, which matters more for long prompts and codebases than the raw generate number.
  • Q8_K_XL looks like a sensible sweet spot here: enough fidelity to keep the model interesting, without the memory hit of heavier formats.
  • Treat it as a hardware/backend benchmark, not a universal model ranking; no task suite, prompts, or quality scoring were reported.
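The memory-fit point above can be sketched with a quick estimate. Assuming roughly 8.5 effective bits per weight for a Q8_K_XL-style quant (an approximation, not a spec figure), the weights alone come in well under the machine's 64 GB of unified memory:

```python
# Rough weight-memory estimate for a 35B-parameter model at ~8.5 bits/weight.
# The 8.5 bits/weight figure is an assumed effective rate for Q8_K_XL-style
# quants; real usage also needs room for the KV cache and runtime overhead.
params = 35e9
bits_per_weight = 8.5

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~37 GB, leaving headroom on 64 GB
```

That headroom, plus only ~3B active parameters per token, is why the model runs interactively here instead of forcing a drop to much smaller quants.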
// TAGS
llm · benchmark · inference · open-source · qwen3-5-35b-a3b

DISCOVERED

1h ago

2026-04-17

PUBLISHED

3h ago

2026-04-17

RELEVANCE

8/10

AUTHOR

channingao