M5 Pro tops M2 Max on LLM speed
OPEN_SOURCE
REDDIT // 29d ago // BENCHMARK RESULT

A developer ran llama-bench on an Apple M5 Pro (18-core GPU, 24GB) against an M2 Max (32GB) and an M1 Pro (16GB) across three models. The M5 Pro's new Metal tensor API delivers 40%+ faster prompt processing than the M2 Max while matching or exceeding its text generation throughput.

// ANALYSIS

The M5 Pro is a sleeper pick for local LLM inference — the tensor API support changes the math significantly compared to prior Apple Silicon generations.

  • Prompt processing (pp512) on the M5 Pro is dramatically faster: 1727 t/s vs 1224 on M2 Max for GPT-OSS 20B MXFP4 (~41% improvement), and 808 vs 554 t/s for Qwen3.5-9B (~46% improvement)
  • Text generation (tg128) is roughly comparable for dense models (30-31 t/s on Qwen 9B), but M5 Pro pulls ahead on MoE: 54 vs 42 t/s for Qwen3.5-35B-A3B
  • The M5 Pro achieves this with 18 GPU cores vs the M2 Max's 38 — the tensor API hardware acceleration more than compensates for the core count disadvantage
  • The M5 Pro has 8GB less RAM than the tested M2 Max (24GB vs 32GB), which limits model selection — an M5 Max with 48-64GB would be a significantly different story
  • M1 Pro users on 16GB are particularly RAM-constrained; the jump to 24GB on even the base M5 Pro opens up MoE models that couldn't fit before
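The percentage gains quoted in the bullets above can be reproduced directly from the posted throughput figures. A minimal sketch (the numbers are from the post; the `pct_gain` helper is ours, not part of llama-bench):

```python
# Recompute the reported llama-bench deltas from the post's t/s figures.

def pct_gain(new: float, old: float) -> float:
    """Percent improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0

# (model, metric, M5 Pro t/s, M2 Max t/s)
results = [
    ("GPT-OSS 20B MXFP4", "pp512", 1727.0, 1224.0),
    ("Qwen3.5-9B",        "pp512",  808.0,  554.0),
    ("Qwen3.5-35B-A3B",   "tg128",   54.0,   42.0),
]

for model, metric, m5, m2 in results:
    print(f"{model:18s} {metric}: {pct_gain(m5, m2):+.0f}%")
```

The first two rows land at roughly +41% and +46%, matching the post; the MoE text-generation row works out to about +29% for the M5 Pro.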
// TAGS
apple-m5-pro · llm · inference · benchmark · edge-ai · open-source

DISCOVERED

2026-03-14 (29d ago)

PUBLISHED

2026-03-12 (31d ago)

RELEVANCE

7 / 10

AUTHOR

Fit-Later-389