OPEN_SOURCE
REDDIT // BENCHMARK RESULT
M5 Pro tops M2 Max on LLM speed
A developer ran llama-bench on an Apple M5 Pro (18-core GPU, 24GB) against an M2 Max (32GB) and an M1 Pro (16GB) across three models. The M5 Pro's new Metal tensor API delivers 40%+ faster prompt processing than the M2 Max while matching it in text generation throughput.
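The post doesn't include the exact invocation; a typical llama-bench run that produces the pp512 and tg128 numbers cited below looks like this (the model path is a placeholder, not from the post):

```shell
# Hypothetical invocation: llama-bench's default tests are pp512 (prompt
# processing, 512 tokens) and tg128 (text generation, 128 tokens).
# -ngl 99 offloads all layers to the GPU (the Metal backend on Apple Silicon).
./llama-bench -m models/qwen3.5-9b-q4_k_m.gguf -p 512 -n 128 -ngl 99
```

llama-bench prints a table with one row per test, reporting throughput in tokens per second, which is where figures like "1727 t/s" come from.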
// ANALYSIS
The M5 Pro is a sleeper pick for local LLM inference — the tensor API support changes the math significantly compared to prior Apple Silicon generations.
- Prompt processing (pp512) on the M5 Pro is dramatically faster: 1727 t/s vs 1224 t/s on the M2 Max for GPT-OSS 20B MXFP4 (~41% improvement), and 808 t/s vs 554 t/s for Qwen3.5-9B (~46% improvement)
- Text generation (tg128) is roughly comparable for dense models (30-31 t/s on Qwen3.5-9B), but the M5 Pro pulls ahead on MoE: 54 t/s vs 42 t/s for Qwen3.5-35B-A3B
- The M5 Pro achieves this with 18 GPU cores vs the M2 Max's 38; the tensor API's hardware acceleration more than compensates for the core-count disadvantage
- The M5 Pro has 8GB less RAM than the tested M2 Max (24GB vs 32GB), which limits model selection; an M5 Max with 48-64GB would be a significantly different story
- M1 Pro users on 16GB are particularly RAM-constrained; the jump to 24GB on even the base M5 Pro opens up MoE models that couldn't fit before
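The percentage gains above follow directly from the reported throughputs; a quick sketch to verify the arithmetic (numbers copied from the bullets, t/s):

```python
# Sanity-check the reported relative gains from the llama-bench figures.

def rel_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, as a percentage."""
    return (new / old - 1.0) * 100.0

# Prompt processing (pp512), M5 Pro vs M2 Max
print(f"GPT-OSS 20B MXFP4: {rel_gain(1727, 1224):.0f}%")  # ~41%
print(f"Qwen3.5-9B:        {rel_gain(808, 554):.0f}%")    # ~46%

# Text generation (tg128) on the MoE model, M5 Pro vs M2 Max
print(f"Qwen3.5-35B-A3B:   {rel_gain(54, 42):.0f}%")      # ~29%
```

The MoE text-generation gap (~29%) is notable precisely because dense-model tg128 is a wash; it suggests the tensor API helps most where compute, not memory bandwidth, is the bottleneck.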
// TAGS
apple-m5-pro · llm · inference · benchmark · edge-ai · open-source
DISCOVERED
2026-03-14 (29d ago)
PUBLISHED
2026-03-12 (31d ago)
RELEVANCE
7/10
AUTHOR
Fit-Later-389