M5 Pro tops M2 Max on LLM speed

// 76d ago · BENCHMARK RESULT

A developer ran llama-bench on an Apple M5 Pro (18-core, 24GB) against an M2 Max (32GB) and an M1 Pro (16GB) across three models. With the new Metal tensor API, the M5 Pro processes prompts 40%+ faster than the M2 Max, roughly matches it in text generation on dense models, and pulls ahead on the MoE model Qwen3.5-35B-A3B.

// ANALYSIS

The M5 Pro is a sleeper pick for local LLM inference: tensor API support lets it beat the M2 Max on prompt processing with fewer than half the GPU cores, something prior Apple Silicon generations could not do.

  • Prompt processing (pp512) on the M5 Pro is dramatically faster: 1727 t/s vs 1224 on M2 Max for GPT-OSS 20B MXFP4 (~41% improvement), and 808 vs 554 t/s for Qwen3.5-9B (~46% improvement)
  • Text generation (tg128) is roughly comparable for dense models (30-31 t/s on Qwen 9B), but the M5 Pro pulls ahead on MoE: 54 vs 42 t/s for Qwen3.5-35B-A3B (~29% improvement)
  • The M5 Pro achieves this with 18 GPU cores vs the M2 Max's 38, so the tensor API hardware acceleration more than compensates for the core count disadvantage
  • The tested M5 Pro has 8GB less RAM than the tested M2 Max (24GB vs 32GB), which limits model selection; an M5 Max with 48-64GB would be a significantly different story
  • M1 Pro users on 16GB are particularly RAM-constrained; the jump to 24GB on even the base M5 Pro opens up MoE models that couldn't fit before
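
The percentages above can be rechecked with a few lines of Python. This is a minimal sketch that uses only the throughput and GPU core figures quoted in the bullets; the helper names (`speedup_pct`, `per_core`, `summarize`) are hypothetical, and per-core throughput is just an illustrative ratio, not something llama-bench reports.

```python
# Throughput in tokens/s, copied from the bullets above.
RESULTS = {
    ("GPT-OSS 20B MXFP4", "pp512"): {"M5 Pro": 1727, "M2 Max": 1224},
    ("Qwen3.5-9B", "pp512"): {"M5 Pro": 808, "M2 Max": 554},
    ("Qwen3.5-35B-A3B", "tg128"): {"M5 Pro": 54, "M2 Max": 42},
}

# GPU core counts as stated in the analysis.
GPU_CORES = {"M5 Pro": 18, "M2 Max": 38}


def speedup_pct(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def per_core(tokens_per_s: float, chip: str) -> float:
    """Throughput divided by GPU core count (illustrative only)."""
    return tokens_per_s / GPU_CORES[chip]


def summarize() -> list[dict]:
    """Build one summary row per (model, test) pair."""
    rows = []
    for (model, test), tps in RESULTS.items():
        rows.append({
            "model": model,
            "test": test,
            "speedup_pct": speedup_pct(tps["M5 Pro"], tps["M2 Max"]),
            "m5_per_core": per_core(tps["M5 Pro"], "M5 Pro"),
            "m2_per_core": per_core(tps["M2 Max"], "M2 Max"),
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['model']:<20} {row['test']}: "
            f"{row['speedup_pct']:+.1f}% "
            f"({row['m5_per_core']:.1f} vs {row['m2_per_core']:.1f} t/s per GPU core)"
        )
```

Dividing by core count makes the third bullet concrete: on GPT-OSS 20B prompt processing, the M5 Pro delivers roughly three times the M2 Max's throughput per GPU core.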
// TAGS
apple-m5-pro, llm, inference, benchmark, edge-ai, open-source

DISCOVERED

76d ago

2026-03-14

PUBLISHED

78d ago

2026-03-12

RELEVANCE

7/10

AUTHOR

Fit-Later-389