OPEN_SOURCE
REDDIT // BENCHMARK RESULT
M5 Pro tops M2 Max on LLM speed
A developer ran llama-bench on an Apple M5 Pro (18-core GPU, 24GB) against an M2 Max (32GB) and an M1 Pro (16GB) across three models. The M5 Pro's new Metal tensor API delivers 40%+ faster prompt processing than the M2 Max while matching it in text generation throughput.
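The post doesn't include the exact invocation; a typical llama-bench run that produces the pp512 and tg128 numbers cited below looks like this (the model path is a placeholder, not from the post):

```shell
# Hypothetical invocation: llama-bench's default tests are pp512 (prompt
# processing, 512 tokens) and tg128 (text generation, 128 tokens).
# -ngl 99 offloads all layers to the GPU (the Metal backend on Apple Silicon).
./llama-bench -m models/qwen3.5-9b-q4_k_m.gguf -p 512 -n 128 -ngl 99
```

llama-bench prints a table with one row per test, reporting throughput in tokens per second, which is where figures like "1727 t/s" come from.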
// ANALYSIS
The M5 Pro is a sleeper pick for local LLM inference — the tensor API support changes the math significantly compared to prior Apple Silicon generations.
- Prompt processing (pp512) on the M5 Pro is dramatically faster: 1727 t/s vs 1224 t/s on the M2 Max for GPT-OSS 20B MXFP4 (~41% improvement), and 808 t/s vs 554 t/s for Qwen3.5-9B (~46% improvement)
- Text generation (tg128) is roughly comparable for dense models (30-31 t/s on Qwen3.5-9B), but the M5 Pro pulls ahead on MoE: 54 t/s vs 42 t/s for Qwen3.5-35B-A3B
- The M5 Pro achieves this with 18 GPU cores vs the M2 Max's 38; the tensor API's hardware acceleration more than compensates for the core-count disadvantage
- The M5 Pro has 8GB less RAM than the tested M2 Max (24GB vs 32GB), which limits model selection; an M5 Max with 48-64GB would be a significantly different story
- M1 Pro users on 16GB are particularly RAM-constrained; the jump to 24GB on even the base M5 Pro opens up MoE models that couldn't fit before
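The percentage gains above follow directly from the reported throughputs; a quick sketch to verify the arithmetic (numbers copied from the bullets, t/s):

```python
# Sanity-check the reported relative gains from the llama-bench figures.

def rel_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, as a percentage."""
    return (new / old - 1.0) * 100.0

# Prompt processing (pp512), M5 Pro vs M2 Max
print(f"GPT-OSS 20B MXFP4: {rel_gain(1727, 1224):.0f}%")  # ~41%
print(f"Qwen3.5-9B:        {rel_gain(808, 554):.0f}%")    # ~46%

# Text generation (tg128) on the MoE model, M5 Pro vs M2 Max
print(f"Qwen3.5-35B-A3B:   {rel_gain(54, 42):.0f}%")      # ~29%
```

The MoE text-generation gap (~29%) is notable precisely because dense-model tg128 is a wash; it suggests the tensor API helps most where compute, not memory bandwidth, is the bottleneck.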
// TAGS
apple-m5-pro · llm · inference · benchmark · edge-ai · open-source
DISCOVERED
2026-03-14 (29d ago)
PUBLISHED
2026-03-12 (31d ago)
RELEVANCE
7/10
AUTHOR
Fit-Later-389