Inference Speed Tests show M5 Max gains
A community benchmark repo compares identical 16-inch, 128GB M4 Max and M5 Max MacBook Pros on MLX inference workloads. The M5 Max delivers modest decode gains on short prompts, but much larger prompt-processing wins on a 21K-token context.
The headline here is prefill, not raw token generation. The M5 Max looks like a real upgrade for local LLM workflows that spend most of their time on long prompts, summarization, and agentic context loading.
- Short-prompt generation improves by roughly 14% to 17%, which is solid but not transformative
- Long-context prompt processing jumps by 2x to 3x, suggesting the M5 Max’s memory and accelerator changes matter most where Macs usually bottleneck
- That makes the upgrade especially relevant for local RAG, document summarization, and large-context coding assistants
- The repo is useful because it publishes TTFT, peak memory, and per-run breakdowns, but the results are still community benchmarks, not a controlled lab standard
- For developers choosing between M4 Max and M5 Max, the M5 looks like a bandwidth and prefill win first, not a universal speed multiplier
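To see why a prefill gain matters more as prompts get longer, here is a minimal sketch that models end-to-end latency as prefill time (roughly TTFT) plus decode time. The baseline throughput figures are hypothetical placeholders, not numbers from the benchmark repo; only the 21K-token context and the rough gain ranges (about 2x prefill, about 15% decode) come from the summary above. The function names are illustrative as well.

```python
def end_to_end_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Approximate latency as prefill time (about TTFT) plus decode time."""
    ttft = prompt_tokens / prefill_tps      # time to process the prompt
    decode = output_tokens / decode_tps     # time to generate the reply
    return ttft + decode


def overall_speedup(prompt_tokens, output_tokens,
                    base_prefill_tps, base_decode_tps,
                    prefill_gain, decode_gain):
    """Ratio of baseline latency to latency after applying both gains."""
    base = end_to_end_seconds(prompt_tokens, output_tokens,
                              base_prefill_tps, base_decode_tps)
    new = end_to_end_seconds(prompt_tokens, output_tokens,
                             base_prefill_tps * prefill_gain,
                             base_decode_tps * decode_gain)
    return base / new


# HYPOTHETICAL baseline rates, chosen only to illustrate the shape of the math.
BASE_PREFILL_TPS = 500.0   # prompt tokens per second (assumed)
BASE_DECODE_TPS = 50.0     # generated tokens per second (assumed)

# Gains taken from the low end of the ranges quoted in the summary.
PREFILL_GAIN = 2.0
DECODE_GAIN = 1.15

short_prompt = overall_speedup(100, 500, BASE_PREFILL_TPS, BASE_DECODE_TPS,
                               PREFILL_GAIN, DECODE_GAIN)
long_prompt = overall_speedup(21_000, 500, BASE_PREFILL_TPS, BASE_DECODE_TPS,
                              PREFILL_GAIN, DECODE_GAIN)

print(f"short prompt overall speedup: {short_prompt:.2f}x")
print(f"21K-token prompt overall speedup: {long_prompt:.2f}x")
```

Under these assumed rates, the short-prompt case is dominated by decode and lands close to the decode gain alone, while the 21K-token case is dominated by prefill and moves much closer to the prefill gain. The exact ratios depend entirely on the placeholder baselines, so treat them as a shape, not a prediction.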
DISCOVERED
2026-04-01
PUBLISHED
2026-04-01
AUTHOR
purealgo