OPEN_SOURCE
REDDIT · 11d ago · BENCHMARK RESULT
Inference Speed Tests show M5 Max gains
A community benchmark repo compares identical 16-inch, 128GB M4 Max and M5 Max MacBook Pros on MLX inference workloads. The M5 Max delivers modest decode gains on short prompts, but much larger prompt-processing wins on a 21K-token context.
// ANALYSIS
The headline here is not raw token generation; it's prefill. The M5 Max looks like a real upgrade for local LLM workflows that spend time on long prompts, summarization, and agentic context loading.
- Short-prompt generation improves by roughly 14% to 17%, which is solid but not transformative
- Long-context prompt processing jumps by 2x to 3x, suggesting the M5 Max's memory and accelerator changes matter most where Macs usually bottleneck
- That makes the upgrade especially relevant for local RAG, document summarization, and large-context coding assistants
- The repo is useful because it publishes TTFT, peak memory, and per-run breakdowns, but the results are still community benchmarks, not a controlled lab standard
- For developers choosing between M4 Max and M5 Max, the M5 looks like a bandwidth and prefill win first, not a universal speed multiplier
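The prefill-versus-decode split above can be made concrete with a small metrics sketch. This is not the benchmark repo's code; it is a hedged illustration of how TTFT and tokens-per-second separate the two phases, and every number in it is a made-up placeholder, not a measured result.

```python
# Sketch: comparing prefill vs. decode throughput between two runs, using
# the same kinds of metrics the repo reports (TTFT, tokens/sec).
# All timings below are illustrative placeholders, not measured results.
from dataclasses import dataclass

@dataclass
class Run:
    prompt_tokens: int   # tokens processed during prefill
    prefill_s: float     # time to first token (TTFT)
    decode_tokens: int   # tokens generated after the first
    decode_s: float      # wall time spent decoding

    @property
    def prefill_tps(self) -> float:
        return self.prompt_tokens / self.prefill_s

    @property
    def decode_tps(self) -> float:
        return self.decode_tokens / self.decode_s

def speedup(new: Run, old: Run) -> tuple[float, float]:
    """Return (prefill speedup, decode speedup) of `new` over `old`."""
    return (new.prefill_tps / old.prefill_tps,
            new.decode_tps / old.decode_tps)

# Hypothetical 21K-token long-context run on each machine.
m4 = Run(prompt_tokens=21_000, prefill_s=60.0, decode_tokens=500, decode_s=20.0)
m5 = Run(prompt_tokens=21_000, prefill_s=24.0, decode_tokens=500, decode_s=17.5)

prefill_x, decode_x = speedup(m5, m4)
print(f"prefill: {prefill_x:.1f}x, decode: {decode_x:.2f}x")
# → prefill: 2.5x, decode: 1.14x
```

With numbers shaped like the reported results, a workload dominated by a 21K-token prompt sees most of its end-to-end gain from the prefill side, which is why the card frames the M5 Max as a prefill win rather than a universal speed multiplier.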
// TAGS
llm · inference · benchmark · open-source · inference-speed-tests
DISCOVERED
2026-04-01
PUBLISHED
2026-04-01
RELEVANCE
8/10
AUTHOR
purealgo