Inference Speed Tests show M5 Max gains

// BENCHMARK RESULT

A community benchmark repo compares identically configured 16-inch, 128GB M4 Max and M5 Max MacBook Pros on MLX inference workloads. The M5 Max delivers modest decode (token-generation) gains on short prompts but much larger prompt-processing (prefill) gains on a 21K-token context.

// ANALYSIS

The headline here is not raw token generation; it's prefill. The M5 Max looks like a real upgrade for local LLM workflows that spend time on long prompts, summarization, and agentic context loading.
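Why prefill gains matter more than decode gains for long prompts can be sketched with a simple two-phase latency model. This model is an assumption here, not something the repo states: a request's wall time is prompt processing plus token generation, and the overall speedup depends on what share of the baseline time each phase takes (the same arithmetic as Amdahl's law). The function name `end_to_end_speedup` and the prefill fractions are hypothetical; the 2.5x and 1.15x factors are only midpoints of the 2x to 3x and 14% to 17% ranges quoted in this summary.

```python
# Minimal sketch of an ASSUMED two-phase latency model for one LLM request:
#   baseline_time = prefill_time + decode_time
# If prefill speeds up by `prefill_speedup` and decode by `decode_speedup`,
# the end-to-end speedup depends on how much of the baseline time was prefill.


def end_to_end_speedup(prefill_fraction: float,
                       prefill_speedup: float,
                       decode_speedup: float) -> float:
    """Amdahl-style overall speedup for a single request (hypothetical helper).

    prefill_fraction: share of baseline wall time spent in prompt processing (0..1).
    """
    if not 0.0 <= prefill_fraction <= 1.0:
        raise ValueError("prefill_fraction must be between 0 and 1")
    # Normalized new time: each phase's share of the baseline, divided by its speedup.
    new_time = (prefill_fraction / prefill_speedup
                + (1.0 - prefill_fraction) / decode_speedup)
    return 1.0 / new_time


if __name__ == "__main__":
    # Midpoints of the summary's ranges (2x to 3x prefill, ~14% to 17% decode).
    PREFILL_X, DECODE_X = 2.5, 1.15
    # Hypothetical prefill fractions chosen only to contrast workload shapes.
    for label, frac in [("short prompt, long answer", 0.1),
                        ("long-context summarization", 0.8)]:
        print(f"{label}: {end_to_end_speedup(frac, PREFILL_X, DECODE_X):.2f}x")
```

With these assumed inputs, the prefill-heavy case comes out near 2x overall, while the short-prompt case lands around 1.2x, close to the decode gain alone. Estimating the prefill fraction for a real workload would require measured phase timings, such as the TTFT figures the repo reports.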

–Short-prompt generation improves by roughly 14% to 17%, which is solid but not transformative
–Long-context prompt processing jumps by 2x to 3x, suggesting the M5 Max’s memory and accelerator changes matter most where Macs are usually bottlenecked
–That makes the upgrade especially relevant for local RAG, document summarization, and large-context coding assistants
–The repo is useful because it publishes TTFT (time to first token), peak memory, and per-run breakdowns, but its results are still community benchmarks, not a controlled lab standard
–For developers choosing between M4 Max and M5 Max, the M5 Max looks like a bandwidth and prefill win first, not a universal speed multiplier
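Since the repo publishes per-run breakdowns, a reader comparing the two machines might reduce each machine's runs to a median and express the ratio both as a multiplier and as a percent gain, which is how figures like "14% to 17%" and "2x to 3x" relate to each other. This is a minimal sketch; `compare_runs` and the sample throughputs are hypothetical placeholders, not values or code from the repo.

```python
from statistics import median


def compare_runs(baseline: list[float], candidate: list[float]) -> tuple[float, float]:
    """Return (speedup multiplier, percent gain) from per-run throughputs in tokens/s.

    Hypothetical helper. Uses the median of each list so a single outlier run
    does not skew the result. Higher throughput is better, so the speedup is
    candidate / baseline.
    """
    if not baseline or not candidate:
        raise ValueError("need at least one run per machine")
    ratio = median(candidate) / median(baseline)
    return ratio, (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder tokens/s values for illustration only; NOT measurements from the repo.
    older_runs = [50.0, 52.0, 48.0]
    newer_runs = [60.0, 61.0, 30.0]  # includes one anomalous slow run
    ratio, pct = compare_runs(older_runs, newer_runs)
    print(f"speedup {ratio:.2f}x ({pct:+.1f}%)")
```

Using the median rather than the mean keeps one anomalous run (for example, a thermally throttled one) from dragging the comparison. For latency metrics such as TTFT, where lower is better, the ratio would need to be inverted.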

// TAGS

llm, inference, benchmark, open-source, inference-speed-tests

DISCOVERED

2026-04-01

PUBLISHED

2026-04-01

RELEVANCE

8/10

AUTHOR

purealgo
