M5 Max LLM tests hit 72.8 tok/s

// BENCHMARK RESULT (67d ago)

This self-reported benchmark from a new Apple M5 Max with 128 GB of unified memory suggests local LLM inference is now fast enough for interactive use, with DeepSeek-R1 8B topping the chart at 72.8 tok/s. The most interesting result is that runtime choice matters almost as much as model size: Qwen 3.5 27B runs nearly twice as fast in MLX as in llama.cpp.

// ANALYSIS

This reads less like a chip brag and more like a preview of how Apple Silicon local AI workflows will actually be built: tiered models, runtime-specific routing, and memory-bandwidth-aware model choice.
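As a rough illustration of tiered, runtime-specific routing, here is a minimal sketch. It is not the benchmark author's setup: the tier names, keyword lists, the `default` runtime placeholder, and the unnamed 72B entry are all hypothetical, and a keyword match stands in for a real semantic classifier.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str    # model to load for this tier
    runtime: str  # inference runtime to serve it with


# Hypothetical tiers. Only the Qwen 3.5 27B / MLX pairing comes from the
# benchmark; "default" and the unnamed 72B entry are placeholders.
ROUTES = {
    "fast": Route("DeepSeek-R1 8B", "default"),
    "balanced": Route("Qwen 3.5 27B", "mlx"),
    "heavy": Route("72B model (unnamed here)", "default"),
}

# Assumed keyword hints; a real semantic router would likely use embeddings.
BALANCED_HINTS = {"refactor", "summarize", "explain"}
HEAVY_HINTS = {"prove", "audit", "architecture"}


def classify(prompt: str) -> str:
    """Pick a tier from whole-word keyword matches and prompt length."""
    words = re.findall(r"[a-z]+", prompt.lower())
    vocab = set(words)
    if vocab & HEAVY_HINTS:
        return "heavy"
    if vocab & BALANCED_HINTS or len(words) > 40:
        return "balanced"
    return "fast"


def route(prompt: str) -> Route:
    """Return the model/runtime pair for a prompt."""
    return ROUTES[classify(prompt)]


if __name__ == "__main__":
    for p in ("What's the capital of France?",
              "Refactor this module to remove global state",
              "Audit this contract for reentrancy bugs"):
        print(f"{p!r} -> {route(p)}")
```

The design point is that the routing table carries the runtime alongside the model, so a model that runs faster under one framework is always served through it.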

– The 614 GB/s unified memory ceiling appears to be driving throughput: tokens per second fall as model size grows, which is what you would expect from a bandwidth-bound workload.
– MLX’s 31.6 tok/s on Qwen 3.5 27B versus llama.cpp’s 16.5 tok/s, a roughly 1.9x gap on the same model, is the headline technical surprise, and it reinforces how much framework optimization still matters on Apple Silicon.
– DeepSeek-R1 8B looks like the practical everyday model here: fast enough for interactive use, but still capable enough to keep the assistant feeling smart.
– The 72B result is slow but viable, which makes a semantic router feel less like a hobby project and more like the right product pattern for local AI.
– Because these are self-run benchmarks, the exact numbers will vary with prompt mix, context length, and software revisions, but the overall ranking is plausible.
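The bandwidth-bound argument above can be made concrete with a back-of-the-envelope ceiling: if every weight must be read from memory once per generated token, decode speed cannot exceed bandwidth divided by model size in bytes. The sketch below uses the 614 GB/s figure quoted above. The 4-bit quantization is an assumption (the write-up does not state it), and the estimate ignores KV-cache traffic and assumes dense models.

```python
def bandwidth_bound_tok_per_s(bandwidth_gb_s: float,
                              params_billion: float,
                              bits_per_weight: float) -> float:
    """Upper bound on decode tokens/s when weights are read once per token."""
    # Billions of parameters times bytes per parameter gives weight size in GB.
    model_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_gb


BANDWIDTH_GB_S = 614.0   # unified memory bandwidth quoted in the write-up
ASSUMED_BITS = 4.0       # assumption: 4-bit quantized weights

if __name__ == "__main__":
    for label, params in (("8B", 8.0), ("27B", 27.0), ("72B", 72.0)):
        ceiling = bandwidth_bound_tok_per_s(BANDWIDTH_GB_S, params, ASSUMED_BITS)
        print(f"{label}: at most {ceiling:.1f} tok/s")
```

Under these assumptions the 8B ceiling is 153.5 tok/s, so the reported 72.8 tok/s sits well below it, as a real workload should. Different quantization levels would shift every ceiling proportionally.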

// TAGS

benchmark, llm, inference, gpu, m5-max

DISCOVERED: 2026-03-21 (67d ago)

PUBLISHED: 2026-03-21 (68d ago)

RELEVANCE: 8/10

AUTHOR: affenhoden
