M5 Max slashes prefill, hits 111W

// 71d ago BENCHMARK RESULT

A Reddit benchmark of Gemma 3 27B MLX in LM Studio shows the M5 Max dramatically improving long-prompt prefill versus M4 Max, cutting time to first token from about 89.8 seconds to 24.4 seconds on a 19,761-token prompt. The tradeoff is much higher peak power, rising from under 70W on M4 Max to under 115W on M5 Max, which raises thermal throttling concerns. Overall generation speed barely moves, so the win is mostly about faster prompt processing rather than a huge end-to-end throughput jump.

// ANALYSIS

Hot take: this is a strong local-LLM upgrade if your pain point is long-context prefill, but it is not a free lunch.

– The headline number is real: prefill is about 3.7x faster on the same 19,761-token workload.
– The broader workflow gain is much smaller because generation speed improves only marginally.
– Peak power is the catch: sustained long runs may reach thermal limits sooner on the M5 Max.
– For people doing local inference in LM Studio, this looks like a “faster start, hotter chip” tradeoff.
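
The arithmetic behind the 3.7x figure can be checked with a minimal Python sketch. The token count and both time-to-first-token values come from the summary above. The helper names are illustrative, the prefill rate assumes time to first token is almost entirely prefill, and the decode times passed to `end_to_end_speedup` are hypothetical because the source does not report them.

```python
# Figures reported in the summary (Gemma 3 27B MLX in LM Studio).
PROMPT_TOKENS = 19_761
TTFT_M4_MAX_S = 89.8
TTFT_M5_MAX_S = 24.4


def speedup(old_s: float, new_s: float) -> float:
    """Return how many times faster new_s is than old_s."""
    return old_s / new_s


def prefill_tokens_per_s(prompt_tokens: int, ttft_s: float) -> float:
    """Approximate prefill rate, ASSUMING TTFT is almost entirely prefill."""
    return prompt_tokens / ttft_s


def end_to_end_speedup(
    ttft_old_s: float,
    ttft_new_s: float,
    decode_old_s: float,
    decode_new_s: float,
) -> float:
    """Speedup of the whole request (prefill plus generation).

    The decode_* arguments are HYPOTHETICAL inputs supplied by the caller;
    the benchmark summary does not report generation times.
    """
    return (ttft_old_s + decode_old_s) / (ttft_new_s + decode_new_s)


if __name__ == "__main__":
    ttft_gain = speedup(TTFT_M4_MAX_S, TTFT_M5_MAX_S)
    print(f"TTFT speedup: {ttft_gain:.2f}x")  # 3.68x
    print(f"M4 Max prefill: {prefill_tokens_per_s(PROMPT_TOKENS, TTFT_M4_MAX_S):.0f} tok/s")
    print(f"M5 Max prefill: {prefill_tokens_per_s(PROMPT_TOKENS, TTFT_M5_MAX_S):.0f} tok/s")

    # Hypothetical: 60 s of generation on both chips (not a measured value).
    assumed_decode_s = 60.0
    overall = end_to_end_speedup(
        TTFT_M4_MAX_S, TTFT_M5_MAX_S, assumed_decode_s, assumed_decode_s
    )
    print(f"End-to-end speedup with assumed decode time: {overall:.2f}x")
```

The longer the generation phase is relative to prefill, the closer the end-to-end speedup falls toward 1x, which is why the workflow gain depends heavily on how long the responses are.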

// TAGS

apple-silicon, m5-max, mlx, lm-studio, gemma-3, local-llm, benchmark, thermal-throttling, power-draw

DISCOVERED

71d ago

2026-03-18

PUBLISHED

71d ago

2026-03-17

RELEVANCE

8/10

AUTHOR

M5_Maxxx
