OPEN_SOURCE
REDDIT // BENCHMARK RESULT
M5 Max slashes prefill, hits 111W
A Reddit benchmark of Gemma 3 27B MLX in LM Studio shows the M5 Max dramatically improving long-prompt prefill versus the M4 Max, cutting time to first token from about 89.8 seconds to 24.4 seconds on a 19,761-token prompt. The tradeoff is much higher peak power, rising from under 70W on the M4 Max to about 111W on the M5 Max, which raises thermal throttling concerns. Overall generation speed barely moves, so the win is mostly faster prompt processing rather than a large end-to-end throughput jump.
// ANALYSIS
Hot take: this is a strong local-LLM upgrade if your pain point is long-context prefill, but it is not a free lunch.
- The headline number is real: prefill is about 3.7x faster on the same 19K-token workload.
- The broader workflow gain is much smaller because generation speed only improves marginally.
- Peak power is the catch here; sustained long runs may hit thermals sooner on the M5 Max.
- For people doing local inference in LM Studio, this looks like a “faster start, hotter chip” tradeoff.
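The speedup claims above follow directly from the reported figures. A minimal sanity-check sketch (the `prefill_tps` helper is illustrative, not from the benchmark; it treats time to first token as pure prompt processing, ignoring any fixed startup overhead):

```python
# Back-of-envelope check of the reported numbers. Token count and
# timings are as reported in the benchmark summary above.

PROMPT_TOKENS = 19_761

def prefill_tps(prompt_tokens: int, ttft_s: float) -> float:
    """Approximate prefill throughput, assuming TTFT is all prompt processing."""
    return prompt_tokens / ttft_s

m4_ttft, m5_ttft = 89.8, 24.4  # seconds to first token, as reported

m4_tps = prefill_tps(PROMPT_TOKENS, m4_ttft)  # roughly 220 tok/s
m5_tps = prefill_tps(PROMPT_TOKENS, m5_ttft)  # roughly 810 tok/s
speedup = m4_ttft / m5_ttft                   # roughly 3.7x

print(f"M4 Max prefill: {m4_tps:.0f} tok/s")
print(f"M5 Max prefill: {m5_tps:.0f} tok/s")
print(f"Prefill speedup: {speedup:.2f}x")
```

Running the arithmetic also shows why the end-to-end gain is modest: on a long generation, decode time (unchanged between chips) dominates, so the fixed ~65-second prefill saving shrinks as a fraction of total wall time.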
// TAGS
apple-silicon · m5-max · mlx · lm-studio · gemma-3 · local-llm · benchmark · thermal-throttling · power-draw
DISCOVERED
2026-03-18
PUBLISHED
2026-03-17
RELEVANCE
8/10
AUTHOR
M5_Maxxx