OPEN_SOURCE ↗
REDDIT // 29d ago // BENCHMARK RESULT
M5 Max MacBook Pro nears RTX laptop LLMs
Reddit’s LocalLLaMA community is dissecting Hardware Canucks’ first M5 Max laptop tests, which put Apple’s new MacBook Pro roughly alongside RTX 5080 and 5090 laptops on a small LM Studio DeepSeek R1 14B run while using far less power. The bigger caveat is that the benchmark set is still thin and skips the prompt-processing and larger-model numbers that matter most for serious local inference.
// ANALYSIS
The M5 Max looks like a real contender for on-device LLM work, but these numbers are still more teaser than verdict. Apple's real advantage is not winning tiny-model token races; it is keeping bigger models usable once mobile GPU VRAM runs out.
- The clearest early datapoint is about 59 tok/s on DeepSeek R1 14B in LM Studio, close to laptop RTX 5080 and 5090 results in the same roundup.
- Reddit commenters focused on the M5 Max's 614GB/s unified memory bandwidth, arguing that Apple closes the gap when models no longer fit cleanly inside 24GB to 32GB mobile GPU memory.
- The missing data is prompt processing, long context, and larger 30B+ or MoE workloads, which are usually more revealing than a single small-model decode test.
- For AI developers, the interesting angle is efficiency and memory headroom: a quieter, battery-friendly laptop that can run larger local models may matter more than topping raw tokens-per-second charts.
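The bandwidth argument above can be made concrete with a standard back-of-envelope estimate: at batch size 1, decode is roughly memory-bandwidth bound, so the tokens-per-second ceiling is bandwidth divided by the bytes of weights read per token. A minimal sketch, assuming 4-bit quantization for the 14B model (the benchmark's actual quantization is not stated):

```python
# Back-of-envelope decode ceiling for a memory-bandwidth-bound LLM.
# Assumptions (not from the benchmark): 4-bit weights, batch size 1,
# and every weight byte read once per generated token.

def decode_ceiling_tok_s(params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/sec when decode is limited by memory bandwidth."""
    model_gb = params_b * bits_per_weight / 8  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# M5 Max: 614 GB/s unified memory bandwidth (figure from the thread).
ceiling = decode_ceiling_tok_s(params_b=14, bits_per_weight=4, bandwidth_gb_s=614)
print(f"{ceiling:.0f} tok/s ceiling")  # ~88 tok/s; the reported ~59 tok/s sits below it
```

The same arithmetic explains the commenters' point: a 70B-class model at 4-bit needs roughly 35GB of weights, which overflows 24GB to 32GB of mobile GPU VRAM but fits comfortably in unified memory, where bandwidth rather than capacity becomes the limit.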
// TAGS
macbook-pro · llm · benchmark · inference · gpu
DISCOVERED
29d ago
2026-03-14
PUBLISHED
33d ago
2026-03-09
RELEVANCE
7 / 10
AUTHOR
themixtergames