OPEN_SOURCE ↗
REDDIT // 4h ago · INFRASTRUCTURE
M2 Max MacBook Trades Speed for RAM
This is a memory-first local LLM machine, not a speed demon: the 96GB M2 Max can fit models and contexts that 48GB dual-3090 setups may struggle with. If your workload fits comfortably on 2 x RTX 3090, the PC should be noticeably faster for both prompt processing and token generation.
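A rough memory-fit estimate makes the capacity argument concrete. This sketch assumes a ~4.5-bit quantization (roughly Q4_K_M-class) and an illustrative 16 GB KV cache for a long context; the exact figures are assumptions, not benchmarks.

```python
def fits(params_b: float, bits_per_param: float, kv_cache_gb: float, mem_gb: float) -> bool:
    """True if quantized weights plus KV cache fit in the given memory pool."""
    weights_gb = params_b * bits_per_param / 8  # billions of params -> GB
    return weights_gb + kv_cache_gb <= mem_gb

# A 70B model at ~4.5 bits/param is ~39.4 GB of weights. With a 16 GB
# KV cache (illustrative long-context figure), it fits in 96 GB unified
# memory but not in 48 GB of VRAM (split overhead across two cards ignored):
print(fits(70, 4.5, 16.0, 96.0))  # True  -> M2 Max 96 GB
print(fits(70, 4.5, 16.0, 48.0))  # False -> 2 x RTX 3090 (48 GB)
```

The same model with a short context squeezes onto the dual-3090 box, which is exactly the "fits comfortably" caveat above.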
// ANALYSIS
The short version: buy the Mac for capacity and portability, buy the 2 x 3090 rig for raw inference speed. For LocalLLaMA-style use, that trade is usually lopsided unless you specifically need a laptop.
- Apple’s M2 Max tops out at 96GB unified memory and 400GB/s bandwidth, which is enough to load bigger models, longer contexts, and some MoE setups that won’t fit cleanly on dual 24GB cards.
- Community benchmarks on this chip class show decent throughput, but not “desktop GPU killer” throughput; speed swings heavily with quantization, context length, and runtime.
- A dual 3090 box still wins where it matters for daily chat latency: when the model fits in VRAM, prompt processing and decode speed are substantially better.
- The real advantage of the Mac is not speed, it’s getting a huge memory pool in a portable machine with low hassle.
- At this price, I would only choose it over 2 x 3090 if portability, battery, and unified-memory headroom matter more than tokens per second.
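The decode-speed gap follows from memory bandwidth: token generation is roughly bandwidth-bound, so a crude ceiling is bandwidth divided by bytes read per token. The figures below are spec-sheet numbers (400GB/s M2 Max, ~936GB/s per RTX 3090); real-world throughput lands well under these ceilings.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Crude bandwidth-bound ceiling: each generated token streams all weights once."""
    return bandwidth_gb_s / model_gb

model_gb = 39.4  # ~70B model at ~4.5 bits/param

# M2 Max unified memory vs. a single RTX 3090's GDDR6X:
print(round(decode_ceiling_tok_s(400.0, model_gb), 1))  # ~10.2 tok/s ceiling
print(round(decode_ceiling_tok_s(936.0, model_gb), 1))  # ~23.8 tok/s ceiling
```

This is why the trade is lopsided for chat workloads that fit in 48GB: the 3090s have more than twice the bandwidth per pass over the weights, on top of much faster prompt processing.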
// TAGS
macbook-pro · llm · inference · gpu · benchmark
DISCOVERED
4h ago
2026-04-27
PUBLISHED
7h ago
2026-04-27
RELEVANCE
8/10
AUTHOR
GravyPoo