OPEN_SOURCE
YT · YOUTUBE // 29d ago // BENCHMARK RESULT
MacBook Neo LLM benchmarks stress-test MLX on A18 Pro
Apple's new $599 MacBook Neo — the first Mac with an A18 Pro chip — gets put through local LLM benchmark testing using the MLX inference framework, with speculative decoding experiments to push throughput limits. The test probes how the A18 Pro's 60 GB/s memory bandwidth holds up against pricier M-series Macs for on-device AI workloads.
// ANALYSIS
The MacBook Neo is Apple's boldest bet that cheap hardware plus a strong Neural Engine can democratize local AI — but the memory bandwidth numbers tell a more complicated story.
- A18 Pro's 60 GB/s memory bandwidth is actually *lower* than the M1 Mac Mini's (~68 GB/s) and half the M4 MacBook Air's (120 GB/s), making token generation the key bottleneck for any model above roughly 4B parameters
- Speculative decoding is MLX's workaround: pairing a small draft model with a large target model can yield 20–50% throughput gains, partially compensating for the bandwidth constraint
- MLX v0.31.1 ships a NumPy-compatible unified-memory stack with zero-copy CPU/GPU transfers, so the A18 Pro's 16-core Neural Engine stays in the loop without extra memory overhead
- The Neo's $599 price point is the real story: if MLX plus speculative decoding can push 7B models to usable speeds, this is the first genuinely affordable local-LLM machine from Apple
- Apple has published its own M5 MLX benchmarks (3.3–4.1x prefill speedup over M4), suggesting the Neural Engine path is the right bet, but the Neo needs that research to trickle down
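The bandwidth bullet above turns into simple napkin math: during autoregressive decoding, every generated token streams the full weight set from memory once, so bandwidth divided by (quantized) model size is a hard ceiling on tokens per second. The figures below are illustrative assumptions (4-bit quantization at ~0.5 bytes/param), not measured results:

```python
# Napkin math: memory bandwidth bounds decode throughput, because each
# new token requires reading every model weight from memory once.
def max_tokens_per_sec(params_b: float, bandwidth_gb_s: float,
                       bytes_per_param: float = 0.5) -> float:
    """Theoretical decode ceiling, assuming 4-bit weights (0.5 B/param)."""
    weights_gb = params_b * bytes_per_param   # GB read per generated token
    return bandwidth_gb_s / weights_gb

# A 7B model at 4-bit is ~3.5 GB of weights:
print(round(max_tokens_per_sec(7, 60), 1))    # A18 Pro, 60 GB/s  -> 17.1
print(round(max_tokens_per_sec(7, 120), 1))   # M4 Air, 120 GB/s  -> 34.3
```

Real throughput lands below these ceilings (KV-cache reads, kernel overhead), which is why a ~17 tok/s bound on a 7B model makes speculative decoding so attractive on the A18 Pro.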
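The speculative-decoding round described above can be sketched as a toy verification loop (this shows the general algorithm, not MLX's actual implementation): the draft model cheaply proposes k tokens, the target model checks all k positions in a single batched forward pass, and the longest agreeing prefix plus the target's correction is kept, so every round emits at least one correct token:

```python
# Toy sketch of one speculative-decoding round (illustrative only).
# `proposed` are the small draft model's k guesses; `verified` are the
# large target model's tokens at the same positions, obtained in one
# batched forward pass instead of k sequential ones.
def speculative_step(proposed: list[int], verified: list[int]) -> list[int]:
    out = []
    for draft_tok, target_tok in zip(proposed, verified):
        out.append(target_tok)        # always keep the target's token
        if draft_tok != target_tok:   # first disagreement ends the round
            break
    return out
```

If the draft agrees on 3 of 4 guesses, one expensive target pass yields 3 tokens instead of 1; since the target pass is what the 60 GB/s bandwidth throttles, that is where the quoted 20–50% throughput gain comes from.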
// TAGS
mlx · inference · benchmark · edge-ai · open-source · llm
DISCOVERED
2026-03-14 (29d ago)
PUBLISHED
2026-03-14 (29d ago)
RELEVANCE
7/10
AUTHOR
Bijan Bowen