MacBook Neo LLM benchmarks stress-test MLX on A18 Pro
OPEN_SOURCE
YT · YOUTUBE // 29d ago // BENCHMARK RESULT

Apple's new $599 MacBook Neo, the first Mac built around an A18 Pro chip, is put through local LLM benchmarks using the MLX inference framework, with speculative decoding experiments to push throughput limits. The test probes how the A18 Pro's 60 GB/s memory bandwidth holds up against pricier M-series Macs for on-device AI workloads.
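Why bandwidth is the number to watch: memory-bound token generation streams every weight once per token, so the hardware ceiling is roughly bandwidth divided by model size. A back-of-envelope sketch, using the article's 60 GB/s figure plus an assumed 4-bit quantized 7B model (the quantization level is an assumption, not stated in the source):

```python
# Roofline estimate for bandwidth-bound LLM decoding:
# tokens/sec ceiling ≈ memory_bandwidth / model_weight_bytes,
# since each generated token must stream all weights once.

BANDWIDTH_GBS = 60       # A18 Pro memory bandwidth (from the article)
PARAMS_B = 7             # model size, billions of parameters
BITS_PER_WEIGHT = 4      # assumed 4-bit quantization (common in MLX)

model_gb = PARAMS_B * BITS_PER_WEIGHT / 8       # 3.5 GB of weights
upper_bound_tps = BANDWIDTH_GBS / model_gb      # ~17 tok/s ceiling

print(f"{model_gb:.1f} GB weights -> ~{upper_bound_tps:.0f} tok/s upper bound")
```

Real throughput lands below this ceiling (KV-cache reads, activation traffic, scheduling overhead all eat into it), which is why a ~17 tok/s theoretical cap on a 7B model makes the 60 GB/s figure the Neo's defining constraint.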

// ANALYSIS

The MacBook Neo is Apple's boldest bet that cheap hardware plus a strong Neural Engine can democratize local AI — but the memory bandwidth numbers tell a more complicated story.

  • A18 Pro's 60 GB/s bandwidth is actually *lower* than the M1 Mac mini's (~68 GB/s) and well below the M4 MacBook Air's (120 GB/s), making token generation the key bottleneck for any model above 4B parameters
  • Speculative decoding is MLX's workaround: pairing a small draft model with a large target model can yield 20–50% throughput gains, partially compensating for bandwidth constraints
  • MLX v0.31.1 ships a NumPy-compatible API over unified memory with zero-copy CPU/GPU transfers, meaning the A18 Pro's 16-core Neural Engine stays in the loop without extra memory traffic
  • The Neo's $599 price point is the real story: if MLX + speculative decoding can push 7B models to usable speeds, this is the first genuinely affordable local LLM machine from Apple
  • Apple published its own M5 MLX benchmarks (3.3–4.1x prefill speedup over M4), suggesting the Neural Engine path is the right bet — but the Neo needs that research to trickle down
// TAGS
mlx · inference · benchmark · edge-ai · open-source · llm

DISCOVERED

2026-03-14 (29d ago)

PUBLISHED

2026-03-14 (29d ago)

RELEVANCE

7/10

AUTHOR

Bijan Bowen