Exo Stretches Across Two 128GB MacBook Pros
The post asks whether a pair of 128GB MacBook Pros can be treated as one larger local inference box for very large models, with MLX or Exo handling the sharding. The appeal is straightforward: if the combined memory can hold a model too large for either machine alone, the setup could be useful even if it is slow, and the second Mac doubles as a travel-friendly backup and display.
Exo’s README says it can connect devices into an AI cluster, split models across them with ring memory-weighted partitioning, and run models larger than any single device could hold, but it also explicitly calls the software experimental. MLX-LM officially supports distributed inference and fine-tuning via `mx.distributed`, so the underlying MLX stack is capable; that is not the same thing as seamless memory pooling across Macs, though. In practice, a 2×128GB setup should let you load larger models than a single machine can, but sharding overhead, KV-cache growth, and network or Thunderbolt latency will still decide whether it feels usable.
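To make the `mx.distributed` claim concrete, here is a minimal sketch of bringing up the distributed layer across the two Macs and checking that they can exchange tensors. It assumes MLX's distributed backend is configured and the script is started on both hosts (for example with MLX's `mlx.launch` helper and a host list); it is a connectivity check, not actual model sharding.

```python
import mlx.core as mx

# Initialize the distributed group. The launcher assigns each host a rank,
# so with two Macs we expect size() == 2.
world = mx.distributed.init()
print(f"rank {world.rank()} of {world.size()}")

# Tiny all-reduce as a sanity check: summing a vector of ones across
# two ranks should yield [2, 2, 2, 2] on both machines.
x = mx.ones(4)
y = mx.distributed.all_sum(x)
mx.eval(y)
print(y)
```

On the KV-cache point, back-of-envelope arithmetic shows how quickly long contexts eat into pooled memory. The shape below is a hypothetical 70B-class configuration with grouped-query attention, not numbers taken from the post:

```python
# Hypothetical model shape: 80 layers, 8 KV heads, head_dim 128, fp16 cache.
n_layers, n_kv_heads, head_dim, bytes_per_elem = 80, 8, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
print(f"{per_token / 1024:.0f} KiB per token")  # 320 KiB
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {per_token * ctx / 2**30:5.1f} GiB")
# -> roughly 2.5, 10.0, and 40.0 GiB of cache on top of the weights
```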
Discovered: 2026-03-21 · Published: 2026-03-21 · Author: alcyonex