Exo Stretches Across Two 128GB MacBook Pros
The post asks whether a pair of 128GB MacBook Pros can be treated as one larger local inference box for very large models, with MLX or Exo handling the sharding. The appeal is straightforward: if the combined memory can hold a model too large for either machine alone, the setup could be useful even if it is slow, and the second Mac doubles as a travel-friendly backup and display.
Exo’s README says it can connect devices into an AI cluster, split models across them with ring memory-weighted partitioning, and run models larger than any single device could hold, but it also explicitly calls the software experimental. MLX-LM officially supports distributed inference and fine-tuning via `mx.distributed`, so the underlying MLX stack is capable; that is not the same thing as seamless memory pooling across Macs, though. In practice, a 2×128GB setup should let you load larger models than a single machine can, but sharding overhead, KV-cache growth, and network or Thunderbolt latency will still decide whether it feels usable.
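To make the `mx.distributed` claim concrete, here is a minimal sketch of bringing up the distributed layer across the two Macs and checking that they can exchange tensors. It assumes MLX's distributed backend is configured and the script is started on both hosts (for example with MLX's `mlx.launch` helper and a host list); it is a connectivity check, not actual model sharding.

```python
import mlx.core as mx

# Initialize the distributed group. The launcher assigns each host a rank,
# so with two Macs we expect size() == 2.
world = mx.distributed.init()
print(f"rank {world.rank()} of {world.size()}")

# Tiny all-reduce as a sanity check: summing a vector of ones across
# two ranks should yield [2, 2, 2, 2] on both machines.
x = mx.ones(4)
y = mx.distributed.all_sum(x)
mx.eval(y)
print(y)
```

On the KV-cache point, back-of-envelope arithmetic shows how quickly long contexts eat into pooled memory. The shape below is a hypothetical 70B-class configuration with grouped-query attention, not numbers taken from the post:

```python
# Hypothetical model shape: 80 layers, 8 KV heads, head_dim 128, fp16 cache.
n_layers, n_kv_heads, head_dim, bytes_per_elem = 80, 8, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
print(f"{per_token / 1024:.0f} KiB per token")  # 320 KiB
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {per_token * ctx / 2**30:5.1f} GiB")
# -> roughly 2.5, 10.0, and 40.0 GiB of cache on top of the weights
```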
Discovered: 2026-03-21 · Published: 2026-03-21 · Author: alcyonex