exo runs trillion-parameter model on Macs

// 2h ago INFRASTRUCTURE

A demonstration by developer @noisyb0y1 shows a trillion-parameter Mixture-of-Experts (MoE) model running at 25–28 tokens per second on a local cluster of four Apple Mac Studios. The setup uses the open-source exo framework with Apple's MLX as the backend, and daisy-chains the Macs over Thunderbolt to pool their unified memory.

// ANALYSIS

Local distributed inference is approaching the point where clusters of consumer-grade hardware can compete with cloud-hosted enterprise GPUs for running trillion-parameter sparse models.

– Apple Silicon Advantage: Apple's unified memory architecture is the key enabler for local AI. The GPU addresses the same memory pool as the CPU, which makes large memory pools affordable compared with discrete enterprise GPUs.
– Thunderbolt/RDMA Connectivity: Daisy-chaining Macs over low-latency interconnects reduces the networking bottleneck that usually limits local clusters.
– Sparse MoE Architecture: Trillion-parameter models are viable locally only because they are sparse: each token activates only a fraction of the parameters at runtime.
– Enterprise Disruption: As local frameworks mature, smaller enterprises can run private, large-scale LLMs without recurring cloud GPU costs.
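
The memory-pooling and sparsity points above come down to simple arithmetic, sketched here in Python. The article gives only the parameter count (one trillion) and the number of machines (four). The per-machine memory, the usable fraction, the weight precision, and the active-parameter count are all assumptions chosen for illustration, not details of the demonstrated setup.

```python
# Back-of-envelope sizing sketch. Only TOTAL and NODES come from the article;
# every other number is an illustrative assumption.

def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (ignores KV cache and activations)."""
    return total_params * bits_per_weight / 8 / 1e9


def fits_in_cluster(total_params: float, bits_per_weight: float,
                    nodes: int, mem_per_node_gb: float,
                    usable_fraction: float = 0.75) -> bool:
    """True if the weights fit in the pooled memory left after OS/runtime overhead."""
    pooled_gb = nodes * mem_per_node_gb * usable_fraction
    return weight_footprint_gb(total_params, bits_per_weight) <= pooled_gb


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters touched per token in a sparse MoE model."""
    return active_params / total_params


if __name__ == "__main__":
    TOTAL = 1e12     # one trillion parameters (from the headline)
    NODES = 4        # four Mac Studios (from the article)
    MEM_GB = 512     # ASSUMED memory per machine
    ACTIVE = 32e9    # ASSUMED active parameters per token

    for bits in (16, 8, 4):
        size = weight_footprint_gb(TOTAL, bits)
        ok = fits_in_cluster(TOTAL, bits, NODES, MEM_GB)
        print(f"{bits}-bit weights: {size:.0f} GB, fits in pool: {ok}")
    print(f"active share per token: {active_fraction(ACTIVE, TOTAL):.1%}")
```

Under these assumptions the weights fit only once they are quantized below 16 bits, and each token reads a small share of them, which is why the memory pool and the sparsity matter together.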

// TAGS

local-ai, apple-silicon, mac-studio, distributed-inference, exo, mlx, llm

DISCOVERED

2h ago

2026-06-19

PUBLISHED

2h ago

2026-06-19

RELEVANCE

8/10

AUTHOR

Av1dlive

// KEEP READING

More AI developer news from the feed

MODEL 31m ago

Unsloth releases GLM-5.2 GGUF quantizations

Unsloth has released optimized Dynamic GGUF quantizations for the 744B GLM-5.2 model, shrinking its footprint from 1.51TB to 238GB at 2-bit. This release enables developers to run a frontier-class open-weights model locally on high-end consumer hardware like a 256GB Mac Studio while retaining 82% of the original model's accuracy.
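
The size figures quoted above imply an average storage cost per weight, which the short Python sketch below works out. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and that the quoted sizes cover weights only. The function name is illustrative.

```python
def bits_per_weight(size_bytes: float, params: float) -> float:
    """Average number of bits stored per parameter."""
    return size_bytes * 8 / params


PARAMS = 744e9        # 744B parameters (from the summary)
ORIGINAL = 1.51e12    # 1.51 TB, assuming decimal terabytes
QUANTIZED = 238e9     # 238 GB, assuming decimal gigabytes

print(f"original:  {bits_per_weight(ORIGINAL, PARAMS):.2f} bits/weight")
print(f"quantized: {bits_per_weight(QUANTIZED, PARAMS):.2f} bits/weight")
print(f"size reduction: {ORIGINAL / QUANTIZED:.2f}x")
```

Under these assumptions the original works out to roughly 16 bits per weight and the "2-bit" file to about 2.6. That would be consistent with a mixed-precision scheme that keeps some tensors above 2 bits, though the summary does not say so.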

UPDATE 36m ago

OpenAI Codex lets users teach workflows

James Sun, a product staff member at OpenAI, announced a new capability in the Codex desktop application that lets users teach the agent custom skills and workflows much as they would instruct a human coworker. The feature supports complex workflows involving computer and browser use, and these run significantly faster on the second pass, once the agent has observed the initial demonstration of the task.

UPDATE 1h ago

Basedash adds charting, AI memory, access controls

Basedash has released its latest weekly update featuring deeper chart appearance customizations, smarter natural-language AI chat with improved memory, and more granular role-based access controls. The company also announced an upcoming focus on data traceability features for its AI-native business intelligence platform.