OPEN_SOURCE
REDDIT // 21d ago · OPEN-SOURCE RELEASE
Hypura runs bigger models on Macs
Hypura is a storage-tier-aware LLM inference scheduler for Apple Silicon that spreads tensors across GPU, RAM, and NVMe so models larger than local memory can still run. The open-source project is built on llama.cpp.
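The core idea of storage-tier-aware placement can be sketched as a greedy planner that assigns each tensor to the fastest tier with room left, falling back from GPU to RAM to NVMe. This is a minimal illustration under assumed names and capacities, not Hypura's actual API:

```python
# Hypothetical sketch of tier-aware placement: greedily put each tensor
# on the fastest tier that still has capacity (GPU -> RAM -> NVMe).
# All names, sizes, and the function signature are illustrative.

def plan_placement(tensors, capacities):
    """tensors: list of (name, size); capacities: dict tier -> size."""
    remaining = dict(capacities)
    plan = {}
    # Place the largest tensors first so they get the fastest tiers.
    for name, size in sorted(tensors, key=lambda t: -t[1]):
        for tier in ("gpu", "ram", "nvme"):
            if remaining[tier] >= size:
                plan[name] = tier
                remaining[tier] -= size
                break
        else:
            raise MemoryError(f"{name} does not fit in any tier")
    return plan

plan = plan_placement(
    [("attn.0", 4), ("ffn.0", 10), ("expert.3", 12)],
    {"gpu": 12, "ram": 10, "nvme": 100},
)
# -> {"expert.3": "gpu", "ffn.0": "ram", "attn.0": "nvme"}
```

A real planner would also weigh access frequency (hot attention weights vs. rarely used experts), which is what makes the MoE case below work so well.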
// ANALYSIS
This is smart systems work, not a miracle accelerator. Hypura treats NVMe as a legitimate third memory tier, which is exactly the sort of hack that makes local LLMs feel less boxed in on Apple Silicon.
- MoE models are the sweet spot: only a subset of experts fires per token, so dormant experts can live on NVMe and be fetched on demand.
- Dense models still pay the latency tax once FFN weights start streaming, so the win is feasibility more than raw throughput.
- The benchmark claims are deliberately practical: 2.2 tok/s on a 31 GB Mixtral and 0.3 tok/s on a 70B model are not fast, but they turn a hard OOM into something usable.
- The automatic placement planner, prefetch logic, and Ollama-compatible server hide the ugly parts of buffer sizing and tier assignment from users.
- For models that already fit in memory, the project claims zero overhead, which is exactly the graceful fallback you want from a scheduler.
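The on-demand expert fetching described above amounts to caching hot experts in RAM while cold ones stay on NVMe. A minimal sketch, assuming a hypothetical `load_fn` that reads weights from disk and a made-up capacity (not Hypura's real interfaces):

```python
from collections import OrderedDict

# Illustrative LRU cache for MoE experts: the router asks for an expert,
# and if it is not resident we load it from NVMe, evicting the
# least-recently-used expert when the RAM budget is exceeded.

class ExpertCache:
    def __init__(self, capacity, load_fn):
        self.capacity = capacity   # max experts resident in RAM (assumed)
        self.load_fn = load_fn     # hypothetical NVMe read
        self.cache = OrderedDict()

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            return self.cache[expert_id]
        weights = self.load_fn(expert_id)      # NVMe read happens here
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least-recently-used
        return weights

loads = []
cache = ExpertCache(2, lambda i: loads.append(i) or f"w{i}")
for expert in (0, 1, 0, 2, 1):
    cache.get(expert)
# loads == [0, 1, 2, 1]: expert 1 was evicted and had to be reloaded
```

This is also why dense models fare worse: their FFN weights are needed every token, so the "cache" thrashes and every miss is an NVMe read on the critical path.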
// TAGS
hypura · llm · inference · gpu · open-source · self-hosted · cli · api
DISCOVERED
2026-03-22 (21d ago)
PUBLISHED
2026-03-22 (21d ago)
RELEVANCE
8/10
AUTHOR
tbaumer22