Ollama 0.19 boosts Apple Silicon with MLX
Ollama 0.19 rebuilds its Apple Silicon runtime on MLX, delivering a noticeable speedup for local inference on Macs. The release also adds NVFP4 support and smarter cache reuse, which should make coding agents and branching sessions feel much more responsive.
This is a real infrastructure upgrade, not a cosmetic release. Ollama is making Apple Silicon feel like a first-class local inference platform again, and for agentic workflows the cache work may matter almost as much as the raw benchmark gains.
- MLX plus Apple’s GPU Neural Accelerators should cut both time-to-first-token and steady-state generation latency on newer Macs.
- NVFP4 support narrows the gap between local testing and production-style inference formats, which is useful for teams comparing outputs across environments.
- Cache snapshots, reuse across conversations, and smarter eviction are exactly the kind of changes that improve Claude Code-style branching loops.
- The preview is aimed at bigger machines with 32GB+ unified memory, so the win is strongest for high-end Apple Silicon users.
- Focusing on Qwen3.5-35B-A3B coding workloads signals that Ollama is optimizing for serious local coding agents, not just casual chat.
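
To make the cache point in the list above concrete, here is a minimal sketch of prefix reuse with least-recently-used eviction. It is not Ollama's implementation: the `PrefixCache` class, its method names, and the string standing in for model state are all hypothetical, chosen only to show why two branches that share a long prompt prefix need to recompute only their differing tails.

```python
from collections import OrderedDict


class PrefixCache:
    """Toy LRU cache keyed by token prefixes (hypothetical, not Ollama's code)."""

    def __init__(self, max_entries=4):
        self.max_entries = max_entries
        # Maps a token-prefix tuple to an opaque snapshot of model state.
        self._snapshots = OrderedDict()

    def save(self, tokens, state):
        """Store a snapshot for this prefix and evict the least recently used."""
        key = tuple(tokens)
        self._snapshots[key] = state
        self._snapshots.move_to_end(key)
        while len(self._snapshots) > self.max_entries:
            self._snapshots.popitem(last=False)

    def longest_prefix(self, tokens):
        """Return (matched_length, state) for the longest cached prefix."""
        tokens = tuple(tokens)
        best = None
        for key in self._snapshots:
            if len(key) <= len(tokens) and tokens[:len(key)] == key:
                if best is None or len(key) > len(best):
                    best = key
        if best is None:
            return 0, None
        # A hit counts as a use, so it protects the entry from eviction.
        self._snapshots.move_to_end(best)
        return len(best), self._snapshots[best]


# Two branches of a session share a prefix (for example, system prompt
# plus repository context) and differ only in their final tokens.
cache = PrefixCache(max_entries=2)
shared = [1, 2, 3, 4]
cache.save(shared, "state-after-shared-prefix")

branch_a = shared + [10, 11]
branch_b = shared + [20]
hit_a, _ = cache.longest_prefix(branch_a)
hit_b, _ = cache.longest_prefix(branch_b)

# Only the tokens past the cached prefix would need fresh computation.
recompute_a = len(branch_a) - hit_a
recompute_b = len(branch_b) - hit_b
```

In this toy model both branches reuse the four shared tokens, which is the effect the release is aiming for when an agent repeatedly forks from the same context. A real runtime would store key/value tensors rather than strings and would likely weigh memory size, not just recency, when evicting.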
DISCOVERED
2026-04-01
PUBLISHED
2026-04-01
AUTHOR
[REDACTED]
