OPEN_SOURCE
PH · PRODUCT_HUNT // 10d ago // INFRASTRUCTURE
Ollama 0.19 boosts Apple Silicon with MLX
Ollama 0.19 rebuilds its Apple Silicon runtime on MLX, delivering a noticeable speedup for local inference on Macs. The release also adds NVFP4 support and smarter cache reuse, which should make coding agents and branching sessions feel much more responsive.
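A quick way to sanity-check the claimed speedup is to measure time-to-first-token against a local Ollama server. A minimal sketch, assuming Ollama is running on its default port and that some model is pulled locally (the `qwen3:30b` tag below is a stand-in; swap in whatever you actually run):

```python
# Minimal sketch: measure time-to-first-token (TTFT) against a local
# Ollama server. Assumes Ollama is running on the default port (11434);
# the model tag below is a placeholder, not from the release notes.
import json
import time

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:30b"  # placeholder tag; use a model you have pulled locally

def time_to_first_token(prompt: str) -> float:
    """Return seconds from request start until the first streamed token."""
    start = time.monotonic()
    with requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": True},
        stream=True,
        timeout=120,
    ) as resp:
        resp.raise_for_status()
        # Ollama streams newline-delimited JSON; the first chunk with a
        # non-empty "response" field carries the first generated token.
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("response"):
                return time.monotonic() - start
    raise RuntimeError("stream ended before any token was generated")

if __name__ == "__main__":
    print(f"TTFT: {time_to_first_token('Write a haiku about caches.'):.2f}s")
```

Running the same measurement before and after upgrading to 0.19 on the same machine and model is the simplest apples-to-apples comparison.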
// ANALYSIS
This is a real infrastructure upgrade, not a cosmetic release: Ollama is making Apple Silicon feel like a first-class local inference platform again, and the cache work may matter almost as much as the raw benchmark gains for agentic workflows.
- MLX plus Apple’s GPU Neural Accelerators should cut both time-to-first-token and steady-state generation latency on newer Macs.
- NVFP4 support narrows the gap between local testing and production-style inference formats, which is useful for teams comparing outputs across environments.
- Cache snapshots, reuse across conversations, and smarter eviction are exactly the kind of changes that improve Claude Code-style branching loops (see the sketch after this list).
- The preview is aimed at bigger machines with 32GB+ unified memory, so the win is strongest for high-end Apple Silicon users.
- Focusing on Qwen3.5-35B-A3B coding workloads signals that Ollama is optimizing for serious local coding agents, not just casual chat.
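On the cache point: the reuse happens server-side, so there is nothing for the client to configure, but the access pattern it rewards is easy to show. A minimal sketch of a branching session against Ollama's standard /api/chat endpoint (model tag and prompts are placeholders; how aggressively 0.19 reuses the prefix cache is the release's claim, not something this code controls):

```python
# Minimal sketch of the branching access pattern that prefix-cache reuse
# speeds up: two continuations share one long conversation prefix, so a
# server that snapshots and reuses the KV cache only pays for the prefix
# once. Endpoint and fields are Ollama's standard chat API; the cache
# behavior itself is server-side and invisible to the client.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen3:30b"  # placeholder; use any locally pulled model

shared_prefix = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {"role": "user", "content": "Here is a 2,000-line module..."},  # long context
]

def continue_branch(followup: str) -> str:
    """Send the shared prefix plus one branch-specific follow-up."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": shared_prefix + [{"role": "user", "content": followup}],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

# Two branches off the same prefix: with cache reuse, the second call
# should skip re-processing the long shared context.
plan_a = continue_branch("Refactor this module to use async IO.")
plan_b = continue_branch("Instead, split this module into two files.")
```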
// TAGS
ollama · inference · gpu · agent · ai-coding · open-source
DISCOVERED
2026-04-01 (10d ago)
PUBLISHED
2026-04-01 (11d ago)
RELEVANCE
9/10
AUTHOR
[REDACTED]