Ollama 0.19 boosts Apple Silicon with MLX
OPEN_SOURCE
PH · PRODUCT_HUNT // 10d ago · INFRASTRUCTURE


Ollama 0.19 rebuilds its Apple Silicon runtime on MLX, delivering a noticeable speedup for local inference on Macs. The release also adds NVFP4 support and smarter cache reuse, which should make coding agents and branching sessions feel much more responsive.

// ANALYSIS

This is a real infrastructure upgrade, not a cosmetic release: Ollama is making Apple Silicon feel like a first-class local inference platform again, and the cache work may matter almost as much as the raw benchmark gains for agentic workflows.

  • MLX plus Apple’s GPU Neural Accelerators should cut both time-to-first-token and steady-state generation latency on newer Macs.
  • NVFP4 support narrows the gap between local testing and production-style inference formats, which is useful for teams comparing outputs across environments.
  • Cache snapshots, reuse across conversations, and smarter eviction are exactly the kind of changes that improve Claude Code-style branching loops.
  • The preview is aimed at bigger machines with 32GB+ unified memory, so the win is strongest for high-end Apple Silicon users.
  • Focusing on Qwen3.5-35B-A3B coding workloads signals that Ollama is optimizing for serious local coding agents, not just casual chat.
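Why cache snapshots and reuse matter for branching agent loops can be seen with a small conceptual sketch. This is not Ollama's actual implementation or API, just a hypothetical model of prefix-cache reuse: two branches that share a long conversation prefix only pay prefill cost for their own suffixes once the prefix's state is cached.

```python
# Conceptual sketch (NOT Ollama's real API): prefix-cache reuse across
# branching sessions. Without reuse, every branch re-prefills the full
# history; with it, only the uncached suffix costs compute.

class PrefixCache:
    def __init__(self):
        self.snapshots = {}      # tuple(tokens) -> simulated KV state
        self.tokens_computed = 0 # total prefill work performed

    def prefill(self, tokens):
        """Simulate prefill, reusing the longest cached prefix."""
        best = 0
        for k in range(len(tokens), 0, -1):
            if tuple(tokens[:k]) in self.snapshots:
                best = k
                break
        # Only tokens beyond the cached prefix need computing.
        self.tokens_computed += len(tokens) - best
        self.snapshots[tuple(tokens)] = object()  # snapshot new state

cache = PrefixCache()
history = list(range(1000))          # shared conversation prefix
cache.prefill(history)               # first prefill: 1000 tokens
cache.prefill(history + [1, 2, 3])   # branch A: 3 new tokens
cache.prefill(history + [7, 8])      # branch B: 2 new tokens
print(cache.tokens_computed)         # 1005 rather than 3005
```

A coding agent that forks a session to try two fixes follows exactly this pattern, which is why smarter snapshotting and eviction can matter as much as raw token throughput.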
// TAGS
ollama · inference · gpu · agent · ai-coding · open-source

DISCOVERED

10d ago

2026-04-01

PUBLISHED

11d ago

2026-04-01

RELEVANCE

9 / 10

AUTHOR

[REDACTED]