Ollama 0.19 boosts Apple Silicon with MLX

// 57d ago · INFRASTRUCTURE

Ollama 0.19 rebuilds its Apple Silicon runtime on MLX, delivering a noticeable speedup for local inference on Macs. The release also adds NVFP4 support and smarter cache reuse, which should make coding agents and branching sessions feel much more responsive.
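
For readers who want to check the speedup on their own Mac, here is a minimal sketch that turns the timing fields Ollama returns with a completed generate response (`load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, all durations in nanoseconds) into prefill and decode rates. The helper name `summarize_metrics` and the sample numbers are illustrative assumptions, not figures from the release, and treating load time plus prompt evaluation time as time-to-first-token is only an approximation.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_metrics(resp: dict) -> dict:
    """Derive rough latency and throughput numbers from a final response dict.

    Hypothetical helper: it only reads timing fields and makes no network calls.
    """
    load_s = resp.get("load_duration", 0) / NS_PER_S
    prompt_s = resp.get("prompt_eval_duration", 0) / NS_PER_S
    decode_s = resp.get("eval_duration", 0) / NS_PER_S
    return {
        # Approximation: model load plus prompt processing before the first token.
        "approx_ttft_s": load_s + prompt_s,
        # Prompt tokens processed per second (prefill).
        "prefill_tok_per_s": resp.get("prompt_eval_count", 0) / prompt_s if prompt_s else 0.0,
        # Generated tokens per second (steady-state decode).
        "decode_tok_per_s": resp.get("eval_count", 0) / decode_s if decode_s else 0.0,
    }


# Made-up sample values, for illustration only.
sample = {
    "load_duration": 500_000_000,
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 2_000_000_000,
    "eval_count": 300,
    "eval_duration": 6_000_000_000,
}
summary = summarize_metrics(sample)
```

Running the same prompt against the previous and the new release and comparing the two summaries gives a like-for-like view of prefill versus decode gains.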

// ANALYSIS

This is a real infrastructure upgrade, not a cosmetic release: Ollama is making Apple Silicon feel like a first-class local inference platform again, and the cache work may matter almost as much as the raw benchmark gains for agentic workflows.

  • MLX plus Apple’s GPU Neural Accelerators should cut both time-to-first-token and steady-state generation latency on newer Macs.
  • NVFP4 support narrows the gap between local testing and production-style inference formats, which is useful for teams comparing outputs across environments.
  • Cache snapshots, reuse across conversations, and smarter eviction are exactly the kind of changes that improve Claude Code-style branching loops.
  • The preview is aimed at bigger machines with 32GB+ unified memory, so the win is strongest for high-end Apple Silicon users.
  • Focusing on Qwen3.5-35B-A3B coding workloads signals that Ollama is optimizing for serious local coding agents, not just casual chat.
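
To make the cache point concrete, here is a toy sketch of prefix reuse with least-recently-used eviction. It is not Ollama's implementation: the `PrefixCache` class, its method names, and the string "states" are all hypothetical stand-ins for real KV-cache snapshots. The idea is that branching sessions sharing a long prefix (system prompt, repo context) only need to recompute the tokens after the longest cached prefix.

```python
from collections import OrderedDict


class PrefixCache:
    """Toy snapshot store keyed by token prefix, with LRU eviction (hypothetical)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._snapshots = OrderedDict()  # token-prefix tuple -> opaque state

    def snapshot(self, tokens, state):
        """Store the state reached after processing `tokens`."""
        key = tuple(tokens)
        self._snapshots[key] = state
        self._snapshots.move_to_end(key)  # mark as most recently used
        while len(self._snapshots) > self.capacity:
            self._snapshots.popitem(last=False)  # evict least recently used

    def longest_prefix(self, tokens):
        """Return (reused_token_count, state) for the longest cached prefix."""
        tokens = tuple(tokens)
        best = None
        for key in self._snapshots:
            if len(key) <= len(tokens) and tokens[: len(key)] == key:
                if best is None or len(key) > len(best):
                    best = key
        if best is None:
            return 0, None
        self._snapshots.move_to_end(best)  # a hit refreshes recency
        return len(best), self._snapshots[best]


cache = PrefixCache(capacity=2)
system = [1, 2, 3]  # stand-in token ids for a shared system prompt
cache.snapshot(system, "state-after-system")
cache.snapshot(system + [4, 5], "state-after-branch-a")

# A second branch shares only the system prompt, so three tokens are reused.
reused, state = cache.longest_prefix(system + [9, 9])

# Adding a third snapshot evicts the least recently used entry (branch a).
cache.snapshot([7, 8], "unrelated-session")
reused_after_evict, _ = cache.longest_prefix(system + [4, 5, 6])
```

In this sketch a cache hit refreshes recency, so the shared system-prompt snapshot survives eviction while the stale branch is dropped, which is the behavior that keeps branching agent loops cheap.
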
// TAGS
ollama, inference, gpu, agent, ai-coding, open-source

DISCOVERED

2026-04-01 (57d ago)

PUBLISHED

2026-04-01 (57d ago)

RELEVANCE

9/10

AUTHOR

[REDACTED]