MLX hits NVFP4 for 4-bit Mac inference

// 49d ago · OPENSOURCE RELEASE

Apple's MLX framework now supports NVFP4, NVIDIA's 4-bit floating-point format introduced with its Blackwell GPUs, on Apple Silicon through optimized Metal kernels and M5 hardware acceleration. The update enables fast local LLM inference with less precision loss than traditional 4-bit methods.

// ANALYSIS

MLX's NVFP4 implementation matters for local AI because it narrows the accuracy gap between 16-bit and 4-bit models.

  • Two levels of scaling (one factor per micro-block and one per tensor) preserve more precision than traditional 4-bit integer quantization.
  • M5-series chips have native hardware acceleration for NVFP4, while M1-M4 devices see up to 7x speedups through MLX's optimized Metal kernels.
  • The format enables running 35B-parameter models at over 70 tokens per second (TPS) on an M3 Max, fast enough for interactive use.
  • Ollama defaults to MLX on Apple Silicon, so most local users get these gains automatically with the v0.19 update.
  • NVFP4 won't beat FP16 for raw accuracy, but it substantially lowers the "quantization tax" that has historically plagued 4-bit models.
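
The dual scaling described in the first bullet can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not MLX's kernel or API: the function names (`nearest_e2m1`, `quantize_dequantize`) are hypothetical, the block size of 16 and the E2M1 value grid follow NVIDIA's published description of NVFP4, and the per-block scale is kept as an ordinary float rather than being rounded to FP8 as a real implementation would.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16          # NVFP4 micro-block size
E4M3_MAX = 448.0    # largest finite FP8 E4M3 value


def nearest_e2m1(x):
    """Round x to the closest E2M1 value, preserving its sign."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def quantize_dequantize(values):
    """Round-trip a non-empty flat list of floats through two-level scaling."""
    amax = max(abs(v) for v in values) or 1.0
    # Tensor-level scale: chosen so every block scale fits the FP8 range.
    tensor_scale = amax / (6.0 * E4M3_MAX)
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        # Block-level scale. Simplification: kept as a Python float here,
        # whereas NVFP4 stores it in FP8 (E4M3).
        block_scale = (block_amax / 6.0) / tensor_scale if block_amax else 1.0
        step = block_scale * tensor_scale
        out.extend(nearest_e2m1(v / step) * step for v in block)
    return out
```

Because each block of 16 values gets its own scale, a block of small weights is not forced onto a grid sized for the largest weight in the whole tensor, which is the intuition behind the precision claim above.
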

// TAGS
mlx, llm, inference, open-source, apple-silicon

DISCOVERED: 49d ago (2026-04-08)
PUBLISHED: 49d ago (2026-04-08)
RELEVANCE: 8/10
AUTHOR: Sea-Emu2600