MLX hits NVFP4 for 4-bit Mac inference
REDDIT · 3d ago · OPEN SOURCE RELEASE


Apple's MLX framework now supports NVIDIA's 4-bit floating point format (NVFP4), bringing Blackwell-level quantization performance and accuracy to Apple Silicon via optimized Metal kernels and M5 hardware acceleration. This update enables high-performance local LLM inference with minimal precision loss compared to traditional 4-bit methods.

// ANALYSIS

MLX's NVFP4 implementation is a game-changer for local AI, narrowing the accuracy gap between 16-bit and 4-bit models.

  • Dual scaling factors — a scale per 16-value micro-block plus a tensor-level scale — provide significantly better precision than traditional 4-bit integer quantization.
  • M5 series chips feature native hardware acceleration for NVFP4, while M1-M4 devices see up to 7x speedups through MLX's optimized Metal kernels.
  • The format enables running 35B parameter models at over 70 TPS on M3 Max, making large, capable models fast enough for interactive use.
  • Ollama now defaults to MLX on Apple Silicon, so most local users get these gains automatically with the v0.19 update.
  • While it won't beat FP16 for raw accuracy, it dramatically lowers the "quantization tax" that has historically plagued 4-bit models.
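The dual-scale scheme in the first bullet can be sketched numerically. Below is a minimal NumPy round-trip, assuming the published NVFP4 layout (E2M1 4-bit values, an FP8 E4M3 scale per 16-value micro-block, and a global FP32 tensor scale); it illustrates the math only and is not MLX's Metal implementation:

```python
import numpy as np

# Representable magnitudes of E2M1, the 4-bit float format NVFP4 stores.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
E4M3_MAX = 448.0  # largest finite value of the FP8 scale format

def nvfp4_roundtrip(x, block=16):
    """Quantize a flat float32 array to a toy NVFP4 and decode it back.

    Two-level scaling: one FP32 scale for the whole tensor, plus one
    scale per 16-value micro-block (stored as FP8 E4M3 in the real
    format; kept in float here for clarity).
    """
    x = np.asarray(x, dtype=np.float32).reshape(-1, block)
    # Level 1: tensor scale chosen so every block scale fits in E4M3 range.
    tensor_scale = max(np.abs(x).max() / (6.0 * E4M3_MAX), 1e-12)
    # Level 2: per-block scale mapping each block's max magnitude to 6.0.
    block_scale = np.abs(x).max(axis=1, keepdims=True) / (6.0 * tensor_scale)
    block_scale = np.maximum(block_scale, 1e-12)
    scaled = x / (block_scale * tensor_scale)          # now within [-6, 6]
    # Round to nearest: pick the closest E2M1 magnitude, reapply the sign.
    codes = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[codes]
    return (q * block_scale * tensor_scale).reshape(-1)
```

Because each 16-value micro-block carries its own scale, a single outlier only degrades precision inside its own block rather than across the whole tensor — the effect the analysis credits for the smaller "quantization tax" versus single-scale INT4.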
// TAGS
mlx · llm · inference · open-source · apple-silicon

DISCOVERED

2026-04-08 (3d ago)

PUBLISHED

2026-04-08 (3d ago)

RELEVANCE

8/10

AUTHOR

Sea-Emu2600