OPEN_SOURCE ↗
REDDIT // 3d ago // OPEN_SOURCE RELEASE
MLX hits NVFP4 for 4-bit Mac inference
Apple's MLX framework now supports NVIDIA's 4-bit floating point format (NVFP4), bringing Blackwell-level quantization performance and accuracy to Apple Silicon via optimized Metal kernels and M5 hardware acceleration. This update enables high-performance local LLM inference with minimal precision loss compared to traditional 4-bit methods.
// ANALYSIS
MLX's NVFP4 implementation is a game-changer for local AI, closing the accuracy gap between 16-bit and 4-bit models.
- Dual scaling factors (a micro-block scale plus a tensor-level scale) provide significantly better precision than traditional 4-bit integer quantization.
- M5-series chips feature native hardware acceleration for NVFP4, while M1–M4 devices see up to 7x speedups through MLX's optimized Metal kernels.
- The format enables running 35B-parameter models at over 70 TPS on an M3 Max, making large, capable models fast enough for interactive use.
- Because Ollama defaults to MLX on Apple Silicon, most local users get these gains automatically with the v0.19 update.
- While it won't beat FP16 for raw accuracy, it dramatically lowers the "quantization tax" that has historically plagued 4-bit models.
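To make the dual-scaling idea concrete, here is a minimal NumPy sketch of NVFP4-style quantization: 4-bit E2M1 element values, a per-16-element block scale (stored as FP8 E4M3 in the real format, hence the 448 ceiling), and one tensor-level FP32 scale. This is an illustrative model of the format, not MLX's actual Metal kernel, and the function names are hypothetical.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 value: sign * {0, 0.5, 1, 1.5, 2, 3, 4, 6}
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4(x, block=16):
    """Two-level scaling: per-tensor FP32 scale plus per-block scales.

    Illustrative sketch only; tensor size must be a multiple of `block`.
    """
    blocks = x.reshape(-1, block)
    # Tensor-level scale maps the global max into the block-scale range:
    # 6.0 is the max E2M1 magnitude, 448.0 the max FP8 E4M3 value.
    tensor_scale = np.abs(blocks).max() / (6.0 * 448.0)
    # Per-block scales; by construction they land in [0, 448].
    block_scale = np.abs(blocks).max(axis=1, keepdims=True) / 6.0 / tensor_scale
    block_scale = np.maximum(block_scale, 1e-12)  # avoid divide-by-zero on all-zero blocks
    scaled = blocks / (block_scale * tensor_scale)
    # Round each scaled magnitude to the nearest E2M1 grid point, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]
    return q, block_scale, tensor_scale

def dequantize_nvfp4(q, block_scale, tensor_scale, shape):
    """Reverse both scaling levels to recover an FP32 approximation."""
    return (q * block_scale * tensor_scale).reshape(shape)

# Example: round-trip a small FP32 tensor.
x = np.random.default_rng(0).standard_normal(64).astype(np.float32)
q, bs, ts = quantize_nvfp4(x)
dq = dequantize_nvfp4(q, bs, ts, x.shape)
```

The point of the two levels is that a single 4-bit grid covers only a narrow dynamic range; the per-block scale adapts that grid to each 16-value neighborhood, while the tensor scale keeps the block scales themselves inside FP8 range.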
// TAGS
mlx · llm · inference · open-source · apple-silicon
DISCOVERED
3d ago
2026-04-08
PUBLISHED
3d ago
2026-04-08
RELEVANCE
8 / 10
AUTHOR
Sea-Emu2600