MLX NVFP4 trails 4-bit in tests

// 69d ago · INFRASTRUCTURE


An MLX user benchmarked NVFP4 and MXFP4 quants on Qwen 3.5 35B and found that both scored worse (higher) word perplexity than standard 4-bit quantization. The current MLX release line does expose NVFP4 support, but the thread makes it sound more like early plumbing than a clear win.

// ANALYSIS

Promising support, underwhelming payoff so far. My read is that MLX has the format wired up, but the practical gains will likely stay muted until the kernels and native paths mature.

  • MLX v0.30.3 says it supports NVFP4 and MXFP8 quantized ops on Metal and NVFP4/MXFP8 quantized-quantized matmul on CUDA, so the framework is actively moving here: https://github.com/ml-explore/mlx/releases
  • The Reddit benchmark on Qwen 3.5 35B shows NVFP4 at 7.991 word perplexity versus 7.850 for 4-bit, with MXFP4 even worse at 8.379: https://www.reddit.com/r/LocalLLaMA/comments/1ry5gm3/has_anyone_tried_nvfp4_on_mlx/
  • Reported speed was roughly the same as 4-bit, so there is no obvious throughput upside yet on the Mac path.
  • The thread’s guess that MLX may be dequantizing or otherwise emulating the format is plausible, but that part is still an inference rather than confirmed behavior.
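To put the perplexity figures above in proportion, here is a minimal sketch that computes each format's relative gap against the 4-bit baseline. The three numbers are the ones quoted from the Reddit thread; the `relative_gap` helper and the dictionary layout are illustrative assumptions, not part of MLX or of the original benchmark.

```python
# Word perplexity figures quoted from the Reddit thread (lower is better).
perplexity = {"4-bit": 7.850, "NVFP4": 7.991, "MXFP4": 8.379}


def relative_gap(value: float, baseline: float) -> float:
    """Percent increase of `value` over `baseline` (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


baseline = perplexity["4-bit"]
gaps = {name: relative_gap(ppl, baseline) for name, ppl in perplexity.items()}

for name, gap in gaps.items():
    print(f"{name:>6}: {perplexity[name]:.3f}  ({gap:+.2f}% vs 4-bit)")
```

On these numbers NVFP4 lands about 1.8% above the 4-bit baseline and MXFP4 about 6.7% above it. That is a single reported run on one model, so it reads as a rough indication rather than a settled ranking.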
// TAGS
benchmark, inference, gpu, open-source, llm, mlx

DISCOVERED

69d ago

2026-03-19

PUBLISHED

69d ago

2026-03-19

RELEVANCE

8/10

AUTHOR

Odd-Ordinary-5922