OPEN_SOURCE
REDDIT · 23d ago · INFRASTRUCTURE
MLX NVFP4 trails 4-bit in tests
An MLX user benchmarked NVFP4 and MXFP4 quants on Qwen 3.5 35B and found both worse than standard 4-bit on perplexity. The current MLX release line does expose NVFP4 support, but the thread makes it sound more like early plumbing than a clear win.
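For context on the metric: word perplexity is the exponentiated total token negative log-likelihood, normalized by word count, so a gap like 7.991 vs 7.850 is a small but real quality regression. A minimal sketch (function name and inputs are illustrative, not from MLX or the benchmark script):

```python
import math

def word_perplexity(token_nlls, num_words):
    """Word-level perplexity from per-token negative log-likelihoods (nats).

    Sums the token NLLs and normalizes by the word count rather than the
    token count, which is the convention behind "word perplexity" numbers.
    """
    total_nll = sum(token_nlls)
    return math.exp(total_nll / num_words)

# Example: two tokens, each with NLL 2.0 nats, spanning two words.
ppl = word_perplexity([2.0, 2.0], num_words=2)
```

Lower is better, so the thread's 4-bit baseline (7.850) slightly outperforms NVFP4 (7.991) and clearly beats MXFP4 (8.379) on this model.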
// ANALYSIS
Promising support, underwhelming payoff so far. My read is that MLX has the format wired up, but the practical gains are still muted unless the kernels and native paths mature.
- MLX v0.30.3 lists support for NVFP4 and MXFP8 quantized ops on Metal and NVFP4/MXFP8 quantized-quantized matmul on CUDA, so the framework is actively moving here: https://github.com/ml-explore/mlx/releases
- The Reddit benchmark on Qwen 3.5 35B shows NVFP4 at 7.991 word perplexity versus 7.850 for 4-bit, with MXFP4 even worse at 8.379: https://www.reddit.com/r/LocalLLaMA/comments/1ry5gm3/has_anyone_tried_nvfp4_on_mlx/
- Reported speed was roughly the same as 4-bit, so there is no obvious throughput upside yet on the Mac path.
- The thread’s guess that MLX may be dequantizing or otherwise emulating the format is plausible, but that part is still an inference rather than confirmed behavior.
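To make the dequantize/emulate hypothesis concrete: NVFP4 stores 4-bit E2M1 values (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one scale per 16-element block. A framework without native kernels can emulate it by quantizing to that grid and dequantizing back to full precision before a regular matmul. A minimal pure-Python sketch of that round trip (the real format encodes scales as FP8 E4M3; a plain float stands in here, and none of these names come from MLX):

```python
# Positive magnitudes representable by the E2M1 4-bit element format.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block (16 elements in real NVFP4) to scaled E2M1 values."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # map the largest magnitude onto the top grid point
    codes = []
    for x in block:
        # Snap |x| / scale to the nearest representable magnitude, keep the sign.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-mag if x < 0 else mag)
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate full-precision values from scale and codes."""
    return [scale * c for c in codes]
```

If MLX is doing something like this round trip on the fly instead of running native NVFP4 matmul kernels, matching 4-bit speed while losing a little accuracy is roughly what you'd expect.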
// TAGS
benchmark · inference · gpu · open-source · llm · mlx
DISCOVERED
2026-03-19
PUBLISHED
2026-03-19
RELEVANCE
8/10
AUTHOR
Odd-Ordinary-5922