OPEN_SOURCE
REDDIT // 29d ago · RESEARCH PAPER
MXFP8 lossless for LLM quantization, MXFP4 still challenging
A systematic benchmark of post-training quantization (PTQ) across 7+ algorithms, 15 benchmarks, and 3 LLM families under Microscaling Floating-Point (MXFP) formats finds MXFP8 reliably lossless while MXFP4 remains accuracy-constrained without specialized methods. The study reveals MXFP must be treated as its own numerical regime—not a drop-in swap for integer quantization.
// ANALYSIS
MXFP's hardware backing from AMD, Intel, NVIDIA, and Microsoft makes this benchmark timely, but the message is clear: the integer PTQ playbook doesn't port cleanly to block floating-point formats.
- MXFP8 (W8A8) is production-ready—near-zero accuracy loss across all tested model families and tasks, green-lighting deployment
- MXFP4 is not ready as a default compression path; rotation-based methods commonly used in integer PTQ actively hurt MXFP4 accuracy due to block-wise scaling incompatibility
- Affine transformation methods achieve 96.57% accuracy recovery at W4A4, the strongest result for the hardest compression tier
- A simple pre-scale optimization on MXFP4 scaling factors lifts reasoning accuracy from 52.39% to 56.76%—a notable gain with minimal overhead
- In multimodal LLMs, the language model backbone dominates quantization sensitivity; vision encoders are surprisingly robust under MXFP
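To make the block-wise scaling point concrete, here is a minimal fake-quantization sketch of the MXFP4 idea: blocks of 32 values share one power-of-two scale (the MX spec's E8M0 scale), and each element is rounded to the FP4 (E2M1) grid. The function name, the `pre_scale` knob, and the scale-selection rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1, the MXFP4 element format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_dequantize(x, block_size=32, pre_scale=1.0):
    """Fake-quantize a 1-D array to MXFP4-style block floating point.

    Each block of `block_size` values shares one power-of-two scale;
    elements are rounded to the nearest FP4 (E2M1) grid point.
    `pre_scale` multiplies the shared scale before rounding -- a
    stand-in for the paper's pre-scale optimization (the exact
    optimization there may differ).
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size          # zero-pad to a whole number of blocks
    xp = np.pad(x, (0, pad))
    out = np.empty_like(xp)
    for i in range(0, len(xp), block_size):
        blk = xp[i:i + block_size]
        amax = np.max(np.abs(blk))
        if amax == 0.0:
            out[i:i + block_size] = 0.0
            continue
        # Smallest power-of-two scale mapping the block max into
        # FP4's range [-6, 6], so no value clips (illustrative choice).
        scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
        scale *= pre_scale
        q = blk / scale
        # Round each |q| to the nearest FP4 grid magnitude.
        idx = np.abs(np.abs(q)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        out[i:i + block_size] = np.sign(q) * FP4_GRID[idx] * scale
    return out[:len(x)]
```

One shared scale per 32-element block is why integer-PTQ rotations can backfire here: a rotation spreads outliers across a block, but the whole block's precision is still set by a single coarse power-of-two exponent.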
// TAGS
llm · benchmark · fine-tuning · inference · research · open-source
DISCOVERED
29d ago
2026-03-14
PUBLISHED
31d ago
2026-03-12
RELEVANCE
7/10
AUTHOR
Aaaaaaaaaeeeee