MXFP8 lossless for LLM quantization, MXFP4 still challenging
OPEN_SOURCE ↗
REDDIT · 29d ago · RESEARCH PAPER


A systematic benchmark of post-training quantization (PTQ) spanning 7+ algorithms, 15 benchmarks, and 3 LLM families under Microscaling Floating-Point (MXFP) formats finds MXFP8 reliably lossless, while MXFP4 remains accuracy-constrained without specialized methods. The study argues that MXFP must be treated as its own numerical regime, not a drop-in replacement for integer quantization.

// ANALYSIS

MXFP's hardware backing from AMD, Intel, NVIDIA, and Microsoft makes this benchmark timely, but the message is clear: the integer PTQ playbook doesn't port cleanly to block floating-point formats.

  • MXFP8 (W8A8) is production-ready—near-zero accuracy loss across all tested model families and tasks, green-lighting deployment
  • MXFP4 is not ready as a default compression path; rotation-based methods commonly used in integer PTQ actively hurt MXFP4 accuracy due to block-wise scaling incompatibility
  • Affine transformation methods achieve 96.57% accuracy recovery at W4A4, the strongest result for the hardest compression tier
  • A simple pre-scale optimization on MXFP4 scaling factors lifts reasoning accuracy from 52.39% to 56.76%—a notable gain with minimal overhead
  • In multimodal LLMs, the language model backbone dominates quantization sensitivity; vision encoders are surprisingly robust under MXFP
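
The block-wise scaling the bullets refer to can be sketched in a few lines. This is an illustrative toy, not the paper's method: it quantizes one block to an FP4 (E2M1) element grid with a shared power-of-two scale, as in the MX formats (the MX spec uses 32-element blocks and an E8M0 shared scale). The `pre_scale` argument is a hypothetical knob standing in for the paper's pre-scale optimization of MXFP4 scaling factors.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (sign handled separately)
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block, pre_scale=1.0):
    """Fake-quantize one block to MXFP4: E2M1 elements sharing a
    power-of-two scale. `pre_scale` is an illustrative stand-in for
    pre-scale tuning of the shared scaling factor (not the paper's
    exact algorithm). Returns the dequantized block."""
    block = np.asarray(block, dtype=np.float64)
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared power-of-two scale chosen so amax / scale <= 6.0 (grid max)
    scale = 2.0 ** np.ceil(np.log2(amax / 6.0)) * pre_scale
    scaled = block / scale
    # Round each element to the nearest representable signed E2M1 value;
    # argmin over the grid also clamps out-of-range values to ±6.0
    candidates = np.sign(scaled)[:, None] * FP4_E2M1_GRID
    idx = np.abs(scaled[:, None] - candidates).argmin(axis=1)
    q = candidates[np.arange(len(scaled)), idx]
    return q * scale
```

With only 8 magnitude levels per block, accuracy hinges entirely on how the shared scale places values on the grid, which is why integer-PTQ tricks like rotations, tuned for per-channel or per-tensor scales, can misfire under MXFP4's per-block scaling.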
// TAGS
llm · benchmark · fine-tuning · inference · research · open-source

DISCOVERED

2026-03-14 (29d ago)

PUBLISHED

2026-03-12 (31d ago)

RELEVANCE

7/10

AUTHOR

Aaaaaaaaaeeeee