REDDIT // 4h ago · BENCHMARK RESULT

mlx-kld benchmarks oQ, Q, MXFP, UD quants

mlx-kld is a KL-divergence benchmark for comparing MLX quantization schemes against a bf16 reference on real model outputs. The linked results suggest quantization quality is highly format- and architecture-dependent, which makes KLD a more useful lens than raw bit-width alone.
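For readers who want the metric spelled out, here is a minimal sketch of a per-token KL divergence between a reference model's logits and a quantized model's logits, averaged over positions. It is not taken from mlx-kld; the function names and the toy logits are hypothetical, and it uses plain Python lists rather than MLX arrays to stay self-contained.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position."""
    ref_lp = log_softmax(ref_logits)
    q_lp = log_softmax(quant_logits)
    return sum(math.exp(r) * (r - q) for r, q in zip(ref_lp, q_lp))

def mean_kld(ref_positions, quant_positions):
    """Average the per-position KLD over a sequence of positions."""
    klds = [token_kld(r, q) for r, q in zip(ref_positions, quant_positions)]
    return sum(klds) / len(klds)

# Toy logits over a 3-token vocabulary (made-up numbers, not benchmark data).
ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
quant = [[1.9, 1.1, 0.0], [0.4, 0.7, 2.8]]
print(f"mean KLD: {mean_kld(ref, quant):.6f} nats")
```

A value of zero means the quantized model reproduces the reference distribution exactly; larger values mean more probability mass has shifted.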

// ANALYSIS

Hot take: this is the right way to evaluate MLX quantization. Once you look at divergence from the reference distribution instead of just memory savings, it becomes obvious that “4-bit vs 6-bit” is too crude a shortcut.

– KLD is a cleaner signal than perplexity for quantization damage: it compares the quantized model's token distribution directly against the reference model's, rather than scoring each against the dataset text, so it isolates the effect of the quantizer itself.
– The useful takeaway is not a single winner, but that oQ, native Q, MXFP, and UD can trade places depending on the model and architecture.
– For dense models, higher-bit or better-targeted 6-bit schemes look like the safest default; for MoE models, router-sensitive tensors can dominate the outcome.
– The benchmark reinforces that MLX users should treat quantization as a per-model decision, not a one-size-fits-all setting.
– The project is also practical: caching reference log-probs makes this kind of comparison feasible on Apple Silicon instead of purely theoretical.
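The caching point can be illustrated with a short sketch: run the expensive reference pass once, store its log-probs, and reuse them for every quantized variant. This is an assumption about the general pattern, not mlx-kld's actual code; `cached_reference_logprobs`, `fake_reference_pass`, and the JSON file layout are all hypothetical, and a real tool would more likely store arrays in a binary format.

```python
import json
import os
import tempfile

def cached_reference_logprobs(cache_path, compute_fn):
    """Return reference log-probs, running compute_fn only on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    logprobs = compute_fn()
    with open(cache_path, "w") as f:
        json.dump(logprobs, f)
    return logprobs

# Hypothetical stand-in for the slow bf16 forward pass; counts its own calls.
calls = {"n": 0}

def fake_reference_pass():
    calls["n"] += 1
    return [[-0.4, -1.4, -2.3], [-2.6, -2.6, -0.16]]

cache_file = os.path.join(tempfile.mkdtemp(), "ref_logprobs.json")
first = cached_reference_logprobs(cache_file, fake_reference_pass)   # computes and writes
second = cached_reference_logprobs(cache_file, fake_reference_pass)  # reads from disk
print(calls["n"])  # prints 1: the reference pass ran only once
```

With the reference cached, only one model (the quantized one) needs to be resident in memory per comparison, which matters on unified-memory machines.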

// TAGS

mlx-kld · benchmark · llm · inference · open-source

DISCOVERED

4h ago

2026-04-30

PUBLISHED

7h ago

2026-04-29

RELEVANCE

8/10

AUTHOR

dpswt