OPEN_SOURCE
REDDIT // BENCHMARK RESULT
mlx-kld benchmarks oQ, Q, MXFP, UD quants
mlx-kld is a KL-divergence benchmark for comparing MLX quantization schemes against a bf16 reference on real model outputs. The linked results suggest quantization quality is highly format- and architecture-dependent, which makes KLD a more useful lens than raw bit-width alone.
// ANALYSIS
Hot take: this is the right way to evaluate MLX quantization. Once you look at divergence from the reference distribution instead of just memory savings, it becomes obvious that “4-bit vs 6-bit” is too crude a shortcut.
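To make that concrete: below is a minimal sketch of a per-token KLD measurement against a bf16 reference, written in plain NumPy for illustration. This is not mlx-kld's actual code; the function names are invented for the example.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def token_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> np.ndarray:
    """Per-token KL(P_ref || P_quant) in nats.

    Both inputs are [seq_len, vocab] logits produced on the *same*
    input tokens by the bf16 reference and the quantized model.
    """
    ref_logp = log_softmax(ref_logits.astype(np.float64))
    quant_logp = log_softmax(quant_logits.astype(np.float64))
    # KL(P||Q) = sum_v P(v) * (log P(v) - log Q(v)), summed over the vocab.
    return (np.exp(ref_logp) * (ref_logp - quant_logp)).sum(axis=-1)
```

The headline number is then typically the mean (and tail percentiles) of this per-token KLD over a fixed corpus; lower means the quantized model stays closer to the bf16 reference.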
- KLD against the bf16 reference is a cleaner signal than perplexity for quantization damage: perplexity scores the model against the data, while KLD measures how far the quantized model drifts from its own full-precision distribution, isolating the effect of the quantizer itself.
- The useful takeaway is not a single winner, but that oQ, native Q, MXFP, and UD can trade places depending on the model and architecture.
- For dense models, higher-bit or better-targeted 6-bit schemes look like the safest default; for MoE models, router-sensitive tensors can dominate the outcome.
- The benchmark reinforces that MLX users should treat quantization as a per-model decision, not a one-size-fits-all setting.
- The project is also practical: caching reference log-probs makes this kind of comparison feasible on Apple Silicon rather than a purely theoretical exercise (see the sketch after this list).
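On the caching point, a hedged sketch of the pattern, assuming a simple on-disk cache of full-vocab reference log-probs. The helper name, callable signature, and file format are assumptions for illustration, not mlx-kld's actual layout.

```python
import numpy as np
from pathlib import Path

def get_ref_logprobs(cache_path: Path, run_reference) -> np.ndarray:
    """Return bf16 reference log-probs, computing them at most once.

    run_reference: a callable doing the expensive bf16 forward pass and
    returning a [seq_len, vocab] array of log-probs. It is only invoked
    on a cache miss; every later quant comparison reads the .npz file.
    """
    if cache_path.exists():
        return np.load(cache_path)["logp"]
    logp = run_reference()
    np.savez_compressed(cache_path, logp=logp)
    return logp
```

With the reference cached, each quantized variant costs only its own forward pass plus a cheap array comparison, and the bf16 model never has to be reloaded, which is what makes sweeping many formats practical on a single Mac. A real tool would likely compress or top-k-truncate the cache, since full-vocab log-probs grow as seq_len × vocab.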
// TAGS
mlx-kld · benchmark · llm · inference · open-source
DISCOVERED
2026-04-30 (4h ago)
PUBLISHED
2026-04-29 (7h ago)
RELEVANCE
8/10
AUTHOR
dpswt