OPEN_SOURCE ↗
REDDIT · BENCHMARK RESULT · 6h ago
oQ beats mlx-lm on KL, at a modest RAM cost
The post benchmarks oQ against mlx-lm’s built-in quantization on Qwen3.5-35B-A3B, comparing KL divergence from the unquantized model and RAM usage. oQ keeps the output distribution much closer to the original model at most bit widths, at the cost of slightly higher memory use.
// ANALYSIS
oQ looks like the stronger default if KL divergence is your quality yardstick; it trades a modest RAM increase for a much cleaner approximation of the source model.
- At 2-bit and 3-bit, oQ is dramatically better than mlx-lm’s quantization in KL terms, which is where quantization usually hurts most.
- By 6-bit and 8-bit, the gap narrows, so the decision becomes more about RAM budget than fidelity.
- The MXFP4 and MXFP8 reference points are useful, but they do not change the basic story: sensitivity-aware allocation wins on distribution preservation.
- The result reinforces the post’s broader point that “smallest file size” is not the same as “best quantization” for LLMs.
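The KL-divergence yardstick used here can be sketched in a few lines. This is a hypothetical illustration, not the post's actual evaluation code: it compares a reference model's next-token distributions against a quantized model's, where the logit arrays are toy stand-ins.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_kl(ref_logits, quant_logits, eps=1e-12):
    """Mean KL(P_ref || P_quant) in nats, averaged over token positions."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())

# Toy example: 4 positions, vocab of 8; small noise stands in for quantization error.
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 8))
quant = ref + rng.normal(scale=0.1, size=(4, 8))
print(mean_kl(ref, quant))  # small positive value; 0.0 would mean identical outputs
```

A lower mean KL means the quantized model's output distribution stays closer to the original, which is the sense in which oQ "wins" at low bit widths even when its files are slightly larger.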
// TAGS
oq · mlx-lm · benchmark · llm · inference
DISCOVERED
2026-04-24 (6h ago)
PUBLISHED
2026-04-24 (8h ago)
RELEVANCE
7/10
AUTHOR
dpswt