REDDIT // 6h ago · BENCHMARK RESULT

oQ beats mlx-lm on KL, RAM

The post benchmarks oQ against mlx-lm’s built-in quantization on Qwen3.5-35B-A3B, measuring KL divergence from the full-precision model and RAM usage. At most bit widths, oQ keeps the output distribution much closer to the original model, though usually at a modest cost in memory.
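The headline metric can be sketched in a few lines. This is a generic illustration of per-token KL divergence between a full-precision and a quantized model, not oQ’s or mlx-lm’s actual evaluation code; `mean_token_kl` is a hypothetical helper, and it assumes you already have logits of shape `(tokens, vocab)` from both models on the same input.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary (last) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_token_kl(ref_logits, quant_logits):
    """Average per-token KL(P_ref || P_quant) in nats.

    Hypothetical helper: ref_logits and quant_logits are (tokens, vocab)
    arrays from the full-precision and quantized model on the same prompt.
    Lower is better; 0 means the quantized model matches exactly.
    """
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    # Small epsilon guards against log(0) for near-zero probabilities.
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())
```

In a real comparison you would average this over many prompts; identical logits give a KL of 0, and any divergence pushes it above 0.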

// ANALYSIS

oQ looks like the stronger default if KL divergence is your quality yardstick; it trades a modest RAM increase for a much cleaner approximation of the source model.

  • At 2-bit and 3-bit, oQ is dramatically better than mlx-lm’s quantization in KL terms, which is where quantization usually hurts most.
  • By 6-bit and 8-bit, the gap narrows, so the decision becomes more about RAM budget than fidelity.
  • The MXFP4 and MXFP8 reference points are useful, but they do not change the basic story: sensitivity-aware allocation wins on distribution preservation.
  • The result reinforces the post’s broader point that “smallest file size” is not the same as “best quantization” for LLMs.
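The “sensitivity-aware allocation” credited above can be illustrated with a toy greedy scheme: start every layer at the lowest bit width, then repeatedly upgrade the layer with the best sensitivity-per-extra-bit until the memory budget runs out. This is a generic sketch of the idea, not oQ’s actual algorithm; `allocate_bits` and its inputs are hypothetical.

```python
def allocate_bits(sensitivity, sizes, budget_bits, widths=(2, 3, 4, 6, 8)):
    """Toy greedy sensitivity-aware bit allocation (illustrative only).

    sensitivity: per-layer quality-impact scores (higher = more fragile)
    sizes:       per-layer parameter counts
    budget_bits: total bit budget for all weights
    Returns a per-layer bit width drawn from `widths`.
    """
    n = len(sensitivity)
    level = [0] * n  # index into `widths` for each layer
    used = sum(sizes[i] * widths[0] for i in range(n))
    while True:
        best, best_score = None, 0.0
        for i in range(n):
            if level[i] + 1 >= len(widths):
                continue  # layer already at the highest width
            extra = sizes[i] * (widths[level[i] + 1] - widths[level[i]])
            if used + extra > budget_bits:
                continue  # upgrade would blow the budget
            score = sensitivity[i] / extra  # crude benefit-per-bit proxy
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # no affordable upgrade left
        used += sizes[best] * (widths[level[best] + 1] - widths[level[best]])
        level[best] += 1
    return [widths[l] for l in level]
```

With two equally sized layers and a tight budget, the scheme spends all the headroom on the sensitive layer, which is the behavior the 2-bit/3-bit results above reward.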
// TAGS
oq · mlx-lm · benchmark · llm · inference

DISCOVERED

6h ago

2026-04-24

PUBLISHED

8h ago

2026-04-24

RELEVANCE

7 / 10

AUTHOR

dpswt