oQ beats mlx-lm on KL, RAM

// 45d ago · BENCHMARK RESULT

The post benchmarks oQ against mlx-lm’s built-in quantization on Qwen3.5-35B-A3B, measuring KL divergence from the original model and RAM usage. oQ keeps the output distribution much closer to the original at most bit widths, usually at the cost of slightly more memory.
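For readers unfamiliar with the metric, here is a minimal sketch of how a KL comparison between a reference model and a quantized model can be computed from next-token logits. The summary does not say how the post computes it, so the direction KL(reference || quantized), the per-position averaging, and the function names `log_softmax`, `kl_divergence`, and `mean_kl` are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position (assumed direction)."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kl(ref_batch, quant_batch):
    """Average the per-position KL over a batch of token positions."""
    kls = [kl_divergence(r, q) for r, q in zip(ref_batch, quant_batch)]
    return sum(kls) / len(kls)

if __name__ == "__main__":
    # Toy logits standing in for real model outputs (hypothetical values).
    reference = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
    quantized = [[1.8, 1.1, 0.2], [0.4, 0.7, 2.9]]
    print(f"mean KL: {mean_kl(reference, quantized):.6f} nats")
```

A value of zero means the quantized model reproduces the reference distribution exactly; larger values mean the quantized model's token probabilities have drifted further from the original.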

// ANALYSIS

oQ looks like the stronger default if KL divergence is your quality yardstick; it trades a modest RAM increase for a much cleaner approximation of the source model.

  • At 2-bit and 3-bit, oQ is dramatically better than mlx-lm’s Q in KL terms, which is where quantization usually hurts most.
  • By 6-bit and 8-bit, the gap narrows, so the decision becomes more about RAM budget than fidelity.
  • The MXFP4 and MXFP8 reference points are useful, but they do not change the basic story: sensitivity-aware allocation wins on distribution preservation.
  • The result reinforces the post’s broader point that “smallest file size” is not the same as “best quantization” for LLMs.
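To make "sensitivity-aware allocation" concrete, the following toy sketch gives more bits to layers that are assumed to be more sensitive, within a fixed total bit budget. It is not oQ's algorithm, which this summary does not describe; the greedy strategy, the function name `allocate_bits`, the sensitivity scores, and the layer sizes are all hypothetical.

```python
def allocate_bits(layers, budget_bits, choices=(2, 3, 4, 6, 8)):
    """Greedy toy allocator (illustrative only, not oQ's method).

    layers: dict mapping layer name -> (num_params, sensitivity score)
    budget_bits: total number of weight bits allowed across all layers
    Returns a dict mapping layer name -> chosen bit width.
    """
    lowest = min(choices)
    bits = {name: lowest for name in layers}
    used = sum(num_params * lowest for num_params, _ in layers.values())
    if used > budget_bits:
        raise ValueError("budget too small even at the lowest bit width")

    # Visit layers from most to least sensitive and give each one the
    # widest bit width that still fits in the remaining budget.
    ranked = sorted(layers, key=lambda name: layers[name][1], reverse=True)
    for name in ranked:
        num_params = layers[name][0]
        for b in sorted(choices, reverse=True):
            extra = num_params * (b - bits[name])
            if used + extra <= budget_bits:
                used += extra
                bits[name] = b
                break
    return bits

if __name__ == "__main__":
    # Hypothetical layer sizes and sensitivity scores.
    toy_layers = {"attn": (100, 0.9), "mlp": (300, 0.1), "embed": (100, 0.5)}
    print(allocate_bits(toy_layers, budget_bits=2000))
```

A uniform scheme would spend the same budget evenly across layers; the point of the sketch is only that concentrating bits where errors matter most can preserve the output distribution better at a similar overall size.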
// TAGS
oq, mlx-lm, benchmark, llm, inference

DISCOVERED

45d ago

2026-04-24

PUBLISHED

45d ago

2026-04-24

RELEVANCE

7/10

AUTHOR

dpswt