OPEN_SOURCE ↗
REDDIT · BENCHMARK RESULT · 6h ago
oQ beats mlx-lm on KL, at a modest RAM cost
The post benchmarks oQ against mlx-lm’s built-in quantization on Qwen3.5-35B-A3B, comparing KL divergence from the unquantized model and RAM usage. oQ keeps the output distribution much closer to the original model at most bit widths, at the cost of slightly higher memory use.
// ANALYSIS
oQ looks like the stronger default if KL divergence is your quality yardstick; it trades a modest RAM increase for a much cleaner approximation of the source model.
- At 2-bit and 3-bit, oQ is dramatically better than mlx-lm’s quantization in KL terms, which is where quantization usually hurts most.
- By 6-bit and 8-bit, the gap narrows, so the decision becomes more about RAM budget than fidelity.
- The MXFP4 and MXFP8 reference points are useful, but they do not change the basic story: sensitivity-aware allocation wins on distribution preservation.
- The result reinforces the post’s broader point that “smallest file size” is not the same as “best quantization” for LLMs.
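The KL-divergence yardstick used here can be sketched in a few lines. This is a hypothetical illustration, not the post's actual evaluation code: it compares a reference model's next-token distributions against a quantized model's, where the logit arrays are toy stand-ins.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_kl(ref_logits, quant_logits, eps=1e-12):
    """Mean KL(P_ref || P_quant) in nats, averaged over token positions."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())

# Toy example: 4 positions, vocab of 8; small noise stands in for quantization error.
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 8))
quant = ref + rng.normal(scale=0.1, size=(4, 8))
print(mean_kl(ref, quant))  # small positive value; 0.0 would mean identical outputs
```

A lower mean KL means the quantized model's output distribution stays closer to the original, which is the sense in which oQ "wins" at low bit widths even when its files are slightly larger.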
// TAGS
oq · mlx-lm · benchmark · llm · inference
DISCOVERED
2026-04-24 (6h ago)
PUBLISHED
2026-04-24 (8h ago)
RELEVANCE
7/10
AUTHOR
dpswt