REDDIT // 19d ago · OPEN-SOURCE RELEASE
oQ debuts mixed-precision quantization for Apple Silicon
oQ is a data-driven mixed-precision quantizer for Apple Silicon. It uses calibration data to assign a bit width to each layer instead of forcing one uniform width across the whole model. It emits standard mlx-lm-compatible models, so the same quantized weights can move across oMLX, mlx-lm, LM Studio, and other loaders that read MLX safetensors, with no custom format required.
// ANALYSIS
This is the right instinct for local LLMs: treat precision as a budget to allocate, not a fixed rule to apply everywhere. If oQ keeps the artifact portable, it solves both quality and UX at once.
- The Qwen3.5-35B-A3B table is the headline: oQ's 2-bit and 3-bit runs beat uniform mlx-lm by a wide margin on MMLU and TruthfulQA, which suggests the sensitivity heuristic is doing real work.
- The built-in 600-sample calibration set is a practical adoption win because users don't need to assemble their own calibration corpus before trying it.
- The interoperability story is the real moat: once the model stays MLX-standard, users can quantize once and run anywhere in the Apple Silicon stack.
- The 4-bit HumanEval dip versus mlx-lm is a healthy caution flag; mixed precision looks promising, but it still needs broader validation across architectures and evals.
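
To make the "precision as a budget" idea concrete, here is a minimal Python sketch of greedy per-layer bit allocation. It is not oQ's actual algorithm, which the post does not spell out. The `Layer` type, the sensitivity scores, the layer names, and the `allocate_bits` helper are all hypothetical and exist only for illustration. The assumption is that calibration yields one sensitivity score per layer, and that spare bits should go to the most sensitive layers first while the parameter-weighted average stays at or below a target.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    sensitivity: float  # hypothetical calibration score; higher = hurt more by low precision


def allocate_bits(layers, target_avg_bits, choices=(2, 3, 4, 6, 8)):
    """Start every layer at the lowest width, then spend the spare
    bit budget on the most sensitive layers first (greedy)."""
    choices = sorted(choices)
    floor = choices[0]
    total_params = sum(layer.num_params for layer in layers)
    # Spare budget, in bits, above the all-layers-at-floor baseline.
    budget = (target_avg_bits - floor) * total_params
    if budget < 0:
        raise ValueError("target_avg_bits is below the smallest allowed width")

    bits = {layer.name: floor for layer in layers}
    for layer in sorted(layers, key=lambda l: l.sensitivity, reverse=True):
        # Give this layer the widest format the remaining budget can pay for.
        for width in reversed(choices):
            cost = (width - floor) * layer.num_params
            if cost <= budget:
                bits[layer.name] = width
                budget -= cost
                break
    return bits


def average_bits(layers, bits):
    """Parameter-weighted average bit width of an allocation."""
    total_params = sum(layer.num_params for layer in layers)
    return sum(bits[l.name] * l.num_params for l in layers) / total_params


# Toy model: made-up layer names, sizes, and sensitivity scores.
layers = [
    Layer("embed", 100, 0.9),
    Layer("attn", 200, 0.5),
    Layer("mlp", 700, 0.1),
]
bits = allocate_bits(layers, target_avg_bits=3.0)
print(bits, average_bits(layers, bits))
```

In this toy case the small, sensitive layer gets the widest format while the large, tolerant layer stays at the floor, and the weighted average still meets the 3-bit target. A real quantizer would likely weigh the cost of each upgrade against its measured benefit rather than ranking by sensitivity alone.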
// TAGS
oq, omlx, open-source, inference, edge-ai, llm, mlops
DISCOVERED
19d ago
2026-03-23
PUBLISHED
19d ago
2026-03-23
RELEVANCE
8/10
AUTHOR
cryingneko