OPEN_SOURCE
REDDIT · 23d ago // BENCHMARK RESULT
Qwen3.5 397B quant hits 93% MMLU
A community MLX quantization of Qwen3.5-397B-A17B claims 93% on a 200-question MMLU run while fitting into 180GB and sustaining about 38 tokens per second on M3 Ultra hardware. The post is really a local-inference benchmark story, not a new base model release.
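The claimed 38 tokens per second can be sanity-checked with a back-of-envelope bandwidth calculation: MoE decode is roughly memory-bandwidth bound, reading only the active parameters per token. The bit-width overhead and the M3 Ultra bandwidth figure below are assumptions for illustration, not measurements from the post:

```python
# Rough decode-speed ceiling for a bandwidth-bound MoE model.
# All constants are assumptions, not measured values.
ACTIVE_PARAMS = 17e9      # 17B active parameters (the "A17B" in the model name)
BITS_PER_WEIGHT = 4.5     # ~4-bit quant plus scale/zero-point overhead (assumed)
BANDWIDTH_GBS = 819       # M3 Ultra unified-memory bandwidth, GB/s (assumed spec value)

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8   # weights read per decoded token
ceiling_tps = BANDWIDTH_GBS * 1e9 / bytes_per_token     # theoretical upper bound

print(f"bandwidth ceiling ≈ {ceiling_tps:.0f} tok/s")
print(f"claimed 38 tok/s ≈ {38 / ceiling_tps:.0%} of that ceiling")
```

Under these assumptions the ceiling lands in the mid-80s tok/s, so a sustained 38 tok/s is well within the physically plausible range rather than an obvious exaggeration.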
// ANALYSIS
This is a strong reminder that the local-model arms race is shifting from “can it run?” to “which quantization preserves quality without killing speed?”
- The underlying official model is Qwen3.5-397B-A17B, a 397B-total, 17B-active MoE model; the community quant here is trying to squeeze frontier-class capability into practical Apple Silicon memory budgets.
- The headline 93% figure is self-reported on a 200-question MMLU slice, so it’s interesting but not directly comparable to the official Qwen benchmark table, which reports MMLU-Pro and other standardized evals.
- The meaningful angle for developers is the tradeoff curve: this build appears smaller than some other MLX 4-bit ports while claiming better throughput, which matters if you care about interactive local usage.
- The author’s note about weaker coding performance lines up with the usual pattern: quantization and MoE routing can preserve reasoning scores better than they preserve messy real-world coding behavior.
- If others replicate the speed claim, this becomes one of the more compelling “big model, local enough” options for experimentation on high-memory Macs.
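To see why a 200-question slice deserves the caution above, the statistical uncertainty is easy to quantify. A minimal sketch (standard Wilson score interval, stdlib only; the 186/200 split is implied by the 93% claim):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 93% accuracy on a 200-question slice => 186 correct answers
lo, hi = wilson_ci(186, 200)
print(f"95% CI: [{lo:.1%}, {hi:.1%}]")
```

The interval spans roughly seven percentage points, wide enough that the quant could plausibly sit anywhere from "noticeably degraded" to "near-lossless" relative to the full model, which is exactly why the self-reported figure isn't comparable to standardized eval tables.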
// TAGS
qwen3.5 · llm · benchmark · open-weights · mlx · inference · reasoning
DISCOVERED
2026-03-20 (23d ago)
PUBLISHED
2026-03-20 (23d ago)
RELEVANCE
9/10
AUTHOR
HealthyCommunicat