Qwen3.5 397B quant hits 93% MMLU
OPEN_SOURCE
REDDIT // 23d ago · BENCHMARK RESULT


A community MLX quantization of Qwen3.5-397B-A17B claims 93% on a 200-question MMLU run while fitting into 180GB and sustaining about 38 tokens per second on M3 Ultra hardware. The post is really a local-inference benchmark story, not a new base model release.
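As a rough sanity check on the 180GB figure, a back-of-envelope sketch (the effective bits-per-weight of this particular MLX build is an assumption; the post does not specify the quant recipe):

```python
# Back-of-envelope weight-memory estimate for a quantized 397B-parameter MoE
# model. All experts must stay resident, so total (not active) params count.
total_params = 397e9       # total parameters
bits_per_weight = 3.5      # ASSUMED effective bpw after mixed-precision quant
bytes_needed = total_params * bits_per_weight / 8
print(f"~{bytes_needed / 1e9:.0f} GB for weights")  # → ~174 GB for weights
```

A straight 4-bit quant would need about 198 GB for weights alone, which is why fitting in 180GB implies a lower effective bits-per-weight (mixed precision or aggressive grouping) before KV cache and activations are even counted.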

// ANALYSIS

This is a strong reminder that the local-model arms race is shifting from “can it run?” to “which quantization preserves quality without killing speed?”

  • The underlying official model is Qwen3.5-397B-A17B, a 397B-total, 17B-active MoE model; the community quant here is trying to squeeze frontier-class capability into practical Apple Silicon memory budgets.
  • The headline 93% figure is self-reported on a 200-question MMLU slice, so it’s interesting but not directly comparable to the official Qwen benchmark table, which reports MMLU-Pro and other standardized evals.
  • The meaningful angle for developers is the tradeoff curve: this build appears smaller than some other MLX 4-bit ports while claiming better throughput, which matters if you care about interactive local usage.
  • The author’s note about weaker coding performance lines up with the usual pattern: quantization and MoE routing can preserve reasoning scores better than they preserve messy real-world coding behavior.
  • If others replicate the speed claim, this becomes one of the more compelling “big model, local enough” options for experimentation on high-memory Macs.
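To make the small-sample caveat concrete, here is a quick sketch of the sampling uncertainty around 93% accuracy on a 200-question slice (normal approximation to the binomial; the actual question subset and its difficulty are unknown):

```python
import math

# 95% confidence interval for 93% accuracy measured on n = 200 questions,
# using the normal approximation to the binomial proportion.
n, p = 200, 0.93
se = math.sqrt(p * (1 - p) / n)           # standard error of the proportion
lo, hi = p - 1.96 * se, p + 1.96 * se
print(f"95% CI: {lo:.1%} – {hi:.1%}")     # → 95% CI: 89.5% – 96.5%
```

The interval is roughly ±3.5 points wide, so a 93% score on a 200-question slice cannot be meaningfully compared against official full-benchmark numbers that differ by a point or two.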
// TAGS
qwen3.5 · llm · benchmark · open-weights · mlx · inference · reasoning

DISCOVERED

2026-03-20

PUBLISHED

2026-03-20

RELEVANCE

9/10

AUTHOR

HealthyCommunicat