OPEN_SOURCE
REDDIT · 23d ago // BENCHMARK RESULT
Qwen3.5 397B quant hits 93% MMLU
A community MLX quantization of Qwen3.5-397B-A17B claims 93% on a 200-question MMLU run while fitting into 180GB and sustaining about 38 tokens per second on M3 Ultra hardware. The post is really a local-inference benchmark story, not a new base model release.
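The claimed 38 tokens per second can be sanity-checked with a back-of-envelope bandwidth calculation: MoE decode is roughly memory-bandwidth bound, reading only the active parameters per token. The bit-width overhead and the M3 Ultra bandwidth figure below are assumptions for illustration, not measurements from the post:

```python
# Rough decode-speed ceiling for a bandwidth-bound MoE model.
# All constants are assumptions, not measured values.
ACTIVE_PARAMS = 17e9      # 17B active parameters (the "A17B" in the model name)
BITS_PER_WEIGHT = 4.5     # ~4-bit quant plus scale/zero-point overhead (assumed)
BANDWIDTH_GBS = 819       # M3 Ultra unified-memory bandwidth, GB/s (assumed spec value)

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8   # weights read per decoded token
ceiling_tps = BANDWIDTH_GBS * 1e9 / bytes_per_token     # theoretical upper bound

print(f"bandwidth ceiling ≈ {ceiling_tps:.0f} tok/s")
print(f"claimed 38 tok/s ≈ {38 / ceiling_tps:.0%} of that ceiling")
```

Under these assumptions the ceiling lands in the mid-80s tok/s, so a sustained 38 tok/s is well within the physically plausible range rather than an obvious exaggeration.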
// ANALYSIS
This is a strong reminder that the local-model arms race is shifting from “can it run?” to “which quantization preserves quality without killing speed?”
- The underlying official model is Qwen3.5-397B-A17B, a 397B-total, 17B-active MoE model; the community quant here is trying to squeeze frontier-class capability into practical Apple Silicon memory budgets.
- The headline 93% figure is self-reported on a 200-question MMLU slice, so it’s interesting but not directly comparable to the official Qwen benchmark table, which reports MMLU-Pro and other standardized evals.
- The meaningful angle for developers is the tradeoff curve: this build appears smaller than some other MLX 4-bit ports while claiming better throughput, which matters if you care about interactive local usage.
- The author’s note about weaker coding performance lines up with the usual pattern: quantization and MoE routing can preserve reasoning scores better than they preserve messy real-world coding behavior.
- If others replicate the speed claim, this becomes one of the more compelling “big model, local enough” options for experimentation on high-memory Macs.
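To see why a 200-question slice deserves the caution above, the statistical uncertainty is easy to quantify. A minimal sketch (standard Wilson score interval, stdlib only; the 186/200 split is implied by the 93% claim):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 93% accuracy on a 200-question slice => 186 correct answers
lo, hi = wilson_ci(186, 200)
print(f"95% CI: [{lo:.1%}, {hi:.1%}]")
```

The interval spans roughly seven percentage points, wide enough that the quant could plausibly sit anywhere from "noticeably degraded" to "near-lossless" relative to the full model, which is exactly why the self-reported figure isn't comparable to standardized eval tables.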
// TAGS
qwen3.5 · llm · benchmark · open-weights · mlx · inference · reasoning
DISCOVERED
2026-03-20 (23d ago)
PUBLISHED
2026-03-20 (23d ago)
RELEVANCE
9/10
AUTHOR
HealthyCommunicat