OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Qwen3.5-397B-A17B tops local LLM benchmark tests
Developer u/awl130's "AI Analytical Intelligence Test" series crowns Qwen3.5-397B-A17B as the premier local LLM for high-spec workstations. Leveraging the 512GB of unified memory in the Mac Studio M3 Ultra, the model delivers frontier-level reasoning with a Mixture-of-Experts architecture that activates only 17B parameters per token.
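The "nearly 400GB" figure cited below for the Q8_0 quant can be sanity-checked with a back-of-envelope calculation. This sketch assumes GGUF's Q8_0 costs roughly 8.5 bits per weight (8-bit quantized values plus a per-block fp16 scale), and that all 397B parameters must be resident since a MoE model keeps every expert in memory even though only 17B are active per token:

```python
# Rough Q8_0 memory estimate for a 397B-parameter MoE model.
# Assumption: Q8_0 stores ~8.5 bits per weight (8-bit values plus
# a per-32-block fp16 scale, as in llama.cpp's GGUF format).

PARAMS = 397e9          # total parameters (all experts stay resident)
BITS_PER_WEIGHT = 8.5   # effective Q8_0 cost per weight

bytes_total = PARAMS * BITS_PER_WEIGHT / 8
gib = bytes_total / 2**30
print(f"~{gib:.0f} GiB")  # ≈ 393 GiB, consistent with "nearly 400GB"
```

That leaves only ~100GB of headroom on a 512GB machine for the KV cache, activations, and the OS, which is why the 512GB configuration is the practical floor for this quant.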
// ANALYSIS
Massive MoE models like Qwen 3.5 397B are redrawing the boundaries for local AI, proving that frontier-class intelligence is no longer restricted to multi-GPU data centers.
- High efficiency: 17B active parameters deliver intelligence comparable to top-tier proprietary models while maintaining a manageable compute footprint.
- Hardware threshold: Q8_0 quantization requires nearly 400GB of RAM, making the 512GB Mac Studio the only consumer device capable of hosting the model at high precision.
- Optimization breakthroughs: Jangq.ai's "mixed-precision" quantization prevents the coherence failures seen in standard 2-bit quants for large MoE architectures.
- Performance bottleneck: While the model fits in unified memory, the 800GB/s throughput of the M3 Ultra limits tokens-per-second, favoring deep reasoning over real-time chat.
- Ecosystem growth: The success of vMLX and MLX Studio suggests a maturing software stack for high-end local LLM inference on macOS.
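The bandwidth bottleneck noted above can be made concrete. Autoregressive decoding must stream the active weights for every generated token, so memory bandwidth sets a hard ceiling on tokens per second. This is a simplified estimate assuming Q8_0 at ~8.5 bits per weight, that only the 17B active parameters are read per token, and that weight streaming dominates (ignoring KV cache, activations, and router overhead):

```python
# Decode-speed ceiling from memory bandwidth: each generated token
# requires reading the active expert weights from unified memory.
# Assumptions: Q8_0 (~8.5 bits/weight), 17B active params per token,
# weight streaming dominates all other memory traffic.

ACTIVE_PARAMS = 17e9
BITS_PER_WEIGHT = 8.5
BANDWIDTH = 800e9  # M3 Ultra unified-memory bandwidth, bytes/s

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8
ceiling_tps = BANDWIDTH / bytes_per_token
print(f"~{ceiling_tps:.0f} tokens/s upper bound")
```

Real-world throughput lands well below this ceiling, but even the theoretical bound shows why the setup suits long-form reasoning more than latency-sensitive chat.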
// TAGS
qwen3-5-397b-a17b · llm · open-weights · mac-studio · moe · benchmark · ai-coding · apple-silicon
DISCOVERED
2026-03-26
PUBLISHED
2026-03-26
RELEVANCE
8/10
AUTHOR
awl130