Qwen3.5-397B-A17B tops local LLM benchmark tests
OPEN_SOURCE
REDDIT · 17d ago · BENCHMARK RESULT

Developer u/awl130's "AI Analytical Intelligence Test" series crowns Qwen3.5-397B-A17B as the premier local LLM for high-spec workstations. Running in the 512GB of unified memory on a Mac Studio M3 Ultra, the model delivers frontier-level reasoning via a Mixture-of-Experts architecture that activates only 17B of its 397B parameters per token.

// ANALYSIS

Massive MoE models like Qwen3.5-397B-A17B are redrawing the boundaries for local AI, proving that frontier-class intelligence is no longer restricted to multi-GPU data centers.

  • High efficiency: 17B active parameters deliver intelligence comparable to top-tier proprietary models while maintaining a manageable compute footprint.
  • Hardware threshold: Q8_0 quantization requires nearly 400GB of RAM, making the 512GB Mac Studio the only consumer device capable of hosting the model at high precision.
  • Optimization breakthroughs: Jangq.ai's "mixed-precision" quantization prevents the coherence failures seen in standard 2-bit quants for large MoE architectures.
  • Performance bottleneck: While the model fits in unified memory, the 800GB/s throughput of the M3 Ultra limits tokens-per-second, favoring deep reasoning over real-time chat.
  • Ecosystem growth: The success of vMLX and MLX Studio suggests a maturing software stack for high-end local LLM inference on macOS.
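The memory and throughput figures above can be checked with back-of-envelope arithmetic. The sketch below is illustrative only: it assumes Q8_0 stores roughly 1 byte per parameter (ignoring quantization block overhead and KV cache) and that decode speed is bounded by reading the active experts' weights once per token over the M3 Ultra's cited 800GB/s memory bandwidth.

```python
# Back-of-envelope estimates for a 397B-parameter MoE model with
# 17B active parameters per token, using figures cited in the post.
# Assumptions, not measurements: Q8_0 ~ 1 byte/param, no KV-cache
# or quantization-block overhead included.

TOTAL_PARAMS = 397e9       # total parameters (397B)
ACTIVE_PARAMS = 17e9       # parameters activated per token (17B)
BYTES_PER_PARAM_Q8 = 1.0   # Q8_0 quantization, roughly 1 byte/param
BANDWIDTH_BPS = 800e9      # M3 Ultra unified-memory bandwidth, ~800 GB/s

# Weights alone at Q8_0: roughly 397 GB, which is why the post treats
# the 512GB Mac Studio as the hardware threshold.
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_Q8 / 1e9

# Decode is memory-bandwidth-bound: each generated token must stream
# the active experts' weights, so an upper bound on tokens/sec is
# bandwidth divided by active-weight bytes.
active_bytes = ACTIVE_PARAMS * BYTES_PER_PARAM_Q8
max_tokens_per_s = BANDWIDTH_BPS / active_bytes

print(f"Q8_0 weights: ~{weights_gb:.0f} GB")
print(f"Bandwidth-bound decode ceiling: ~{max_tokens_per_s:.0f} tok/s")
```

The ceiling of a few dozen tokens per second (before any compute or routing overhead) matches the post's observation that the setup favors deep reasoning workloads over real-time chat.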
// TAGS
qwen3-5-397b-a17b · llm · open-weights · mac-studio · moe · benchmark · ai-coding · apple-silicon

DISCOVERED

17d ago (2026-03-26)

PUBLISHED

17d ago (2026-03-26)

RELEVANCE

8/10

AUTHOR

awl130