Qwen3.5 sparse MoE hits local MacBook Pro
A developer's hands-on report describes Qwen3.5-35B-A3B running locally on a MacBook Pro M3 Max. While the sparse Mixture-of-Experts architecture delivers impressive throughput, the model's struggles with repetitive loops and context-window limitations underscore the gap that remains for local autonomous coding tasks.
Qwen3.5-35B-A3B reaches interactive speeds on the Mac by activating only 3B of its 35B parameters per token. The oMLX framework's tiered SSD caching marks a meaningful shift for local developers managing large context prefixes. However, repetitive summarize-step-compact loops reveal persistent weaknesses in agentic reasoning on multi-file projects. Hardware remains the primary bottleneck: 36GB of RAM is the practical minimum for running 35B-class models with meaningful context.
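The report does not detail Qwen's router or oMLX internals, but the core idea behind "only 3B parameters active per token" is top-k expert gating: a small gate scores all experts, and only the k highest-scoring ones actually run. A minimal sketch, assuming a simple softmax-over-top-k gate (all names and shapes here are illustrative, not the model's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert(d: int):
    """Toy expert: a single dense layer with its own weights."""
    w = rng.standard_normal((d, d)) * 0.1
    return lambda x: x @ w

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through only the top-k experts (sparse activation)."""
    logits = x @ gate_w                        # one gate score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                       # softmax over selected experts only
    # Only k experts execute; the rest cost nothing for this token.
    return sum(p * experts[i](x) for p, i in zip(probs, topk))

d, n_experts = 16, 8
experts = [make_expert(d) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
out = moe_forward(rng.standard_normal(d), gate_w, experts, k=2)
```

With 8 experts and k=2, each token pays for roughly a quarter of the expert compute, which is why a 35B-parameter model can decode at speeds closer to a dense 3B model.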
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
AUTHOR
Sea-Emu2600