M4 Max MacBook benchmarked for OpenCode, Qwen3
A developer evaluates the MacBook M4 Max's performance using local LLMs for agentic coding, sharing benchmarks for the Qwen3-30B-A3B model. The results showcase the high-throughput capabilities of the 40-core GPU when paired with modern Mixture-of-Experts architectures in a local development environment.
The M4 Max's unified memory remains the definitive "killer feature" for running 30B+ parameter models at usable speeds on consumer-grade hardware.
- Benchmarks of ~89 tokens/sec on Qwen3-30B-A3B confirm that MoE models are the sweet spot for high-performance local coding agents.
- OpenCode is emerging as the premier model-agnostic TUI for developers seeking a local-first alternative to proprietary agents like Claude Code.
- While 32GB of RAM is viable for 30B models, the community increasingly recommends 64GB+ to accommodate the long-context windows required for multi-file codebases.
- Switching to MLX-native runners provides a significant 30-50% performance boost over llama.cpp for Qwen models on Apple Silicon.
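The 32GB versus 64GB point can be made concrete with some rough arithmetic. The sketch below is not from the original post: it assumes 4-bit quantized weights and an fp16 KV cache, and the layer count, KV-head count, and head dimension are illustrative assumptions that should be checked against the model's actual config. It ignores runtime overhead, activations, and whatever else the OS keeps in unified memory.

```python
# Rough memory estimate for running a ~30B-parameter model locally.
# All architecture numbers below are ASSUMPTIONS for illustration only.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the model weights at a given quantization."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for the KV cache: one key and one value vector per layer,
    per KV head, per token (hence the leading factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed inputs: 30e9 total parameters, 4-bit weights,
# 48 layers, 4 KV heads, head_dim 128, 32k-token context, fp16 cache.
weights = weight_bytes(30e9, 4)
cache = kv_cache_bytes(48, 4, 128, 32_768)

print(f"weights:  {weights / 1e9:.1f} GB")
print(f"KV cache: {cache / 2**30:.1f} GiB at 32k tokens")
print(f"total:    {(weights + cache) / 1e9:.1f} GB before runtime overhead")
```

Under these assumptions the weights alone take about half of a 32GB machine, and the cache grows linearly with context length, which is consistent with the recommendation to move to 64GB+ for long-context work.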
DISCOVERED
46d ago
2026-04-12
PUBLISHED
46d ago
2026-04-11
AUTHOR
AnotherDevArchSecOps