Claude Code builds Flash-MoE in 24 hours
Claude Code, Anthropic's agentic CLI tool, was used to autonomously build Flash-MoE, a custom C/Metal inference engine that runs a 397B-parameter Qwen 3.5 model on a 48GB MacBook Pro at 5.5 tokens/s. By automating 90+ optimization experiments in a single day, Claude Code showed that agentic engineering can take on complex, low-level systems problems that typically require weeks of human effort.
Claude Code's "autoresearch" capability marks a shift from AI-assisted to AI-led engineering, suggesting that agents can handle brute-force system optimization at scale.
- Flash-MoE implements Apple's "LLM in a Flash" research to stream expert weights from SSD, bypassing the 200GB+ RAM requirement for massive MoE models.
- The engine uses custom Metal kernels and FMA-optimized 4-bit dequantization to achieve usable local inference speeds on consumer hardware.
- Claude Code autonomously discovered the optimal balance of parallel I/O and GPU scheduling, a task involving a massive search space of configuration and code changes.
- This project highlights the transition of AI tools from simple code generators to autonomous research partners capable of validating hypotheses through empirical testing.
- The 24-hour turnaround for a project of this technical depth sets a new benchmark for the speed of AI-driven software development.
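
The first two points above (streaming only the needed expert weights from disk, and dequantizing 4-bit weights with a multiply-add) can be illustrated with a small sketch. This is not Flash-MoE's code, which is written in C and Metal; the file layout, the names `pack_expert`, `dequantize`, `load_experts`, and `demo`, and the per-expert scale and zero point are all assumptions made for illustration. Python does not guarantee a fused instruction here; the point is only the algebraic form `scale * q + bias`, which a compiler or GPU kernel could map to one FMA per weight. It relies on `os.pread`, which is available on Unix-like systems.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor


def pack_expert(codes, scale, zero):
    """Pack 4-bit codes (0..15) two per byte, after a float32 scale and zero point.

    Hypothetical layout, chosen only to keep the example small.
    """
    assert len(codes) % 2 == 0
    body = bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))
    return struct.pack("<ff", scale, zero) + body


def dequantize(blob):
    """Expand one packed expert back to floats."""
    scale, zero = struct.unpack_from("<ff", blob, 0)
    # scale * (q - zero) rewritten as scale * q + bias, so each weight
    # costs one multiply-add instead of a subtract followed by a multiply.
    bias = -scale * zero
    out = []
    for byte in blob[8:]:
        out.append(scale * (byte & 0x0F) + bias)  # low nibble
        out.append(scale * (byte >> 4) + bias)    # high nibble
    return out


def load_experts(path, expert_ids, expert_bytes):
    """Read only the requested experts from disk, issuing the reads in parallel."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=4) as pool:
            # os.pread takes an explicit offset, so threads can share one descriptor.
            blobs = pool.map(
                lambda e: os.pread(fd, expert_bytes, e * expert_bytes), expert_ids
            )
            return {e: dequantize(b) for e, b in zip(expert_ids, blobs)}
    finally:
        os.close(fd)


def demo():
    """Write 4 tiny experts to a temp file, then load only experts 1 and 3."""
    scale, zero = 0.5, 8.0
    blobs = [pack_expert([(e + i) % 16 for i in range(8)], scale, zero) for e in range(4)]
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"".join(blobs))
        path = f.name
    try:
        return load_experts(path, [1, 3], len(blobs[0]))
    finally:
        os.remove(path)
```

Calling `demo()` returns a dictionary keyed by expert id; experts 0 and 2 are never read from the file, which is the property that lets a mixture-of-experts model larger than RAM run from storage.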
DISCOVERED
2026-03-22
PUBLISHED
2026-03-22
AUTHOR
Github Awesome
