OPEN_SOURCE ↗
YT · YOUTUBE// 21d agoNEWS
Claude Code builds Flash-MoE in 24 hours
Anthropic's agentic CLI tool was used to autonomously build Flash-MoE, a custom C/Metal inference engine that runs a 397B-parameter Qwen 3.5 model on a 48GB MacBook Pro at 5.5 t/s. By automating 90+ optimization experiments in a single day, Claude Code demonstrated the power of agentic engineering in solving complex, low-level systems problems that typically require weeks of human effort.
// ANALYSIS
Claude Code's "autoresearch" capability is a landmark shift from AI-assisted to AI-led engineering, proving that agents can handle brute-force system optimization at scale.
- –Flash-MoE implements Apple's "LLM in a Flash" research to stream expert weights from SSD, bypassing the 200GB+ RAM requirement for massive MoE models.
- –The engine uses custom Metal kernels and FMA-optimized 4-bit dequantization to achieve usable local inference speeds on consumer hardware.
- –Claude Code autonomously discovered the optimal balance of parallel I/O and GPU scheduling, a task involving a massive search space of configuration and code changes.
- –This project highlights the transition of AI tools from simple code generators to autonomous research partners capable of validating hypotheses through empirical testing.
- –The 24-hour turnaround for a project of this technical depth sets a new benchmark for the speed of AI-driven software development.
// TAGS
claude-codeai-codingagentllminferencegpuedge-aiopen-source
DISCOVERED
21d ago
2026-03-22
PUBLISHED
21d ago
2026-03-22
RELEVANCE
9/ 10
AUTHOR
Github Awesome