BACK_TO_FEEDAICRIER_2
Claude Code builds Flash-MoE in 24 hours
OPEN_SOURCE ↗
YT · YOUTUBE// 21d agoNEWS

Claude Code builds Flash-MoE in 24 hours

Anthropic's agentic CLI tool was used to autonomously build Flash-MoE, a custom C/Metal inference engine that runs a 397B-parameter Qwen 3.5 model on a 48GB MacBook Pro at 5.5 t/s. By automating 90+ optimization experiments in a single day, Claude Code demonstrated the power of agentic engineering in solving complex, low-level systems problems that typically require weeks of human effort.

// ANALYSIS

Claude Code's "autoresearch" capability is a landmark shift from AI-assisted to AI-led engineering, proving that agents can handle brute-force system optimization at scale.

  • Flash-MoE implements Apple's "LLM in a Flash" research to stream expert weights from SSD, bypassing the 200GB+ RAM requirement for massive MoE models.
  • The engine uses custom Metal kernels and FMA-optimized 4-bit dequantization to achieve usable local inference speeds on consumer hardware.
  • Claude Code autonomously discovered the optimal balance of parallel I/O and GPU scheduling, a task involving a massive search space of configuration and code changes.
  • This project highlights the transition of AI tools from simple code generators to autonomous research partners capable of validating hypotheses through empirical testing.
  • The 24-hour turnaround for a project of this technical depth sets a new benchmark for the speed of AI-driven software development.
// TAGS
claude-codeai-codingagentllminferencegpuedge-aiopen-source

DISCOVERED

21d ago

2026-03-22

PUBLISHED

21d ago

2026-03-22

RELEVANCE

9/ 10

AUTHOR

Github Awesome