autoresearch-ane brings Karpathy loop to ANE
autoresearch-ane is a new open-source fork that adapts Karpathy’s autonomous LLM-training loop to Apple’s Neural Engine using reverse-engineered private APIs instead of CUDA. The project claims a big step-up in steps per 5-minute run by switching to a dynamic weight pipeline, positioning Apple Silicon as a plausible low-power playground for agent-driven training experiments.
This is the most interesting kind of AI tinkering: not a new model, but a new way to squeeze useful research cycles out of hardware that was never meant to be open for training. If it holds up, the real story is throughput-per-watt and overnight experimentation on commodity Macs, not raw benchmark glory.
- The fork stands on two timely trends at once: Karpathy's `autoresearch` loop and the recent reverse-engineering work that exposed ANE training paths.
- Its biggest reported gain is architectural, not algorithmic: eliminating per-batch recompilation reportedly boosts throughput from roughly 120 to 1340 steps in the same 5-minute budget.
- The repo is honest that this is a separate training stack with different data, metrics, and implementation details, so its numbers are not directly comparable to the original CUDA version.
- Using private Apple APIs makes this exciting for researchers and hackers, but fragile for anyone hoping for a stable production path.
- As a prototype, it is still early, but it points toward a broader idea: autonomous model experimentation could move down from datacenter GPUs to personal silicon.
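The recompilation claim is easy to picture in miniature. The sketch below is purely illustrative (the fork's actual code and Apple's private ANE APIs are not public): a "static" pipeline bakes weights into the compiled graph, so every weight update forces a recompile, while a "dynamic" pipeline compiles once against weight slots and only swaps buffers per batch. All names here (`ToyCompiler`, `DynamicGraph`) are hypothetical stand-ins, not the project's API.

```python
class ToyCompiler:
    """Hypothetical stand-in for a graph compiler; counts compilations."""
    def __init__(self):
        self.compilations = 0

    def compile(self, weights):
        self.compilations += 1
        # Pretend compilation bakes the weights into the graph.
        return lambda x: [w * x for w in weights]


def train_static(compiler, steps):
    """Static pipeline: weights are graph constants, so each batch recompiles."""
    weights = [1.0, 2.0]
    for _ in range(steps):
        graph = compiler.compile(weights)      # recompiled every batch
        _ = graph(1.0)                         # toy forward pass
        weights = [w - 0.01 for w in weights]  # toy weight update


class DynamicGraph:
    """Dynamic pipeline: compile once, treat weights as swappable buffers."""
    def __init__(self, compiler, weights):
        self.weights = list(weights)
        compiler.compilations += 1  # single up-front compile

    def set_weights(self, weights):
        self.weights = list(weights)  # buffer swap, no recompilation

    def __call__(self, x):
        return [w * x for w in self.weights]


def train_dynamic(compiler, steps):
    weights = [1.0, 2.0]
    graph = DynamicGraph(compiler, weights)
    for _ in range(steps):
        graph.set_weights(weights)
        _ = graph(1.0)
        weights = [w - 0.01 for w in weights]


static_c = ToyCompiler()
train_static(static_c, 100)
dynamic_c = ToyCompiler()
train_dynamic(dynamic_c, 100)
print(static_c.compilations, dynamic_c.compilations)  # prints: 100 1
```

Under this toy model, compilation cost scales with step count in the static case and is constant in the dynamic case, which is the shape of improvement the ~120 → ~1340 steps-per-5-minutes claim implies, if it holds up.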
DISCOVERED
2026-03-11
PUBLISHED
2026-03-10
AUTHOR
paraboloed