OPEN_SOURCE
REDDIT // 19d ago · OPEN-SOURCE RELEASE
Autoresearch plugin brings Karpathy loop to Claude Code
Autoresearch turns Karpathy's one-file, one-metric experiment loop into a Claude Code plugin for real codebases. On a production Django/pgvector/Cohere search stack, 60 iterations kept only 3 changes, but the run still exposed the real bottlenecks, validated one weighting scheme, and caught a Redis cache-key bug.
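The loop being automated is easy to state: propose one change, score it on one metric, keep it only if the metric improves, otherwise revert. A minimal sketch of that cycle, assuming hypothetical `propose`/`evaluate`/`revert` callables standing in for the plugin's edit, eval, and rollback steps (not its actual API):

```python
def autoresearch_loop(baseline_score, propose, evaluate, revert, iterations=60):
    """Karpathy-style loop: one change per iteration, one metric, keep or revert.

    `propose`, `evaluate`, and `revert` are hypothetical stand-ins for the
    agent's edit / scoring / rollback steps.
    """
    best = baseline_score
    kept = 0
    for _ in range(iterations):
        change = propose()      # agent edits one file
        score = evaluate()      # single deterministic metric
        if score > best:        # improvement: keep the change
            best = score
            kept += 1
        else:                   # regression or no-op: roll it back
            revert(change)
    return best, kept
```

On the run described above, this shape is why 57 reverts are still informative: every rejected change is a measured negative result.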
// ANALYSIS
The point here isn’t a flashy score jump; it’s that the agent spent its failures buying certainty. A 93% failure rate is a feature when each revert narrows the search space and tells you where to stop tuning.
- Ranking, not recall, was the bottleneck, so larger candidate pools and title matching were mostly dead ends.
- The adaptive weighting survived ablation, which is exactly the kind of "we should simplify this" assumption autoresearch can settle fast.
- Round 2 showed a classic co-optimization trap: prompt changes broke the weights, and stale cache keys masked the effect until the caching bug was found.
- This only works when the eval path is cheap and deterministic, with noisy inputs like sampling temperature pinned down.
- The broader win is workflow, not optimization magic: let Claude explore the edges overnight, then spend human time on the architectural ceiling instead of manual guesswork.
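The Round-2 caching bug is an instance of a general rule: every input that changes the result must be part of the cache key. A minimal sketch of a Redis-style key builder, assuming illustrative names (`prompt_version`, `weights`) rather than anything from the actual repo:

```python
import hashlib
import json

def search_cache_key(query, prompt_version, weights):
    """Build a cache key over *all* inputs that affect the ranked result.

    If `prompt_version` or `weights` were omitted (the stale-key failure
    mode), cached results from an old configuration would silently mask
    the effect of new prompt or weight changes during evaluation.
    """
    payload = json.dumps(
        {"q": query, "prompt": prompt_version, "weights": weights},
        sort_keys=True,  # deterministic serialization, dict order-independent
    )
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With the key built this way, changing a weight invalidates the cache automatically, so the eval loop measures the change instead of a stale result.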
// TAGS
autoresearch · claude-code · agent · ai-coding · automation · open-source · research
DISCOVERED
2026-03-24
PUBLISHED
2026-03-24
RELEVANCE
9/10
AUTHOR
hookedonwinter