Autoresearch plugin brings Karpathy loop to Claude Code
REDDIT · 19d ago // OPEN_SOURCE RELEASE


Autoresearch turns Karpathy's one-file, one-metric experiment loop into a Claude Code plugin for real codebases. On a production Django/pgvector/Cohere search stack, 60 iterations kept only 3 changes, but the run still exposed the real bottlenecks, validated one weighting scheme, and caught a Redis cache-key bug.
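The loop being automated can be sketched in a few lines: propose a change, score a single metric, keep the change only on a strict improvement, otherwise revert. This is a minimal illustration of the pattern, not the plugin's actual API; `propose`, `apply_patch`, `revert_patch`, and `run_eval` are hypothetical names.

```python
# Hypothetical sketch of a Karpathy-style one-metric experiment loop:
# most iterations end in a revert, and that is expected behavior.
def karpathy_loop(baseline_score, propose, apply_patch, revert_patch,
                  run_eval, iterations=60):
    best = baseline_score
    kept = []
    for _ in range(iterations):
        patch = propose()
        apply_patch(patch)
        score = run_eval()      # one cheap, deterministic metric
        if score > best:        # keep only strict improvements
            best = score
            kept.append(patch)
        else:
            revert_patch(patch) # the common case: narrow the search space
    return best, kept
```

On the run described above, this shape yields 3 kept patches out of 60 attempts; the value is in the 57 reverts, which rule out whole directions cheaply.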

// ANALYSIS

The point here isn’t a flashy score jump; it’s that the agent spent its failures buying certainty. A 95% revert rate (57 of 60 changes discarded) is a feature when each revert narrows the search space and tells you where to stop tuning.

  • Ranking, not recall, was the bottleneck, so larger candidate pools and title matching were mostly dead ends.
  • The adaptive weighting survived ablation, which is exactly the kind of “we should simplify this” assumption autoresearch can settle fast.
  • Round 2 showed a classic co-optimization trap: prompt changes broke weights, and stale cache keys masked the effect until the caching bug was found.
  • This only works when the eval path is cheap and deterministic, with noise sources like sampling temperature pinned down.
  • The broader win is workflow, not optimization magic: let Claude explore the edges overnight, then spend human time on the architectural ceiling, not manual guesswork.
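The Round 2 caching bug is a general failure class worth spelling out: if a Redis key omits any input that affects the result, stale entries silently mask new behavior, so the eval metric doesn't move even when a change helps. The sketch below is a hedged illustration of the fix, not the project's actual code; the key scheme and names are assumptions.

```python
import hashlib
import json

def cache_key(query: str, prompt_version: str, weights: dict) -> str:
    """Build a cache key from EVERY input that changes the output.

    Hashing a sorted JSON payload keeps keys short and makes the key
    independent of dict insertion order.
    """
    payload = json.dumps(
        {"q": query, "prompt": prompt_version, "w": weights},
        sort_keys=True,
    )
    return "search:" + hashlib.sha256(payload.encode()).hexdigest()
```

A key built from `query` alone would keep serving results produced under the old prompt and old weights, which is exactly the co-optimization trap described above: the effect of a prompt change is invisible until the key is widened.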
// TAGS
autoresearch · claude-code · agent · ai-coding · automation · open-source · research

DISCOVERED

19d ago

2026-03-24

PUBLISHED

19d ago

2026-03-24

RELEVANCE

9/10

AUTHOR

hookedonwinter