HN · HACKER_NEWS // 19d ago · RESEARCH PAPER
Karpathy's Autoresearch slashes eCLIP mean rank
Yogesh Kumar applied Karpathy's autoresearch loop to an old eCLIP research codebase, letting Claude Code iterate on `train.py` inside a locked-down containerized sandbox. In 42 runs over one Saturday, the agent cut validation mean rank by 54%, mostly by fixing a temperature clamp and retuning hyperparameters.
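For readers unfamiliar with the metric, here is a minimal sketch of how a retrieval mean rank can be computed. It assumes the common definition (the average 1-based position of the true match when candidates are sorted by similarity); the post's exact evaluation code may differ. The function names and the toy numbers are hypothetical and are not taken from the write-up.

```python
from typing import Sequence


def mean_rank(similarity: Sequence[Sequence[float]]) -> float:
    """Average 1-based rank of the true match.

    Assumption: similarity[i][j] scores query i against candidate j,
    and the correct candidate for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank is 1 plus the number of candidates scoring strictly higher.
        ranks.append(1 + sum(1 for score in row if score > target))
    return sum(ranks) / len(ranks)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop in a lower-is-better metric."""
    return (before - after) / before


# Toy similarity matrix (made-up values): queries 0 and 2 rank their
# match first, query 1 ranks its match last of three.
toy = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
print(mean_rank(toy))
# Illustrative only: a drop from 100 to 46 would be a 54% reduction.
print(relative_reduction(100.0, 46.0))
```

Lower is better: a mean rank of 1.0 would mean every query retrieves its true match first.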
// ANALYSIS
This is a strong proof of concept for agentic research, but the real story is scoping: once the task is bounded by a single metric, a single file, and a hard time budget, the agent can do useful work. It looks less like an autonomous scientist and more like a very fast ablation engine that still needs a human to set the question.
- `program.md` is the real control surface, effectively acting like a lightweight operating system for the agent.
- The sandbox and permission lock mattered as much as the model, because they kept the loop safe and reviewable.
- The biggest gain came from a temperature clamp bug fix, which says a lot about how much low-hanging fruit still hides in research code.
- Hyperparameter tuning delivered more value than architectural changes, which is exactly the kind of search current agents are good at.
- Once the exploration moved into moonshot ideas, success dropped sharply, showing the ceiling of today’s autonomous research loops.
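The summary does not say what the temperature clamp bug actually was, so the following is only an illustration of why such a clamp matters in a CLIP-style contrastive loss. It assumes a learnable log-scale that is exponentiated and capped before multiplying the similarity scores (a cap of 100 is a frequent choice). The names `effective_logit_scale` and `contrastive_loss` are hypothetical and do not come from the eCLIP codebase.

```python
import math
from typing import Sequence


def effective_logit_scale(log_scale: float, max_scale: float = 100.0) -> float:
    """Exponentiate the learnable log-scale and cap it at max_scale."""
    return min(math.exp(log_scale), max_scale)


def contrastive_loss(
    similarity: Sequence[Sequence[float]],
    log_scale: float,
    max_scale: float = 100.0,
) -> float:
    """One-directional cross-entropy; the true match for row i is column i."""
    scale = effective_logit_scale(log_scale, max_scale)
    total = 0.0
    for i, row in enumerate(similarity):
        logits = [scale * s for s in row]
        peak = max(logits)
        # Numerically stable log-sum-exp over the row.
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(similarity)


# Perfectly separated toy similarities (made-up values).
sim = [[1.0, 0.0], [0.0, 1.0]]

# Same learned parameter, two different caps: an overly tight cap keeps
# the softmax soft no matter how large the parameter grows.
loss_tight = contrastive_loss(sim, log_scale=10.0, max_scale=2.0)
loss_loose = contrastive_loss(sim, log_scale=10.0, max_scale=100.0)
print(loss_tight, loss_loose)
```

In this toy setup the tightly capped scale leaves a nonzero loss on perfectly separated pairs, while the looser cap lets it fall to zero. A clamp with the wrong bound can therefore limit what training can reach without raising any error, which fits the "low-hanging fruit" point above.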
// TAGS
autoresearch, agent, research, ai-coding, automation, open-source
DISCOVERED
19d ago
2026-03-23
PUBLISHED
19d ago
2026-03-23
RELEVANCE
8/10
AUTHOR
ykumards