Karpathy's Autoresearch slashes eCLIP mean rank
HN · HACKER_NEWS // 19d ago // RESEARCH PAPER

Yogesh Kumar applied Karpathy's autoresearch loop to an old eCLIP research codebase, letting Claude Code iterate on `train.py` inside a locked-down containerized sandbox. In 42 runs over one Saturday, the agent cut validation mean rank by 54%, mostly by fixing a temperature clamp and retuning hyperparameters.
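
For readers unfamiliar with the metric, here is a minimal sketch of how a retrieval mean rank is commonly computed. It assumes a square similarity matrix in which the true match for query `i` is candidate `i`; the function name `mean_rank` and the toy numbers are illustrative and are not taken from the eCLIP codebase. Lower is better, and a perfect model scores 1.0.

```python
def mean_rank(similarity):
    """Average 1-based rank of the true match for each query.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the true match for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of candidates scoring strictly higher than the true match.
        rank = 1 + sum(1 for score in row if score > target)
        ranks.append(rank)
    return sum(ranks) / len(ranks)


# Toy 3x3 similarity matrix (made-up numbers).
sim = [
    [0.9, 0.1, 0.2],  # true match ranked 1st
    [0.8, 0.5, 0.3],  # true match ranked 2nd
    [0.7, 0.6, 0.1],  # true match ranked 3rd
]
print(mean_rank(sim))  # 2.0
```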

// ANALYSIS

This is a strong proof of concept for agentic research, but the real story is scoping: once the task is bounded by a single metric, a single file, and a hard time budget, the agent can do useful work. It looks less like an autonomous scientist and more like a very fast ablation engine that still needs a human to set the question.
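
The scoping described above (one metric, one editable artifact, a hard time budget) can be sketched as a greedy keep-or-revert loop. This is a simplified illustration, not the actual autoresearch implementation: `run_loop`, `propose`, and `evaluate` are hypothetical names, a config dict stands in for edits to `train.py`, and the toy functions replace real training runs.

```python
import time


def run_loop(propose, evaluate, baseline_config, max_runs, time_budget_s):
    """Greedy keep-or-revert search against a single metric (lower is better)."""
    best_config = dict(baseline_config)
    best_score = evaluate(best_config)
    history = [("baseline", best_score, True)]
    deadline = time.monotonic() + time_budget_s

    for run in range(1, max_runs + 1):
        if time.monotonic() >= deadline:
            break  # hard time budget: stop regardless of progress
        candidate = propose(best_config, run)
        score = evaluate(candidate)
        kept = score < best_score
        if kept:
            # Keep the change only if the metric improved; otherwise revert.
            best_config, best_score = candidate, score
        history.append((f"run {run}", score, kept))
    return best_config, best_score, history


def toy_evaluate(config):
    # Stand-in for "train, then report validation mean rank".
    return abs(config["lr"] - 0.3) * 100 + 10


def toy_propose(config, run):
    # Stand-in for the agent's edit: alternate nudging lr up and down.
    step = 0.05 if run % 2 else -0.05
    return {**config, "lr": round(config["lr"] + step, 4)}


best, score, history = run_loop(
    toy_propose, toy_evaluate, {"lr": 0.1}, max_runs=8, time_budget_s=5.0
)
for name, value, kept in history:
    print(name, round(value, 2), "kept" if kept else "reverted")
```

The human still chooses the metric, the search space, and the budget; the loop only automates the try-measure-keep cycle.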

  • `program.md` is the real control surface, effectively acting like a lightweight operating system for the agent.
  • The sandbox and permission lock mattered as much as the model, because they kept the loop safe and reviewable.
  • The biggest gain came from a temperature clamp bug fix, which says a lot about how much low-hanging fruit still hides in research code.
  • Hyperparameter tuning delivered more value than architectural changes, which is exactly the kind of search current agents are good at.
  • Once the exploration moved into moonshot ideas, success dropped sharply, showing the ceiling of today’s autonomous research loops.
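
The summary does not say what the temperature clamp bug actually was, so the following is only an illustration of why such a clamp can matter in a CLIP-style contrastive loss. It assumes a learnable log-scale that is exponentiated and capped, with 100 as the ceiling used in the original CLIP recipe; the tight ceiling of 5, the function names, and the similarity values are all hypothetical. Only one retrieval direction is computed, for brevity.

```python
import math


def clamped_scale(log_scale, max_scale):
    # CLIP-style: learnable log-scale, exponentiated, then capped at max_scale.
    return min(math.exp(log_scale), max_scale)


def info_nce(similarity, scale):
    """Row-wise cross-entropy where the true match for row i is column i."""
    total = 0.0
    for i, row in enumerate(similarity):
        logits = [scale * s for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(similarity)


# Made-up cosine similarities; the diagonal holds the true pairs.
sim = [
    [0.60, 0.40, 0.30],
    [0.35, 0.55, 0.30],
    [0.20, 0.40, 0.60],
]
log_scale = math.log(50.0)  # what the model "wants" the scale to be

loss_usual_cap = info_nce(sim, clamped_scale(log_scale, max_scale=100.0))
loss_tight_cap = info_nce(sim, clamped_scale(log_scale, max_scale=5.0))
print(loss_usual_cap, loss_tight_cap)
```

With the tight ceiling the logits stay close together, so the loss cannot fall far even when every true pair already has the highest similarity. A misconfigured clamp of this kind limits training without raising any error, which is why it is easy to miss.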
// TAGS
autoresearch · agent · research · ai-coding · automation · open-source

DISCOVERED

19d ago

2026-03-23

PUBLISHED

19d ago

2026-03-23

RELEVANCE

8/10

AUTHOR

ykumards