OPEN_SOURCE
REDDIT // 19d ago · OPEN-SOURCE RELEASE
Autoresearch plugin brings Karpathy loop to Claude Code
Autoresearch turns Karpathy's one-file, one-metric experiment loop into a Claude Code plugin for real codebases. On a production Django/pgvector/Cohere search stack, 60 iterations kept only 3 changes, but the run still exposed the real bottlenecks, validated one weighting scheme, and caught a Redis cache-key bug.
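The loop being automated is easy to state: propose one change, score it on one metric, keep it only if the metric improves, otherwise revert. A minimal sketch of that cycle, assuming hypothetical `propose`/`evaluate`/`revert` callables standing in for the plugin's edit, eval, and rollback steps (not its actual API):

```python
def autoresearch_loop(baseline_score, propose, evaluate, revert, iterations=60):
    """Karpathy-style loop: one change per iteration, one metric, keep or revert.

    `propose`, `evaluate`, and `revert` are hypothetical stand-ins for the
    agent's edit / scoring / rollback steps.
    """
    best = baseline_score
    kept = 0
    for _ in range(iterations):
        change = propose()      # agent edits one file
        score = evaluate()      # single deterministic metric
        if score > best:        # improvement: keep the change
            best = score
            kept += 1
        else:                   # regression or no-op: roll it back
            revert(change)
    return best, kept
```

On the run described above, this shape is why 57 reverts are still informative: every rejected change is a measured negative result.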
// ANALYSIS
The point here isn’t a flashy score jump; it’s that the agent spent its failures buying certainty. A 93% failure rate is a feature when each revert narrows the search space and tells you where to stop tuning.
- Ranking, not recall, was the bottleneck, so larger candidate pools and title matching were mostly dead ends.
- The adaptive weighting survived ablation, which is exactly the kind of "we should simplify this" assumption autoresearch can settle fast.
- Round 2 showed a classic co-optimization trap: prompt changes broke the weights, and stale cache keys masked the effect until the caching bug was found.
- This only works when the eval path is cheap and deterministic, with noisy inputs like sampling temperature pinned down.
- The broader win is workflow, not optimization magic: let Claude explore the edges overnight, then spend human time on the architectural ceiling instead of manual guesswork.
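The Round-2 caching bug is an instance of a general rule: every input that changes the result must be part of the cache key. A minimal sketch of a Redis-style key builder, assuming illustrative names (`prompt_version`, `weights`) rather than anything from the actual repo:

```python
import hashlib
import json

def search_cache_key(query, prompt_version, weights):
    """Build a cache key over *all* inputs that affect the ranked result.

    If `prompt_version` or `weights` were omitted (the stale-key failure
    mode), cached results from an old configuration would silently mask
    the effect of new prompt or weight changes during evaluation.
    """
    payload = json.dumps(
        {"q": query, "prompt": prompt_version, "weights": weights},
        sort_keys=True,  # deterministic serialization, dict order-independent
    )
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With the key built this way, changing a weight invalidates the cache automatically, so the eval loop measures the change instead of a stale result.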
// TAGS
autoresearch · claude-code · agent · ai-coding · automation · open-source · research
DISCOVERED
2026-03-24
PUBLISHED
2026-03-24
RELEVANCE
9/10
AUTHOR
hookedonwinter