Autolab ships confidence-first autoresearch CLI stack
Autolab introduces three open-source CLI tools for Karpathy-style autoresearch: autojudge scores keep/discard confidence against noise, autosteer suggests next experiment directions, and autoevolve runs competing multi-agent worktrees. The project is live on GitHub and PyPI (autojudge 1.0.1, autosteer 1.0.1, autoevolve 1.1.1 released March 16, 2026) and is aimed at reducing false-positive keeps that waste downstream GPU cycles.
The sharp insight here is that bad keeps are costlier than clean discards, and this toolkit operationalizes that idea into a reproducible loop.
- `autojudge` reframes tiny metric deltas as statistical confidence decisions, which is exactly what noisy overnight autoresearch runs usually lack.
- `autosteer` adds lightweight portfolio logic (explore vs exploit) that can reduce random-walk experimentation without claiming causal certainty.
- `autoevolve` is the highest-upside piece for teams, but it also introduces the most orchestration complexity around branches, compute, and merge hygiene.
- The author’s own caveats matter: confidence estimates need enough recent runs to stabilize, and early-stage tools can overfit to local experiment dynamics.
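
To make the "confidence against noise" idea concrete, here is a minimal sketch of a keep/discard decision. It is not autojudge's actual implementation or API: the function names, the normal-noise model, the 0.95 threshold, and the five-run minimum are all assumptions chosen for illustration, and the metric is assumed to be higher-is-better.

```python
import math
import statistics


def keep_confidence(baseline_runs, candidate, min_runs=5):
    """Estimate how likely the candidate's gain is real rather than noise.

    Hypothetical helper: models run-to-run noise as a normal distribution
    fitted to recent baseline runs. Returns None when there are too few
    runs for the noise estimate to be meaningful.
    """
    if len(baseline_runs) < min_runs:
        return None
    mean = statistics.fmean(baseline_runs)
    stdev = statistics.stdev(baseline_runs)  # sample standard deviation
    if stdev == 0:
        # No observed noise: any improvement counts as certain.
        return 1.0 if candidate > mean else 0.0
    z = (candidate - mean) / stdev
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def decide(baseline_runs, candidate, threshold=0.95, min_runs=5):
    """Return 'keep', 'discard', or 'insufficient-data' (assumed labels)."""
    confidence = keep_confidence(baseline_runs, candidate, min_runs)
    if confidence is None:
        return "insufficient-data"
    return "keep" if confidence >= threshold else "discard"


if __name__ == "__main__":
    recent = [0.80, 0.81, 0.79, 0.80, 0.80]  # made-up baseline metrics
    print(decide(recent, 0.805))  # small delta, within the noise band
    print(decide(recent, 0.83))   # large delta, well outside the noise
```

Under these assumptions, a delta smaller than the run-to-run spread is discarded even though it is nominally an improvement, which is the behavior that avoids false-positive keeps. The `insufficient-data` branch mirrors the caveat that confidence estimates need enough recent runs to stabilize.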
DISCOVERED
2026-03-17
PUBLISHED
2026-03-17
AUTHOR
dean0x