AutoResearch scores 14% gain on transit corpus

// 49d ago BENCHMARK RESULT

A user applied Karpathy’s autoresearch loop to a 33M-token public transit corpus and reported a roughly 14% language-modeling improvement on an 80M-parameter transformer trained from scratch. The post’s main value is methodological: it also shows that several apparent accuracy wins failed to replicate, which makes the validation setup more interesting than the raw score.

// ANALYSIS

This reads as a solid small-data methodology report, not a frontier model result. The strongest signal is that autoresearch can still discover useful training changes under tight wall-clock constraints, but only if the evaluation gate is stricter than the metric the agent can see.

Halving the batch size was the key gain: it traded the stability of larger batches for a reported 3.6x more optimizer steps inside the same 5-minute budget.

The 80M-parameter model appears to be the best fit for this hardware and time budget; larger models ran out of steps, smaller ones ran out of capacity.

The hidden validation gate did real work: two changes that improved dev bpb (bits per byte) looked good to the agent but failed to generalize to the held-out surface.

The replication pass matters more than the headline gain: most domain-accuracy deltas collapsed across seeds, which is exactly what you'd expect from 100-250 item eval sets.

The most useful next step is a DAPT (domain-adaptive pretraining) comparison, because it separates "random init plus search" from the much easier pretrained baseline.
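
The remark about 100-250 item eval sets can be made concrete with a little arithmetic. The sketch below is not from the post: it assumes each eval item is an independent right/wrong trial (a binomial model), treats two runs as independent, and uses a hypothetical baseline accuracy of 0.5. The function names are illustrative only.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy p measured on n eval items."""
    return math.sqrt(p * (1.0 - p) / n)


def delta_noise_band(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% half-width for the accuracy difference between two runs.

    Assumes the two runs are independent and scored on eval sets of size n,
    so the variance of the difference is twice the single-run variance.
    """
    return z * math.sqrt(2.0) * accuracy_standard_error(p, n)


# Hypothetical baseline accuracy of 0.5 (not a figure from the post),
# evaluated at the two eval-set sizes mentioned in the analysis.
for n in (100, 250):
    band = delta_noise_band(0.5, n)
    print(f"n={n}: deltas under about {band * 100:.1f} points are within noise")
```

Under these assumptions the noise band is roughly 14 accuracy points at 100 items and roughly 9 points at 250 items, so a single-seed accuracy delta of a few points on sets this small is hard to tell apart from chance. Runs scored on the same items are correlated, so the true band may be narrower; the numbers are a rough guide, not a substitute for the multi-seed replication the post describes.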

// TAGS
autoresearch, llm, agent, research, benchmark, open-source

DISCOVERED

49d ago

2026-04-30

PUBLISHED

49d ago

2026-04-30

RELEVANCE

8/10

AUTHOR

MarsPassenger