REDDIT · REDDIT // 29d ago · RESEARCH PAPER
AI task horizons double every 7 months, METR finds
METR's research shows frontier AI agents have been doubling the length of tasks they can complete autonomously every 7 months since 2019, with recent acceleration to a 4-month doubling rate. At this trajectory, agents handling month-long software tasks could arrive within 2-4 years.
// ANALYSIS
This is the Moore's Law moment for AI agency — a clean exponential curve that makes vague AI hype claims concrete and measurable.
- METR tested frontier models across ~230 tasks and found an R² of 0.83 between task length and agent success, unusually tight for AI benchmarks
- Current frontier models (e.g. Claude 3.7 Sonnet) succeed on tasks taking humans a few minutes but fail 90%+ of the time on 4-hour tasks, which explains the "why isn't AI replacing me yet" gap
- 7-month doubling since 2019, now accelerating to 4 months, suggests the 2026-2027 window is when multi-day autonomous task completion becomes routine
- The self-referential implication is stark: if agents can automate AI research, the doubling rate itself could accelerate; METR explicitly flags this flywheel risk
- An R² of 0.83 is high, but a good fit to past data does not predict discontinuous jumps; a single architectural breakthrough could shatter the curve in either direction
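The doubling arithmetic behind the 2-4 year estimate can be checked with a short sketch. It assumes a current task horizon of about 1 hour and a work-month of about 167 hours; neither figure comes from this summary, and both are illustrative placeholders. The function name `months_until` is hypothetical.

```python
import math

# Assumption (not from the summary): one work-month is about
# 40 h/week * 52 weeks / 12 months, rounded to 167 hours.
HOURS_PER_WORK_MONTH = 167

# Assumption (not from the summary): today's task horizon is about 1 hour.
CURRENT_HORIZON_HOURS = 1.0


def months_until(target_hours, current_hours, doubling_months):
    """Months for the horizon to grow from current_hours to target_hours,
    if it doubles once every doubling_months."""
    doublings_needed = math.log2(target_hours / current_hours)
    return doublings_needed * doubling_months


# Compare the long-run 7-month rate with the recent 4-month rate.
for doubling in (7, 4):
    months = months_until(HOURS_PER_WORK_MONTH, CURRENT_HORIZON_HOURS, doubling)
    print(f"{doubling}-month doubling: {months:.1f} months (~{months / 12:.1f} years)")
```

Under these assumptions, a month-long task is about 7.4 doublings away, which works out to roughly 2.5 years at the 4-month rate and 4.3 years at the 7-month rate. A different starting horizon shifts both figures, so the result is only as good as the assumed inputs.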
// TAGS
llm · agent · benchmark · research
DISCOVERED
29d ago
2026-03-14
PUBLISHED
33d ago
2026-03-10
RELEVANCE
8/10
AUTHOR
EchoOfOppenheimer