AI task horizons double every 7 months, METR finds

// 75d ago · RESEARCH PAPER

METR's research shows that the length of tasks frontier AI agents can complete autonomously has doubled roughly every 7 months since 2019, with a recent acceleration to a 4-month doubling time. If that trend holds, agents able to handle month-long software tasks could arrive within 2-4 years.
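The 2-4 year estimate above follows from simple doubling arithmetic. The sketch below is illustrative only: the 1-hour starting horizon and the 167-hour work month are assumptions made for this example, not figures from the summary, and `months_to_reach` is a hypothetical helper name.

```python
import math

# Assumption: one "work month" is about 167 hours (~40 h/week, ~4.2 weeks).
HOURS_PER_WORK_MONTH = 167.0

# Assumption: today's frontier horizon is about 1 hour of human task time.
ASSUMED_CURRENT_HORIZON_HOURS = 1.0


def months_to_reach(current_hours, target_hours, doubling_months):
    """Months for the task horizon to grow from current_hours to target_hours,
    assuming steady exponential growth with the given doubling time."""
    if current_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all arguments must be positive")
    doublings_needed = math.log2(target_hours / current_hours)
    return doublings_needed * doubling_months


for doubling in (7, 4):
    months = months_to_reach(
        ASSUMED_CURRENT_HORIZON_HOURS, HOURS_PER_WORK_MONTH, doubling
    )
    print(f"{doubling}-month doubling: {months:.1f} months (~{months / 12:.1f} years)")
```

Under these assumptions, the 4-month and 7-month doubling times give roughly 2.5 and 4.3 years respectively, which brackets the range quoted above. A different starting horizon shifts both numbers by the same number of doublings.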

// ANALYSIS

This is the Moore's Law moment for AI agency — a clean exponential curve that makes vague AI hype claims concrete and measurable.

–METR tested frontier models across ~230 tasks and found that human task length predicts agent success with R²=0.83, an unusually tight fit for AI benchmarks
–Current frontier models (e.g. Claude 3.7 Sonnet) succeed on tasks taking humans a few minutes but fail 90%+ of the time on 4-hour tasks — the "why isn't AI replacing me yet" gap explained
–The 7-month doubling since 2019, now accelerating to 4 months, suggests multi-day autonomous task completion could become routine in the 2026-2027 window
–The self-referential implication is stark: if agents can automate AI research, the doubling rate itself could accelerate — METR explicitly flags this flywheel risk
–An R² of 0.83 is high, but a good fit to past data cannot predict discontinuous jumps; a single architectural breakthrough could bend the curve in either direction
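One way to picture the link between task length and success described in the bullets is a logistic curve over the log of human task time, where the "horizon" is the task length at which success drops to 50%. This is a minimal sketch under stated assumptions, not METR's fitted model: the 60-minute horizon and the slope of 1.2 are hypothetical values chosen for illustration, and `success_probability` is an invented helper.

```python
import math


def success_probability(task_minutes, horizon_minutes, slope=1.2):
    """Illustrative logistic model: probability that an agent completes a task
    that takes a human task_minutes, given a 50% horizon of horizon_minutes.

    slope (hypothetical) controls how fast success falls per doubling of
    task length beyond the horizon.
    """
    if task_minutes <= 0 or horizon_minutes <= 0:
        raise ValueError("task_minutes and horizon_minutes must be positive")
    doublings_past_horizon = math.log2(task_minutes / horizon_minutes)
    return 1.0 / (1.0 + math.exp(slope * doublings_past_horizon))


# Assumed 60-minute horizon; compare a few-minute task with a 4-hour task.
for minutes in (4, 60, 240):
    p = success_probability(minutes, horizon_minutes=60)
    print(f"{minutes:>4} min task: {p:.0%} estimated success")
```

With these illustrative parameters, a 4-minute task succeeds almost always, while a 4-hour task succeeds less than 10% of the time. That matches the qualitative gap described in the bullets, though the actual curve depends on the fitted horizon and slope.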

// TAGS

llm · agent · benchmark · research

DISCOVERED

75d ago

2026-03-14

PUBLISHED

79d ago

2026-03-10

RELEVANCE

8/10

AUTHOR

EchoOfOppenheimer
