METR data revives AI R&D odds
REDDIT // NEWS · 37d ago

Ajeya Cotra says recent METR benchmark results, and Claude Opus 4.6’s jump to a roughly 12-hour 50%-reliability software-task horizon (the task length the model completes about half the time), made her January 2026 forecast look too conservative. She now argues that a 10% chance of full AI R&D automation this year is no longer easy to dismiss, especially if long projects decompose into agent-manageable chunks.

// ANALYSIS

This is one of the sharper near-term AI progress arguments because it turns benchmark movement into a labor-market claim, not just another “models got better” post. The big question is no longer whether agents can code, but whether orchestration and decomposition let them scale from twelve-hour tasks to real research-org throughput faster than expected.

  • METR’s time-horizon framing is becoming a high-signal way to translate eval scores into what frontier agents can actually do for engineering teams
  • Cotra’s update is driven less by hype than by the pace of recent benchmark movement, especially the apparent jump from roughly five-hour to roughly twelve-hour tasks in a couple of months
  • The most important claim is not that one agent can do year-long work alone, but that cheap agent teams plus heavy scaffolding could chew through decomposable projects surprisingly well
  • This still falls short of proving end-to-end AI R&D automation, since Cotra explicitly flags research judgment, creativity, and high-reliability execution as unresolved bottlenecks
  • If current suites are saturating, labs will need messier 80% and 95% reliability evals fast, or they will lose visibility into whether agents are nearing real deployment thresholds
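The time-horizon framing in the bullets above can be sketched as a logistic fit of task success against log task length, then solving for the length at which predicted success crosses 50% (or 80%). This is a minimal illustration, not METR’s actual pipeline: the run data, learning rate, and the `horizon` helper are all hypothetical.

```python
import math

# Hypothetical (task_minutes, success) agent runs; a real evaluation
# would use many tasks per model, each with a human-baselined length.
runs = [(5, 1), (15, 1), (60, 1), (120, 1), (240, 1),
        (480, 0), (720, 1), (1440, 0), (2880, 0)]

# Fit P(success) = 1 / (1 + exp(-(a + b * log2(minutes))))
# by plain gradient ascent on the log-likelihood.
a, b = 0.0, 0.0
for _ in range(50000):
    ga = gb = 0.0
    for minutes, y in runs:
        x = math.log2(minutes)
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        ga += y - p          # d(log-likelihood)/da
        gb += (y - p) * x    # d(log-likelihood)/db
    a += 0.005 * ga
    b += 0.005 * gb

def horizon(p):
    # Task length (minutes) at which predicted success rate equals p.
    return 2 ** ((math.log(p / (1 - p)) - a) / b)

print(f"50% horizon: {horizon(0.5)/60:.1f} h")
print(f"80% horizon: {horizon(0.8)/60:.1f} h")
```

Because success falls off with task length, the fitted slope is negative, and the 80% horizon lands well below the 50% one, which is why the bullets treat 80% and 95% reliability evals as a separate, harder target.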
// TAGS
metr · benchmark · agent · ai-coding · research

DISCOVERED

2026-03-06

PUBLISHED

2026-03-05

RELEVANCE

8 / 10

AUTHOR

SteppenAxolotl