METR finds AI slows veteran developers
METR’s randomized controlled trial tracked 16 experienced open-source maintainers across 246 real tasks and found that early-2025 AI coding tools made them 19% slower, not faster. The result is striking because developers expected a 24% speedup and mostly used frontier tools at the time, especially Cursor Pro with Claude 3.5 and 3.7 Sonnet.
This is one of the clearest reality checks yet on AI coding hype: benchmark wins and subjective “it feels faster” reports do not automatically translate into productivity gains inside mature codebases. The bigger story is not just slowdown, but how confidently experienced developers misread their own throughput while using these tools.
- The study focuses on a hard, high-signal setting: veteran contributors working in repositories they know deeply, where AI has less room to outperform hard-won human context.
- It directly challenges the industry habit of treating anecdotal speedups and benchmark scores as interchangeable with real-world engineering output.
- The paper is careful not to overclaim; it does not say AI is broadly useless, only that in this specific early-2025 setup it slowed senior open-source developers down.
- Public discussion around the paper has zeroed in on learning-curve effects, since many participants had limited prior Cursor experience, which makes this result damning for current workflows but not necessarily final for future ones.
- For AI tool builders, the implication is brutal: reducing prompt overhead, review burden, and context friction matters more than flashy benchmark demos if the goal is real developer acceleration.
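As a rough illustration of what a "19% slower" headline figure means, the sketch below computes a summary time ratio from paired task times. The per-task numbers are invented, and METR's actual estimator is more sophisticated than this; the point is only that an aggregate ratio above 1.0 means tasks took longer with AI assistance.

```python
import math

# Hypothetical per-task completion times in minutes (NOT the study's data):
# each pair is (time with AI allowed, time with AI disallowed).
paired_times = [(95, 80), (130, 110), (60, 50), (150, 125), (45, 38)]

# Summarize with the geometric mean of per-task time ratios (AI / no-AI).
# Averaging in log space keeps one very long task from dominating.
log_ratios = [math.log(ai / no_ai) for ai, no_ai in paired_times]
geo_mean_ratio = math.exp(sum(log_ratios) / len(log_ratios))

print(f"time ratio (AI / no-AI): {geo_mean_ratio:.2f}")  # > 1.0 means slower with AI
```

With these made-up numbers the ratio comes out to about 1.19, i.e. a 19% slowdown; a predicted 24% speedup would instead correspond to a ratio near 0.81.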
Discovered: 2026-03-06
Published: 2026-03-06
Author: DIY Smart Code