Evaluating AGENTS.md cuts coding-agent win rates

// 82d ago · RESEARCH PAPER

This paper benchmarks repository-level instruction files like AGENTS.md and CLAUDE.md across multiple coding agents, finding they usually raise inference cost by more than 20% while slightly hurting task success. The practical takeaway for AI-heavy dev teams is blunt: keep repo guidance minimal, specific, and focused on non-obvious constraints instead of restating what the codebase already says.

// ANALYSIS

This is a useful reality check for the cargo cult around giant repo instruction files. The paper does not say context files are useless; it says autogenerated markdown summaries often add noise, while concise human-written constraints can still help.

  • The authors introduce AGENTbench, a new benchmark built from 138 real GitHub issues across 12 repositories that already contain developer-written context files
  • LLM-generated context files lowered success rates on average and increased steps, testing, and file exploration, which translated into materially higher token spend
  • Human-written context files performed better than autogenerated ones, but the gains were modest and inconsistent across models, so quality matters more than file existence
  • The strongest evidence is behavioral: agents really do follow these files, which means bad or redundant instructions can actively drag them into extra work
  • Hacker News discussion around the paper converged on the same practical lesson: use AGENTS.md for tribal knowledge, workflow constraints, and non-obvious gotchas, not repo summaries
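
One way to act on the "minimal, specific, non-obvious" advice is a rough audit of a context file before committing it. The sketch below is illustrative only and does not come from the paper: the function name `audit_context_file`, the 300-word budget, the `SUMMARY_HINTS` keyword list, and the sample file contents (including `make test-fast`) are all assumptions chosen for the example.

```python
import re

# Hypothetical heading keywords suggesting a section restates the repo
# instead of recording a non-obvious constraint. Tune these per project.
SUMMARY_HINTS = (
    "overview",
    "project structure",
    "directory layout",
    "architecture",
    "tech stack",
)


def audit_context_file(text: str, max_words: int = 300) -> dict:
    """Return a rough size and redundancy report for an AGENTS.md-style file."""
    # Collect Markdown ATX headings ("# Title" through "###### Title").
    headings = re.findall(r"^#{1,6}[ \t]+(.*)$", text, flags=re.MULTILINE)
    # Flag headings that look like repo summaries.
    flagged = [
        h.strip()
        for h in headings
        if any(hint in h.lower() for hint in SUMMARY_HINTS)
    ]
    word_count = len(text.split())
    return {
        "word_count": word_count,
        "over_budget": word_count > max_words,  # max_words is an assumed budget
        "summary_like_headings": flagged,
    }


# Made-up sample file: one summary-style section, one genuine gotcha.
sample = """# Project overview
This repo is a web app.

# Gotchas
Run tests with `make test-fast`; the full suite needs a GPU.
"""

report = audit_context_file(sample)
print(report)
```

A heuristic like this only catches the obvious cases (length and summary-style headings); whether an instruction is actually non-obvious still needs a human read.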
// TAGS
evaluating-agents-md · ai-coding · agent · research · benchmark · devtool

DISCOVERED

82d ago

2026-03-06

PUBLISHED

82d ago

2026-03-06

RELEVANCE

9/10

AUTHOR

Theo - t3․gg