Evaluating AGENTS.md cuts coding-agent win rates
OPEN_SOURCE
YT · YOUTUBE // 36d ago // RESEARCH PAPER

This paper benchmarks repository-level instruction files such as AGENTS.md and CLAUDE.md across multiple coding agents, finding they usually raise inference cost by more than 20% while slightly lowering task success. The practical takeaway for AI-heavy dev teams is blunt: keep repo guidance minimal, specific, and focused on non-obvious constraints rather than restating what the codebase already says.

// ANALYSIS

This is a useful reality check on the cargo cult around giant repo instruction files. The paper does not say context files are useless; it says autogenerated markdown summaries often add noise, while concise human-written constraints can still help.

  • The authors introduce AGENTbench, a new benchmark built from 138 real GitHub issues across 12 repositories that already contain developer-written context files
  • LLM-generated context files lowered success rates on average and increased steps, testing, and file exploration, which translated into materially higher token spend
  • Human-written context files performed better than autogenerated ones, but the gains were modest and inconsistent across models, so quality matters more than file existence
  • The strongest evidence is behavioral: agents really do follow these files, which means bad or redundant instructions can actively drag them into extra work
  • Hacker News discussion around the paper converged on the same practical lesson: use AGENTS.md for tribal knowledge, workflow constraints, and non-obvious gotchas, not repo summaries
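
That practical lesson can be illustrated with a sketch of a minimal, constraint-focused AGENTS.md. The repo details below are hypothetical examples, not taken from the paper:

```markdown
# AGENTS.md

<!-- Only non-obvious constraints. Do not restate what the code already shows. -->

## Workflow
- Run `make test-fast` before committing; the full suite takes ~40 min in CI.

## Gotchas
- `services/billing` is pinned to a legacy API client; do not upgrade its dependencies.
- Database migrations must be reversible; CI rejects any migration without a `down()` step.
```

Every line here is tribal knowledge or a workflow constraint the agent cannot infer from the code itself, which is exactly the kind of content the paper found still helps rather than adding noise.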
// TAGS
evaluating-agents-md · ai-coding · agent · research · benchmark · devtool

DISCOVERED

2026-03-06

PUBLISHED

2026-03-06

RELEVANCE

9/10

AUTHOR

Theo - t3.gg