Evaluating AGENTS.md: context files cut coding-agent success rates
This paper benchmarks repository-level instruction files like AGENTS.md and CLAUDE.md across multiple coding agents, finding they usually raise inference cost by more than 20% and, when autogenerated, slightly hurt task success. The practical takeaway for AI-heavy dev teams is blunt: keep repo guidance minimal, specific, and focused on non-obvious constraints instead of restating what the codebase already says.
This is a useful reality check for the cargo cult around giant repo instruction files. The paper does not say context files are useless; it says autogenerated markdown summaries often add noise, while concise human-written constraints can still help.
- The authors introduce AGENTbench, a new benchmark built from 138 real GitHub issues across 12 repositories that already contain developer-written context files
- LLM-generated context files lowered success rates on average and increased steps, testing, and file exploration, which translated into materially higher token spend
- Human-written context files performed better than autogenerated ones, but the gains were modest and inconsistent across models, so quality matters more than file existence
- The strongest evidence is behavioral: agents really do follow these files, which means bad or redundant instructions can actively drag them into extra work
- Hacker News discussion around the paper converged on the same practical lesson: use AGENTS.md for tribal knowledge, workflow constraints, and non-obvious gotchas, not repo summaries
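A higher per-attempt cost and a lower success rate compound, so the spend per resolved task grows faster than either figure alone suggests. The sketch below is a rough illustration of that arithmetic, not the paper's methodology: the 20% cost overhead echoes the summary above, while the 50% and 48% success rates and the function name `cost_per_resolved_task` are hypothetical placeholders.

```python
def cost_per_resolved_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to obtain one resolved task.

    Assumes every attempt costs the same and succeeds independently
    with probability `success_rate` (a simplification).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Illustrative numbers only: the success rates are NOT taken from the paper.
baseline = cost_per_resolved_task(cost_per_attempt=1.00, success_rate=0.50)
with_file = cost_per_resolved_task(cost_per_attempt=1.20, success_rate=0.48)

# Relative increase in spend per resolved task when the context file is present.
relative_increase = with_file / baseline - 1
print(f"Spend per resolved task rises by {relative_increase:.0%}")
```

Under these assumed inputs, a 20% cost overhead combined with a two-point drop in success rate yields a 25% rise in spend per resolved task, which is why a small success penalty still matters at scale.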
DISCOVERED
2026-03-06
PUBLISHED
2026-03-06
AUTHOR
Theo - t3․gg