OPEN_SOURCE
REDDIT // RESEARCH PAPER
AGENTbench finds extra context hurts agents
ETH Zurich’s “Evaluating AGENTS.md” paper introduces AGENTbench, a 138-task benchmark for testing repository-level context files with coding agents, and finds that LLM-generated AGENTS.md or CLAUDE.md files slightly reduce task success while raising inference costs by more than 20%. For AI developers, the key takeaway is that agents benefit more from short, high-signal instructions about tooling than from sprawling repository summaries.
// ANALYSIS
This is a useful reality check for the “more context is always better” mindset in agent engineering: extra guidance often turns into extra work, not extra accuracy.
- Across four coding agents, LLM-generated context files lowered success rates on average versus providing no context file at all, while adding 20–23% more cost
- Developer-written context files did a bit better, improving AGENTbench results by about 4% on average, but they still increased steps, reasoning, and spend
- The paper’s strongest finding is behavioral: agents do follow the instructions in context files, but that literal obedience pushes them into more testing, file search, and tool use without helping them find the right code faster
- LLM-generated context files became helpful only when the repo’s other documentation was stripped away, suggesting these files mostly duplicate existing docs rather than add new signal
- For teams building coding agents, the recommendation is straightforward: keep context files minimal, task-relevant, and focused on non-obvious repo requirements such as custom tooling or test workflows (see the sketch after this list)
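To make that recommendation concrete, here is a minimal sketch of what a lean AGENTS.md might look like. The repo conventions, commands, and file paths are hypothetical stand-ins, not examples from the paper; the point is the shape of the file, not its specifics.

```markdown
# AGENTS.md — hypothetical example

## Build & test (non-obvious tooling)
- Run tests with `make check`, not `pytest` directly; the Makefile sets required env vars.
- Integration tests need a local database: run `scripts/start_test_db.sh` first.

## Conventions the agent cannot infer from the tree
- Files under `gen/` are generated; edit `schemas/*.yaml` and regenerate instead.

<!-- Deliberately no repository summary: the agent can read the tree itself,
     and the paper suggests such summaries mostly duplicate existing docs. -->
```

Every line states something the agent could not cheaply discover on its own, which is the kind of high-signal content the paper found in the better-performing developer-written files.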
// TAGS
agentbench · agent · ai-coding · benchmark · research
DISCOVERED
2026-03-10
PUBLISHED
2026-03-08
RELEVANCE
8/10
AUTHOR
EnoughNinja