AGENTbench finds extra context hurts agents
REDDIT // 32d ago // RESEARCH PAPER


ETH Zurich’s “Evaluating AGENTS.md” paper introduces AGENTbench, a 138-task benchmark for testing repository-level context files with coding agents, and finds that LLM-generated AGENTS.md or CLAUDE.md files slightly reduce task success while raising inference costs by more than 20%. For AI developers, the key takeaway is that agents benefit more from short, high-signal instructions about tooling than from sprawling repository summaries.

// ANALYSIS

This is a useful reality check for the “more context is always better” mindset in agent engineering: extra guidance often turns into extra work, not extra accuracy.

  • Across four coding agents, LLM-generated context files lowered success rates on average versus providing no context file at all, while adding 20–23% more cost
  • Developer-written context files fared somewhat better, improving AGENTbench results by about 4% on average, but they still increased steps, reasoning, and spend
  • The paper’s strongest finding is behavioral: agents do follow the instructions in context files, but that literal obedience pushes them into more testing, file search, and tool use without helping them find the right code faster
  • LLM-generated context files became helpful only when the repo’s other documentation was stripped away, suggesting these files mostly duplicate existing docs rather than add new signal
  • For teams building coding agents, the recommendation is straightforward: keep context files minimal, task-relevant, and focused on non-obvious repo requirements such as custom tooling or test workflows
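As an illustration of that recommendation, a minimal, high-signal context file might look like the sketch below. The repository details (make targets, directory names, tool choices) are hypothetical, not drawn from the paper; the point is the shape: a few non-obvious, task-relevant rules rather than a repository summary.

```markdown
# AGENTS.md

## Tooling (non-obvious)
- Run tests with `make test`, not `pytest` directly; the Makefile sets required env vars.
- After editing any `.proto` file, regenerate stubs with `make codegen`.

## Constraints
- Do not modify files under `vendor/`.
- Run `make lint` before committing; CI rejects unformatted code.
```

Everything an agent could discover cheaply on its own (project layout, language, framework) is deliberately omitted, since the paper suggests such duplicated signal adds cost without improving success.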
// TAGS
agentbench · agent · ai-coding · benchmark · research

DISCOVERED

2026-03-10 (32d ago)

PUBLISHED

2026-03-08 (35d ago)

RELEVANCE

8/10

AUTHOR

EnoughNinja