AGENTbench finds extra context hurts agents
REDDIT // 32d ago // RESEARCH PAPER


ETH Zurich’s “Evaluating AGENTS.md” paper introduces AGENTbench, a 138-task benchmark for testing repository-level context files with coding agents, and finds that LLM-generated AGENTS.md or CLAUDE.md files slightly reduce task success while raising inference costs by more than 20%. For AI developers, the key takeaway is that agents benefit more from short, high-signal instructions about tooling than from sprawling repository summaries.

// ANALYSIS

This is a useful reality check for the “more context is always better” mindset in agent engineering: extra guidance often turns into extra work, not extra accuracy.

  • Across four coding agents, LLM-generated context files lowered success rates on average versus providing no context file at all, while adding 20–23% more cost
  • Developer-written context files fared somewhat better, improving AGENTbench results by about 4% on average, but they still increased steps, reasoning, and spend
  • The paper’s strongest finding is behavioral: agents do follow the instructions in context files, but that literal obedience pushes them into more testing, file search, and tool use without helping them find the right code faster
  • LLM-generated context files became helpful only when the repo’s other documentation was stripped away, suggesting these files mostly duplicate existing docs rather than add new signal
  • For teams building coding agents, the recommendation is straightforward: keep context files minimal, task-relevant, and focused on non-obvious repo requirements such as custom tooling or test workflows
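As an illustration of that recommendation, a minimal, high-signal context file might look like the sketch below. The repository details (make targets, directory names, tool choices) are hypothetical, not drawn from the paper; the point is the shape: a few non-obvious, task-relevant rules rather than a repository summary.

```markdown
# AGENTS.md

## Tooling (non-obvious)
- Run tests with `make test`, not `pytest` directly; the Makefile sets required env vars.
- After editing any `.proto` file, regenerate stubs with `make codegen`.

## Constraints
- Do not modify files under `vendor/`.
- Run `make lint` before committing; CI rejects unformatted code.
```

Everything an agent could discover cheaply on its own (project layout, language, framework) is deliberately omitted, since the paper suggests such duplicated signal adds cost without improving success.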
// TAGS
agentbench · agent · ai-coding · benchmark · research

DISCOVERED

2026-03-10 (32d ago)

PUBLISHED

2026-03-08 (35d ago)

RELEVANCE

8/10

AUTHOR

EnoughNinja