OPEN_SOURCE
YT · YOUTUBE // 32d ago // BENCHMARK RESULT
Vercel’s AGENTS.md beats skills in evals
Vercel reports that a compressed 8KB docs index embedded in AGENTS.md hit a 100% pass rate on hardened Next.js 16 agent evals, while skills reached 79% only with explicit instructions and 53% by default. The company also shipped the `npx @next/codemod@canary agents-md` codemod to inject version-matched docs into projects automatically.
// ANALYSIS
This is less a win for markdown than a win for removing an unreliable agent decision point. Vercel’s result matters because it turns the AGENTS.md vs. skills debate into a measurable reliability question instead of a vibes-only workflow preference.
- The key failure mode was invocation: Vercel says the skill was never triggered in 56% of eval cases, which erased most of its theoretical benefit.
- The winning setup was not dumping full docs into context, but a compressed index that points the agent to local, version-matched `.next-docs` files when needed.
- Vercel found prompt wording was brittle: “explore project first, then invoke skill” beat more forceful phrasing, a sign that current agent behavior is still fragile.
- The broader takeaway for framework authors is to optimize for retrieval-friendly context that is always present, not just tools that agents are supposed to remember to call.
- Hacker News discussion around the post zeroed in on the tradeoff: passive context improves adherence now, but skills still matter for larger toolchains where context budgets are tight.
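The compressed-index approach can be sketched as an AGENTS.md fragment. This is a minimal illustration only: the section heading, file names, and `.next-docs` layout below are assumptions for demonstration, not Vercel's actual codemod output.

```markdown
<!-- Hypothetical AGENTS.md fragment; structure is illustrative, not the codemod's real output -->
## Next.js 16 docs index (version-matched)

Before answering Next.js questions, read the relevant local doc instead of
relying on training-data knowledge. These files match this project's
installed Next.js version:

- Routing and layouts: `.next-docs/routing.md`
- Caching and revalidation: `.next-docs/caching.md`
- Server Actions and mutations: `.next-docs/server-actions.md`
```

The point of this shape is that it is always in context (no invocation decision for the agent to get wrong) while keeping the token cost to a small index rather than the full documentation.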
// TAGS
vercel · agent · ai-coding · testing · benchmark · devtool
DISCOVERED
2026-03-11
PUBLISHED
2026-03-11
RELEVANCE
8/10
AUTHOR
DIY Smart Code