Kimi K2.5 Stumbles Under Tool Schemas
REDDIT // 5h ago // BENCHMARK RESULT

A Reddit user reports that Kimi K2.5 solved simple reasoning questions more reliably with no tools than with XML or JSON tool schemas enabled. The same pattern showed up in a chemistry follow-up, suggesting tool scaffolding can nudge the model away from direct reasoning and toward unnecessary delegation.

// ANALYSIS

The takeaway is less “tools are bad” than “tool context changes the model’s policy,” and that can hurt on questions where the right move is just to think. Small sample, but the failure mode is real enough to matter for agent builders.

  • The car-wash prompt is almost adversarially simple, so a drop in accuracy is more plausibly explained by prompt-state interference than by task difficulty
  • The chemistry example strengthens the claim because it spans a different domain and still shows the same no-tools advantage
  • If schemas encourage delegation reflexes, benchmark setups with tools may be measuring agent behavior instead of raw model reasoning
  • This is a warning for product teams: adding tool APIs can improve capability on real workflows while simultaneously degrading answers on trivial ones
  • The sample is tiny, so this is a signal, not proof, but it is exactly the kind of regression worth formalizing in evals
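
One way to formalize that regression is a paired eval that runs the same prompts with and without tool schemas attached and reports the accuracy gap. The sketch below is a minimal illustration, not the Reddit poster's setup: the `ask` callable, the `Case` type, the substring-match scoring, and the stub model are all hypothetical stand-ins, and the example prompt is a placeholder rather than the car-wash or chemistry question from the post.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical interface: (prompt, tool_schemas or None) -> answer text.
# In a real harness this would wrap whatever chat API serves the model.
AskFn = Callable[[str, Optional[Sequence[dict]]], str]


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: str  # substring a correct answer must contain (crude scoring)


def accuracy(ask: AskFn, cases: Sequence[Case],
             tools: Optional[Sequence[dict]], trials: int = 5) -> float:
    """Fraction of (case, trial) pairs whose answer contains the expected text."""
    if not cases or trials < 1:
        raise ValueError("need at least one case and one trial")
    hits = 0
    for case in cases:
        for _ in range(trials):
            answer = ask(case.prompt, tools)
            if case.expected.lower() in answer.lower():
                hits += 1
    return hits / (len(cases) * trials)


def tool_schema_regression(ask: AskFn, cases: Sequence[Case],
                           tools: Sequence[dict], trials: int = 5) -> float:
    """No-tools accuracy minus with-tools accuracy; positive means tools hurt."""
    return accuracy(ask, cases, None, trials) - accuracy(ask, cases, tools, trials)


def stub_model(prompt: str, tools: Optional[Sequence[dict]]) -> str:
    """Stand-in model that mimics the reported failure mode:
    it answers directly without tools and delegates when any schema is present."""
    if tools:
        return "I should call a tool to look this up."
    return "The answer is 4."


if __name__ == "__main__":
    cases = [Case(prompt="What is 2 + 2?", expected="4")]  # placeholder prompt
    tools = [{"name": "calculator", "parameters": {"type": "object"}}]
    gap = tool_schema_regression(stub_model, cases, tools, trials=3)
    print(f"no-tools minus with-tools accuracy: {gap:.2f}")
```

Swapping `stub_model` for a real client and running enough trials per condition would turn the anecdote into a tracked number; a gap that stays positive across domains would support the delegation-reflex reading.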
// TAGS
kimi-k2-5, qwen, reasoning, automation, prompt-engineering, agent

DISCOVERED

5h ago

2026-04-27

PUBLISHED

5h ago

2026-04-26

RELEVANCE

8/10

AUTHOR

Spirited_Neck1858