Kimi K2.5 Stumbles Under Tool Schemas

// 90d ago · BENCHMARK RESULT

A Reddit user reports that Kimi K2.5 solved simple reasoning questions more reliably with no tools than with XML or JSON tool schemas enabled. The same pattern showed up in a chemistry follow-up, suggesting tool scaffolding can nudge the model away from direct reasoning and toward unnecessary delegation.

// ANALYSIS

The takeaway is less “tools are bad” than “tool context changes the model’s policy,” and that can hurt on questions where the right move is just to think. Small sample, but the failure mode is real enough to matter for agent builders.

– The car-wash prompt is almost adversarially simple, so any drop in accuracy points to prompt-state interference rather than task difficulty.
– The chemistry example strengthens the claim because it spans a different domain and still shows the same no-tools advantage.
– If schemas encourage delegation reflexes, benchmark setups with tools may be measuring agent behavior instead of raw model reasoning.
– This is a warning for product teams: adding tool APIs can improve capability on real workflows while simultaneously degrading answers on trivial ones.
– The sample is tiny, so this is a signal, not proof, but it is exactly the kind of regression worth formalizing in evals.
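
One way to formalize that regression is a small harness that asks the same questions under each tool condition and reports both accuracy and how often the model reaches for a tool. The sketch below is illustrative only: the condition names, the `ask` callable, the `stub_model` stand-in and the placeholder question are all assumptions, not the Reddit user's setup or any real Kimi API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical condition labels: same question, different tool context.
CONDITIONS = ("no_tools", "xml_schema", "json_schema")

# A reply is modelled as (answer_text, tool_name_or_None).
Reply = Tuple[str, Optional[str]]
# ask(question, condition) -> Reply; wraps whatever model client you use.
AskFn = Callable[[str, str], Reply]


def run_eval(
    ask: AskFn,
    cases: List[Tuple[str, str]],
    trials: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Return accuracy and delegation rate for each tool condition."""
    if not cases or trials < 1:
        raise ValueError("need at least one case and one trial")
    correct: Dict[str, int] = defaultdict(int)
    delegated: Dict[str, int] = defaultdict(int)
    total = len(cases) * trials
    for condition in CONDITIONS:
        for question, expected in cases:
            for _ in range(trials):
                answer, tool_call = ask(question, condition)
                # Count any tool call as delegation, even if the answer is right.
                if tool_call is not None:
                    delegated[condition] += 1
                # Crude substring grading; swap in a stricter grader as needed.
                if expected.lower() in answer.lower():
                    correct[condition] += 1
    return {
        c: {
            "accuracy": correct[c] / total,
            "delegation_rate": delegated[c] / total,
        }
        for c in CONDITIONS
    }


def stub_model(question: str, condition: str) -> Reply:
    """Stand-in for a real model client; its behaviour is invented for the demo."""
    if condition == "no_tools":
        return ("The answer is 42.", None)
    # With any schema present, this stub always delegates and never answers.
    return ("", "web_search")


if __name__ == "__main__":
    cases = [("placeholder question", "42")]
    for condition, stats in run_eval(stub_model, cases, trials=3).items():
        print(condition, stats)
```

Tracking delegation rate separately from accuracy helps distinguish a model that reasons badly from one that simply hands off a question it could have answered directly. With a real client, repeated trials per question matter, since a tiny sample cannot separate a policy shift from sampling noise.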

// TAGS

kimi-k2-5, qwen, reasoning, automation, prompt-engineering, agent

DISCOVERED

90d ago

2026-04-27

PUBLISHED

90d ago

2026-04-26

RELEVANCE

8/10

AUTHOR

Spirited_Neck1858
