OPEN_SOURCE
REDDIT // 14h ago · INFRASTRUCTURE
Dynamic tool lists clash with KV cache reuse
A developer raises a critical architectural trade-off in LLM agent design: dynamically changing the available tool schemas per turn breaks prefix KV cache reuse, leading to higher latency. The community debates fixed tool lists, two-stage routing, and externalized schemas to balance flexibility and efficiency.
// ANALYSIS
The tension between context dynamism and inference efficiency is a core bottleneck for scaling complex agent architectures.
- Changing the tool JSON schemas in the system prompt invalidates the shared prefix, forcing a full re-prefill each turn and significantly increasing response times in inference engines like vLLM.
- A two-stage routing design (a routing LLM selects the relevant tools, then the main LLM uses them) adds complexity but isolates cache-breaking changes to the smaller routing call.
- Keeping a massive, static tool list maximizes caching but risks overwhelming the model's instruction-following capability.
- Externalizing tool schemas or leveraging MCP (Model Context Protocol) requires careful orchestration so that dynamic schema injection does not negate the cache benefits.
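The core trade-off in the first bullet can be sketched in a few lines. This is an illustration, not vLLM's actual API: it approximates "reusable cached tokens" as the longest common prefix between two consecutive prompts, and compares placing the volatile tool schemas before versus after the stable instructions. All prompt contents here are hypothetical placeholders.

```python
import json

# Stable parts of the prompt (placeholders, not from the discussion).
SYSTEM = "You are an agent. Follow the rules below.\n"
RULES = "Rule 1: be concise.\nRule 2: use tools when needed.\n"

def common_prefix_len(a: str, b: str) -> int:
    """Characters shared at the start of a and b -- a stand-in for
    the prefix tokens a KV cache could reuse across turns."""
    i = 0
    n = min(len(a), len(b))
    while i < n and a[i] == b[i]:
        i += 1
    return i

def prompt_tools_first(tools: list[str]) -> str:
    # Tool schemas interleaved early in the prompt: any change to the
    # tool list invalidates everything after the first differing token.
    return SYSTEM + json.dumps(tools) + RULES

def prompt_tools_last(tools: list[str]) -> str:
    # Stable prefix first, volatile tool schemas appended at the end:
    # the shared prefix survives tool-list changes.
    return SYSTEM + RULES + json.dumps(tools)

turn1 = ["search", "calculator"]
turn2 = ["search", "weather"]  # tool list changed between turns

reuse_first = common_prefix_len(prompt_tools_first(turn1),
                                prompt_tools_first(turn2))
reuse_last = common_prefix_len(prompt_tools_last(turn1),
                               prompt_tools_last(turn2))

print(f"tools first: {reuse_first} reusable chars")
print(f"tools last:  {reuse_last} reusable chars")
```

With the tool list at the end, the entire static instruction block stays cacheable; with the tool list up front, reuse stops at the first schema difference. Real engines cache at token or block granularity, but the ordering principle is the same.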
// TAGS
agent · inference · llm · mcp · prompt-engineering
DISCOVERED
14h ago
2026-04-14
PUBLISHED
15h ago
2026-04-14
RELEVANCE
8/10
AUTHOR
niwang66