Tool Calls Break KV-Cache Reuse in llama.cpp
OPEN_SOURCE
REDDIT · 14h ago · INFRASTRUCTURE

This Reddit post describes repeated “checkpoint” loss during single-user chats in llama.cpp-backed frontends, specifically when function/tool calls are part of the conversation history. The poster says the issue shows up across Cherry Studio and Open WebUI with Qwen 3.5/3.6 models, even with ample context and cache RAM, and suspects that tool-call content or thinking traces are not being preserved between turns.

// ANALYSIS

Hot take: this sounds more like prompt reconstruction and chat-template mismatch than a raw context-capacity problem.

  • llama.cpp’s cache reuse depends on the exact prompt prefix staying stable; if a frontend rewrites or omits tool/tool_result turns, the reusable KV prefix is effectively broken.
  • Function-calling support in llama.cpp is real, but it is still sensitive to template format and client behavior, so “erased checkpoints” can be the symptom of serialization drift rather than memory exhaustion.
  • `preserve_thinking: true` only helps if the model/template path actually carries reasoning traces forward; it will not fix missing tool-call round-trips.
  • The most useful debug step is to compare the exact JSON sent on the next turn, especially whether assistant `tool_calls` and subsequent `tool` messages are being replayed verbatim.
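The prefix-stability point can be made concrete with a toy sketch. This is not llama.cpp code; real servers compare token sequences, and the whitespace split below is a stand-in for an actual tokenizer. The point is that once a frontend drops a tool round-trip, only the tokens before the divergence remain reusable:

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Number of leading tokens shared by the cached prompt and the new one."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy "tokenization": whitespace split stands in for a real tokenizer.
cached = "user: hi assistant: <tool_call>get_time</tool_call> tool: 12:00".split()
# Frontend dropped the tool round-trip when rebuilding the prompt:
rebuilt = "user: hi assistant: tool: 12:00".split()

reusable = common_prefix_len(cached, rebuilt)
# Everything after the divergence must be re-evaluated from scratch,
# which a user experiences as a lost "checkpoint".
print(reusable, len(cached))  # 3 of 6 cached tokens remain reusable
```

Even a one-token difference early in the serialized history invalidates the entire remainder of the cache, which is why template or serialization drift looks so much like memory exhaustion from the outside.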
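The suggested debug step can be sketched as a small diff over the replayed message lists. The message shapes below follow the common OpenAI-style chat format; the helper name and the sample payloads are illustrative, not from the post:

```python
import json

def first_divergence(prev_msgs, next_msgs):
    """Index of the first message that differs between what was sent last
    turn and what the frontend replays this turn, or None if the previous
    transcript is an exact prefix of the new one."""
    for i, old in enumerate(prev_msgs):
        new = next_msgs[i] if i < len(next_msgs) else None
        if json.dumps(old, sort_keys=True) != json.dumps(new, sort_keys=True):
            return i
    return None

prev = [
    {"role": "user", "content": "what time is it?"},
    {"role": "assistant", "tool_calls": [
        {"id": "c1", "function": {"name": "get_time", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "c1", "content": "12:00"},
]
# Frontend collapsed the tool round-trip into plain text on the next turn:
nxt = [
    {"role": "user", "content": "what time is it?"},
    {"role": "assistant", "content": "12:00"},
    {"role": "user", "content": "thanks"},
]
print(first_divergence(prev, nxt))  # 1: the assistant tool_calls turn was rewritten
```

If this check fires anywhere before the newest user turn, the frontend is rewriting history and the KV-cache prefix cannot survive, regardless of server settings.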
// TAGS
llama-cpp · kv-cache · function-calling · tool-calling · open-webui · cherry-studio · qwen · context-management

DISCOVERED

14h ago (2026-04-17)

PUBLISHED

16h ago (2026-04-17)

RELEVANCE

5/10

AUTHOR

SimilarWarthog8393