OPEN_SOURCE ↗
REDDIT // 14h ago // INFRASTRUCTURE
Tool Calls Break KV-Cache Reuse in llama.cpp
This Reddit post describes repeated “checkpoint” loss during single-user chats in llama.cpp-backed frontends, specifically when function/tool calls are part of the conversation history. The poster says the issue shows up across Cherry Studio and Open WebUI with Qwen 3.5/3.6 models, even with plenty of context and cache RAM available, and suspects the problem may be related to tool-call content or thinking traces not being preserved between turns.
// ANALYSIS
Hot take: this sounds more like prompt reconstruction and chat-template mismatch than a raw context-capacity problem.
- llama.cpp’s cache reuse depends on the exact prompt prefix staying stable; if a frontend rewrites or omits tool/tool_result turns, the reusable KV prefix is effectively broken.
- Function-calling support in llama.cpp is real, but it is still sensitive to template format and client behavior, so “erased checkpoints” can be the symptom of serialization drift rather than memory exhaustion.
- `preserve_thinking: true` only helps if the model/template path actually carries reasoning traces forward; it will not fix missing tool-call round-trips.
- The most useful debug step is to compare the exact JSON sent on the next turn, especially whether assistant `tool_calls` and subsequent `tool` messages are being replayed verbatim.
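That last debug step can be automated: serialize the message list from two consecutive requests and count how many leading messages are byte-identical, since the KV cache is only reusable up to the first divergence. A minimal sketch, assuming the hypothetical payloads below; the `turn1`/`turn2` contents are illustrative, not from the Reddit post:

```python
import json

def common_prefix_len(prev_msgs, next_msgs):
    """Count how many leading messages are identical between two
    consecutive requests; the KV cache can only be reused up to here."""
    n = 0
    for a, b in zip(prev_msgs, next_msgs):
        # Canonical serialization so key order doesn't mask real differences.
        if json.dumps(a, sort_keys=True) != json.dumps(b, sort_keys=True):
            break
        n += 1
    return n

# Hypothetical failure mode: the frontend collapses the assistant's
# tool_calls turn and the tool result into a plain assistant message,
# so the shared prefix ends after the first message.
turn1 = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "4 C, rain"},
]
turn2 = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "content": "It is 4 C and raining in Oslo."},
    {"role": "user", "content": "And tomorrow?"},
]

print(common_prefix_len(turn1, turn2))  # → 1: cache reuse breaks after message 0
```

If the prefix length drops to near zero on every turn while the conversation grows, the frontend is rewriting history, and no amount of context or cache RAM will help.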
// TAGS
llama-cpp · kv-cache · function calling · tool calling · open-webui · cherry studio · qwen · context management
DISCOVERED
14h ago
2026-04-17
PUBLISHED
16h ago
2026-04-17
RELEVANCE
5 / 10
AUTHOR
SimilarWarthog8393