OPEN_SOURCE
REDDIT · 3d ago · OPEN_SOURCE RELEASE
Qwen 3.5 template bug breaks KV-cache reuse
A critical bug in the official Qwen 3.5 chat template causes KV-cache misses and high latency in agentic workflows by redundantly emitting empty historical think tags. A simple one-line guard in the Jinja template restores cache efficiency, significantly improving performance for local model deployments.
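The digest does not reproduce the guard itself. As a sketch only, a fix of this shape inside `chat_template.jinja` would gate the think block on the final message so that historical assistant turns serialize identically on every request; the `message` and `loop.last` names follow standard Jinja loop conventions, and the surrounding template structure is assumed, not copied from the real Qwen 3.5 template:

```jinja
{%- for message in messages %}
    {%- if message.role == 'assistant' %}
        {#- Hypothetical guard: emit a think block only for the final
            message, never for history, so earlier turns keep the same
            bytes on every request and the prompt prefix stays stable. -#}
        {%- if loop.last %}{{- '<think>\n\n</think>\n' }}{%- endif %}
        {{- message.content }}
    {%- endif %}
{%- endfor %}
```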
// ANALYSIS
The "thinking" model era has introduced a new class of brittle failure modes where performance is dictated by Jinja template serialization discipline rather than raw model weights.
- Empty `<think>` tags in history create non-identical prompt prefixes, resulting in near-zero cache hit rates and "Time To First Token" (TTFT) spikes.
- Agent workflows are disproportionately affected because frequent tool calls and follow-ups compound template-induced prefix drift across many requests.
- This issue highlights the fragility of the "KV-cache contract" in reasoning models, where even minor formatting inconsistencies can wipe out the benefits of prefix caching.
- Developers running Qwen 3.5 locally should prioritize patching their `chat_template.jinja` before attempting more complex infrastructure-level cache workarounds.
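The first bullet can be made concrete with a stylized model (not the actual Qwen 3.5 template): prefix caching only helps when a new prompt is byte-identical to an earlier one up to the new tokens. The hypothetical `render()` below simulates the bug by serializing the most recent assistant turn differently from older ones, so a message's bytes change once a newer turn lands and every later request misses the cache from that point on:

```python
def render(messages, guard=False):
    """Render a chat history into a prompt string (illustrative format,
    not the real Qwen 3.5 serialization)."""
    last_asst = max(
        (i for i, (role, _) in enumerate(messages) if role == "assistant"),
        default=-1,
    )
    parts = []
    for i, (role, text) in enumerate(messages):
        if role == "assistant" and i == last_asst and not guard:
            # Buggy path: the newest assistant turn gets an empty think
            # block that older turns do not, so serialization is
            # position-dependent and history bytes shift between requests.
            parts.append(f"<|assistant|>\n<think>\n\n</think>\n{text}\n")
        else:
            parts.append(f"<|{role}|>\n{text}\n")
    parts.append("<|assistant|>\n")  # generation prompt for the next reply
    return "".join(parts)

turn2 = [("user", "Q1"), ("assistant", "A1"), ("user", "Q2")]
turn3 = turn2 + [("assistant", "A2"), ("user", "Q3")]

# Without the guard, the turn-2 prompt is not a byte prefix of the
# turn-3 prompt, so a prefix cache cannot be reused across turns.
assert not render(turn3).startswith(render(turn2))

# With the guard, history serializes identically on every request and
# each prompt is an exact prefix of the next: full KV-cache reuse.
assert render(turn3, guard=True).startswith(render(turn2, guard=True))
```

The design point is general: any per-request variation in how history is rendered, however small, truncates the shared prefix at the first differing byte.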
// TAGS
qwen-3.5 · llm · inference · reasoning · open-source · prompt-engineering · devtool
DISCOVERED
2026-04-08
PUBLISHED
2026-04-08
RELEVANCE
8/10
AUTHOR
onil_gova