Qwen 3.5 template bug breaks KV-cache reuse
REDDIT // 3d ago · OPEN SOURCE RELEASE


A critical bug in the official Qwen 3.5 chat template causes KV-cache misses and high latency in agentic workflows by redundantly emitting empty historical think tags. A simple one-line guard in the Jinja template restores cache efficiency, significantly improving performance for local model deployments.
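The fix described above amounts to skipping the think block whenever its content is empty, so historical turns serialize identically on every request. A minimal sketch of such a guard, assuming the template exposes stored reasoning as `message.reasoning_content` (the actual Qwen 3.5 template variables may differ):

```jinja
{#- Hypothetical guard: emit a <think> block only when the stored reasoning
    is non-empty. The variable name reasoning_content is an assumption for
    illustration, not the real Qwen 3.5 template field. -#}
{%- if message.reasoning_content %}<think>{{ message.reasoning_content }}</think>{%- endif %}
```

With a guard like this, re-rendered history no longer injects empty `<think></think>` pairs that were absent from the tokens the inference server already cached.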

// ANALYSIS

The "thinking" model era has introduced a new class of brittle failure modes where performance is dictated by Jinja template serialization discipline rather than raw model weights.

  • Empty `<think>` tags in history create non-identical prompt prefixes, resulting in near-zero cache hit rates and "Time To First Token" (TTFT) spikes.
  • Agent workflows are disproportionately affected because frequent tool calls and follow-ups compound the cumulative impact of template-induced drift.
  • This issue highlights the fragility of the "KV-cache contract" in reasoning models, where even minor formatting inconsistencies can wipe out the benefits of prefix caching.
  • Developers using Qwen 3.5 locally should prioritize patching their `chat_template.jinja` before attempting more complex infrastructure-level cache workarounds.
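The cache-miss mechanism in the first bullet can be simulated in a few lines of Python. Everything below is a toy model: the role markers, tag placement, and field names are illustrative assumptions, not the real Qwen 3.5 serialization.

```python
def render(history, guard_empty_think=False):
    """Serialize a chat history into a prompt string, template-style."""
    parts = []
    for msg in history:
        if msg["role"] == "assistant":
            think = msg.get("think", "")
            if think or not guard_empty_think:
                # Buggy path: emits <think></think> even when there is no reasoning.
                parts.append(f"<|assistant|><think>{think}</think>{msg['content']}")
            else:
                # Guarded path: empty reasoning produces no think block at all.
                parts.append(f"<|assistant|>{msg['content']}")
        else:
            parts.append(f"<|{msg['role']}|>{msg['content']}")
    return "".join(parts)

def common_prefix_len(a, b):
    """Leading characters two prompts share (stand-in for reusable KV entries)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Turn 1: the server's KV cache holds the prompt plus the model's generation,
# which contained no think block.
cached = "<|user|>Hi<|assistant|>Hello!"

# Turn 2 re-serializes that same history before appending the new user message.
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]

buggy = render(history)                          # injects empty <think></think>
fixed = render(history, guard_empty_think=True)  # byte-identical to the cache

assert fixed == cached
assert common_prefix_len(buggy, cached) < len(cached)  # divergence -> cache miss
```

The point of the toy is the invariant behind prefix caching: the cache only helps when re-serializing old turns reproduces, byte for byte, the tokens the server already holds.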
// TAGS
qwen-3.5 · llm · inference · reasoning · open-source · prompt-engineering · devtool

DISCOVERED

2026-04-08

PUBLISHED

2026-04-08

RELEVANCE

8/10

AUTHOR

onil_gova