OPEN_SOURCE
REDDIT · 3d ago · OPEN_SOURCE RELEASE
Qwen 3.5 template bug breaks KV-cache reuse
A critical bug in the official Qwen 3.5 chat template causes KV-cache misses and high latency in agentic workflows by redundantly emitting empty historical think tags. A simple one-line guard in the Jinja template restores cache efficiency, significantly improving performance for local model deployments.
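The digest does not reproduce the guard itself. As a sketch only, a fix of this shape inside `chat_template.jinja` would gate the think block on the final message so that historical assistant turns serialize identically on every request; the `message` and `loop.last` names follow standard Jinja loop conventions, and the surrounding template structure is assumed, not copied from the real Qwen 3.5 template:

```jinja
{%- for message in messages %}
    {%- if message.role == 'assistant' %}
        {#- Hypothetical guard: emit a think block only for the final
            message, never for history, so earlier turns keep the same
            bytes on every request and the prompt prefix stays stable. -#}
        {%- if loop.last %}{{- '<think>\n\n</think>\n' }}{%- endif %}
        {{- message.content }}
    {%- endif %}
{%- endfor %}
```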
// ANALYSIS
The "thinking" model era has introduced a new class of brittle failure modes where performance is dictated by Jinja template serialization discipline rather than raw model weights.
- Empty `<think>` tags in history create non-identical prompt prefixes, resulting in near-zero cache hit rates and "Time To First Token" (TTFT) spikes.
- Agent workflows are disproportionately affected because frequent tool calls and follow-ups compound template-induced prefix drift across many requests.
- This issue highlights the fragility of the "KV-cache contract" in reasoning models, where even minor formatting inconsistencies can wipe out the benefits of prefix caching.
- Developers running Qwen 3.5 locally should prioritize patching their `chat_template.jinja` before attempting more complex infrastructure-level cache workarounds.
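The first bullet can be made concrete with a stylized model (not the actual Qwen 3.5 template): prefix caching only helps when a new prompt is byte-identical to an earlier one up to the new tokens. The hypothetical `render()` below simulates the bug by serializing the most recent assistant turn differently from older ones, so a message's bytes change once a newer turn lands and every later request misses the cache from that point on:

```python
def render(messages, guard=False):
    """Render a chat history into a prompt string (illustrative format,
    not the real Qwen 3.5 serialization)."""
    last_asst = max(
        (i for i, (role, _) in enumerate(messages) if role == "assistant"),
        default=-1,
    )
    parts = []
    for i, (role, text) in enumerate(messages):
        if role == "assistant" and i == last_asst and not guard:
            # Buggy path: the newest assistant turn gets an empty think
            # block that older turns do not, so serialization is
            # position-dependent and history bytes shift between requests.
            parts.append(f"<|assistant|>\n<think>\n\n</think>\n{text}\n")
        else:
            parts.append(f"<|{role}|>\n{text}\n")
    parts.append("<|assistant|>\n")  # generation prompt for the next reply
    return "".join(parts)

turn2 = [("user", "Q1"), ("assistant", "A1"), ("user", "Q2")]
turn3 = turn2 + [("assistant", "A2"), ("user", "Q3")]

# Without the guard, the turn-2 prompt is not a byte prefix of the
# turn-3 prompt, so a prefix cache cannot be reused across turns.
assert not render(turn3).startswith(render(turn2))

# With the guard, history serializes identically on every request and
# each prompt is an exact prefix of the next: full KV-cache reuse.
assert render(turn3, guard=True).startswith(render(turn2, guard=True))
```

The design point is general: any per-request variation in how history is rendered, however small, truncates the shared prefix at the first differing byte.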
// TAGS
qwen-3.5 · llm · inference · reasoning · open-source · prompt-engineering · devtool
DISCOVERED
2026-04-08
PUBLISHED
2026-04-08
RELEVANCE
8/10
AUTHOR
onil_gova