DeepSeek-V3.2 GGUFs "eat" think tags
OPEN_SOURCE
REDDIT // 3h ago // NEWS


Users running Unsloth's DeepSeek-V3.2 GGUF models on llama-server report missing opening <think> tags, which breaks reasoning UI features in tools like Open WebUI. The cause is the chat template appending the opening tag to the end of the prompt as a prefill for the assistant's turn, so the tag is consumed as input and never appears in the generated output stream.
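A minimal sketch of the failure mode, using hypothetical role tokens (not DeepSeek's actual template strings) to show how a prompt that ends with the opening tag leaves that tag out of the generated stream:

```python
def render_prompt(user_msg: str) -> str:
    # The template appends "<think>" as a generation prefill, so the
    # tag is part of the *input* to the model, not part of its output.
    # Role tokens here are illustrative, not DeepSeek's real ones.
    return "<|User|>" + user_msg + "<|Assistant|>" + "<think>"

def simulated_model_output() -> str:
    # The model continues from the prefilled tag: its first tokens are
    # reasoning text, with no leading "<think>" for the UI to detect.
    return "First, recall that 6 * 7 = 42.</think>The answer is 42."

prompt = render_prompt("What is 6 * 7?")
output = simulated_model_output()

# The frontend only ever sees `output` — the opening tag is missing there.
assert prompt.endswith("<think>")
assert not output.startswith("<think>")
```

Because the frontend pattern-matches on the literal opening tag in the stream, the prefilled tag in the prompt is invisible to it, and the reasoning text renders as ordinary output.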

// ANALYSIS

The missing tag bug is a classic chat template mismatch that highlights the friction between raw GGUF quants and complex reasoning models.

  • The <think> tag is included as a postfix in the prompt template, meaning the model starts generating content after the tag has already been "consumed."
  • Frontends like Open WebUI rely on the literal presence of the <think> tag in the stream to trigger collapsed reasoning blocks; without it, the raw "thought" text leaks into the main UI.
  • The immediate fix is launching llama-server with the --jinja flag, which enables the model's embedded Jinja chat template and proper parsing of the reasoning content.
  • This recurrence of a known R1-era bug suggests that quantization pipelines for newer DeepSeek versions are still struggling with template consistency across different inference engines.
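Until the template is fixed server-side, a frontend can heuristically restore the tag before rendering. A minimal sketch of such a guard (a hypothetical helper, not Open WebUI's actual logic), assuming the reasoning text starts at the very beginning of the response:

```python
def restore_think_tag(text: str) -> str:
    """Re-insert a missing opening <think> tag.

    If the response contains a closing </think> but no opening <think>,
    assume the opening tag was consumed by the prompt template and that
    the reasoning began at position 0, so prepend the tag.
    """
    if "</think>" in text and "<think>" not in text:
        return "<think>" + text
    return text

# Broken stream: closing tag present, opening tag eaten by the template.
fixed = restore_think_tag("reasoning...</think>The answer is 42.")
assert fixed == "<think>reasoning...</think>The answer is 42."

# Well-formed or tag-free responses pass through unchanged.
assert restore_think_tag("<think>r</think>a") == "<think>r</think>a"
assert restore_think_tag("plain answer") == "plain answer"
```

The heuristic is safe for this bug because a closing tag without an opening one can only come from a prefilled prompt, but it would mis-handle a model that legitimately emits </think> mid-answer.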
// TAGS
deepseek-v3-2 · llama-cpp · gguf · unsloth · reasoning · llm

DISCOVERED

3h ago

2026-04-20

PUBLISHED

5h ago

2026-04-20

RELEVANCE

8/10

AUTHOR

Winter_Engineer2163