OPEN_SOURCE
REDDIT // 3h ago · NEWS
DeepSeek-V3.2 GGUFs "eat" think tags
Users running Unsloth's DeepSeek-V3.2 GGUF models on llama-server report missing opening <think> tags, which breaks reasoning-UI features in tools like Open WebUI. The cause is the chat template appending the tag to the end of the prompt (pre-opening the assistant's reasoning block), so the tag never appears in the generated output stream.
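The mechanism can be shown with a minimal sketch (the template string below is a simplification for illustration, not DeepSeek's actual Jinja template): because the prompt itself ends with `<think>`, the model's completion starts inside the reasoning block, and only the closing tag ever reaches the client.

```python
# Simplified illustration of the template mismatch. The real chat
# template is Jinja; here plain string formatting stands in for it.

def build_prompt(user_msg: str) -> str:
    # The template appends "<think>" after the assistant header,
    # pre-opening the reasoning block inside the prompt itself.
    return f"<|User|>{user_msg}<|Assistant|><think>"

prompt = build_prompt("Why is the sky blue?")

# What the model then generates: reasoning first, then the answer.
# Note there is a closing tag but no opening tag in the stream.
completion = "Light scatters off air molecules...</think>Rayleigh scattering."

assert prompt.endswith("<think>")          # tag consumed by the prompt
assert "<think>" not in completion         # so it never appears in output
assert "</think>" in completion            # only the close tag streams out
```

A frontend that pattern-matches on the literal opening tag therefore sees an unpaired `</think>` and cannot collapse the reasoning block.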
// ANALYSIS
The missing tag bug is a classic chat template mismatch that highlights the friction between raw GGUF quants and complex reasoning models.
- The <think> tag is appended to the end of the prompt by the chat template, so the model begins generating as if the tag had already been "consumed."
- Frontends like Open WebUI rely on the literal presence of the <think> tag in the stream to trigger collapsed reasoning blocks; without it, the raw "thought" text leaks into the main UI.
- The immediate fix is launching llama-server with the --jinja flag so the engine applies the model's chat template itself and handles the reasoning field correctly.
- This recurrence of a known R1-era bug suggests that quantization pipelines for newer DeepSeek releases still struggle with template consistency across inference engines.
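Until templates are fixed upstream, a client-side workaround is to re-inject the missing opening tag before the first streamed token. A minimal sketch (the function name is illustrative, not part of Open WebUI or llama.cpp, and it assumes the model always begins with reasoning, which holds when the template pre-opens the block):

```python
from typing import Iterable, Iterator

def restore_think_tag(chunks: Iterable[str]) -> Iterator[str]:
    """Prepend a missing opening <think> tag to a token stream.

    Assumes the response always starts inside a reasoning block
    (true when the chat template pre-opens it in the prompt): if the
    first non-empty chunk does not start with "<think>", the tag was
    consumed by the template and is re-injected.
    """
    it = iter(chunks)
    for first in it:
        if not first:
            continue  # skip empty keep-alive chunks
        if not first.lstrip().startswith("<think>"):
            yield "<think>"  # re-inject the consumed opening tag
        yield first
        break
    yield from it  # pass the rest of the stream through untouched

# Example: a stream that arrives without its opening tag
fixed = "".join(restore_think_tag(["The sky ", "scatters...", "</think>", "Blue."]))
# fixed now starts with "<think>", so tag-based UIs detect the block
```

Streams that already contain the opening tag pass through unchanged, so the wrapper is safe to leave in place once the template is fixed.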
// TAGS
deepseek-v3-2 · llama-cpp · gguf · unsloth · reasoning · llm
DISCOVERED
3h ago
2026-04-20
PUBLISHED
5h ago
2026-04-20
RELEVANCE
8 / 10
AUTHOR
Winter_Engineer2163