OPEN_SOURCE
REDDIT // 3h ago · NEWS
DeepSeek-V3.2 GGUFs "eat" think tags
Users running Unsloth's DeepSeek-V3.2 GGUF models on llama-server report missing opening <think> tags, which breaks reasoning-UI features in tools like Open WebUI. The cause is the chat template appending the tag to the end of the prompt (pre-opening the assistant's reasoning block), so the tag never appears in the generated output stream.
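The mechanism can be shown with a minimal sketch (the template string below is a simplification for illustration, not DeepSeek's actual Jinja template): because the prompt itself ends with `<think>`, the model's completion starts inside the reasoning block, and only the closing tag ever reaches the client.

```python
# Simplified illustration of the template mismatch. The real chat
# template is Jinja; here plain string formatting stands in for it.

def build_prompt(user_msg: str) -> str:
    # The template appends "<think>" after the assistant header,
    # pre-opening the reasoning block inside the prompt itself.
    return f"<|User|>{user_msg}<|Assistant|><think>"

prompt = build_prompt("Why is the sky blue?")

# What the model then generates: reasoning first, then the answer.
# Note there is a closing tag but no opening tag in the stream.
completion = "Light scatters off air molecules...</think>Rayleigh scattering."

assert prompt.endswith("<think>")          # tag consumed by the prompt
assert "<think>" not in completion         # so it never appears in output
assert "</think>" in completion            # only the close tag streams out
```

A frontend that pattern-matches on the literal opening tag therefore sees an unpaired `</think>` and cannot collapse the reasoning block.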
// ANALYSIS
The missing tag bug is a classic chat template mismatch that highlights the friction between raw GGUF quants and complex reasoning models.
- The <think> tag is appended to the end of the prompt by the chat template, so the model begins generating as if the tag had already been "consumed."
- Frontends like Open WebUI rely on the literal presence of the <think> tag in the stream to trigger collapsed reasoning blocks; without it, the raw "thought" text leaks into the main UI.
- The immediate fix is launching llama-server with the --jinja flag so the engine applies the model's chat template itself and handles the reasoning field correctly.
- This recurrence of a known R1-era bug suggests that quantization pipelines for newer DeepSeek releases still struggle with template consistency across inference engines.
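Until templates are fixed upstream, a client-side workaround is to re-inject the missing opening tag before the first streamed token. A minimal sketch (the function name is illustrative, not part of Open WebUI or llama.cpp, and it assumes the model always begins with reasoning, which holds when the template pre-opens the block):

```python
from typing import Iterable, Iterator

def restore_think_tag(chunks: Iterable[str]) -> Iterator[str]:
    """Prepend a missing opening <think> tag to a token stream.

    Assumes the response always starts inside a reasoning block
    (true when the chat template pre-opens it in the prompt): if the
    first non-empty chunk does not start with "<think>", the tag was
    consumed by the template and is re-injected.
    """
    it = iter(chunks)
    for first in it:
        if not first:
            continue  # skip empty keep-alive chunks
        if not first.lstrip().startswith("<think>"):
            yield "<think>"  # re-inject the consumed opening tag
        yield first
        break
    yield from it  # pass the rest of the stream through untouched

# Example: a stream that arrives without its opening tag
fixed = "".join(restore_think_tag(["The sky ", "scatters...", "</think>", "Blue."]))
# fixed now starts with "<think>", so tag-based UIs detect the block
```

Streams that already contain the opening tag pass through unchanged, so the wrapper is safe to leave in place once the template is fixed.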
// TAGS
deepseek-v3-2 · llama-cpp · gguf · unsloth · reasoning · llm
DISCOVERED
3h ago
2026-04-20
PUBLISHED
5h ago
2026-04-20
RELEVANCE
8 / 10
AUTHOR
Winter_Engineer2163