Qwen3.6-35B-A3B fails closing CoT token
Alibaba's new sparse MoE model occasionally outputs the multi-token string </thinking> instead of the dedicated </think> closing token during reasoning. This minor regression breaks API adapters and coding harnesses that rely on precise token detection to separate internal reasoning from final output.
This "infinite thinking" bug highlights the fragility of CoT-enabled models when paired with strict regex-based output parsers. The mismatch between the model's intended vocabulary and its generated output suggests training or quantization edge cases, with observed issues occurring across context lengths from 16k to 128k. While quantization like IQ4_NL may exacerbate the behavior, workarounds involve manual Jinja template adjustments or the use of specific reasoning parsers like the vLLM qwen3 implementation.
DISCOVERED
7h ago
2026-04-19
PUBLISHED
9h ago
2026-04-19
RELEVANCE
AUTHOR
Confident_Ideal_5385