OPEN_SOURCE
REDDIT // MODEL RELEASE · 8d ago
Qwen 3.5 fine-tune hits tag confusion
A LocalLLaMA user reports unexpected behavior when fine-tuning Qwen 3.5 on reasoning datasets: the model alternates between `<think>` and its native `<<<reasoning_start>>>` delimiters. The conflict suggests that structural priors for reasoning delimiters, baked in during pre-training and alignment, persist even after retraining.
// ANALYSIS
Qwen 3.5's native reasoning mode uses `<<<reasoning_start>>>` and `<<<reasoning_end>>>`, leading to direct conflict when users attempt to fine-tune using alternative tags like `<think>`.
- The model's behavior shows strong structural priors from its RLHF/SFT training that prioritize the official reasoning delimiters.
- Retraining with the new tags while still seeing `</think>` suggests a lingering tokenizer or chat template mismatch that hasn't been fully overwritten.
- This highlights the difficulty of re-mapping reasoning behaviors in models where the chain-of-thought process is deeply integrated into the pre-training and alignment phases.
- Developers should prioritize aligning their datasets with the model's native special tokens rather than forcing new ones on "thinking-heavy" architectures.
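The last recommendation can be sketched as a preprocessing pass: rewrite the dataset's custom delimiters to the model's native ones before training, and audit for samples that use the custom tags at all. This is a minimal sketch, not the poster's actual pipeline; the tag strings come from the post, and `find_custom_tags`/`normalize_reasoning_tags` are hypothetical helper names.

```python
# Sketch: align a fine-tuning dataset with the model's native reasoning
# delimiters instead of forcing custom ones. Tag strings follow the post's
# report (`<think>` vs `<<<reasoning_start>>>`); adjust to your model's
# actual chat template.

NATIVE_OPEN, NATIVE_CLOSE = "<<<reasoning_start>>>", "<<<reasoning_end>>>"
CUSTOM_OPEN, CUSTOM_CLOSE = "<think>", "</think>"


def find_custom_tags(samples: list[str]) -> list[int]:
    """Return indices of samples that still use the custom delimiters —
    a quick audit for the lingering `</think>` mismatch described above."""
    return [
        i for i, s in enumerate(samples)
        if CUSTOM_OPEN in s or CUSTOM_CLOSE in s
    ]


def normalize_reasoning_tags(text: str) -> str:
    """Rewrite custom reasoning delimiters to the model's native ones."""
    return (text.replace(CUSTOM_OPEN, NATIVE_OPEN)
                .replace(CUSTOM_CLOSE, NATIVE_CLOSE))
```

Running the audit before and after normalization makes the mismatch visible: any sample flagged by `find_custom_tags` is one the model would otherwise see with delimiters that conflict with its trained priors.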
// TAGS
qwen-3.5 · fine-tuning · reasoning · llm · local-llama
DISCOVERED
8d ago
2026-04-03
PUBLISHED
8d ago
2026-04-03
RELEVANCE
8 / 10
AUTHOR
SolarDarkMagician