Qwen 3.6 preserve_thinking flag fails in oMLX
A developer reports the preserve_thinking kwarg for Qwen 3.6 is non-functional in oMLX, preventing visibility into the model's reasoning process. The issue persists even when manually editing the configuration file, despite the model's Jinja template explicitly supporting the feature.
This highlights a common friction point in local LLM deployment: feature mismatches between model templates and inference runner implementations.
- –The `preserve_thinking` feature is critical for observability into reasoning models; its failure limits utility for developers tracking model logic
- –The user correctly identified `chat_template_kwargs` in the configuration file, suggesting the issue lies in how oMLX parses or passes these arguments
- –The Jinja template clearly includes the logic, pointing to a potential bug or missing feature in oMLX's handling of quantized Qwen models
DISCOVERED
45d ago
2026-04-22
PUBLISHED
45d ago
2026-04-22
RELEVANCE
AUTHOR
Longjumping-Sweet818