OPEN_SOURCE ↗
REDDIT · REDDIT// 4h agoINFRASTRUCTURE
Qwen 3.6 preserve_thinking flag fails in oMLX
A developer reports the preserve_thinking kwarg for Qwen 3.6 is non-functional in oMLX, preventing visibility into the model's reasoning process. The issue persists even when manually editing the configuration file, despite the model's Jinja template explicitly supporting the feature.
// ANALYSIS
This highlights a common friction point in local LLM deployment: feature mismatches between model templates and inference runner implementations.
- –The `preserve_thinking` feature is critical for observability into reasoning models; its failure limits utility for developers tracking model logic
- –The user correctly identified `chat_template_kwargs` in the configuration file, suggesting the issue lies in how oMLX parses or passes these arguments
- –The Jinja template clearly includes the logic, pointing to a potential bug or missing feature in oMLX's handling of quantized Qwen models
// TAGS
omlxqweninferencellmopen-source
DISCOVERED
4h ago
2026-04-22
PUBLISHED
5h ago
2026-04-22
RELEVANCE
6/ 10
AUTHOR
Longjumping-Sweet818