BACK_TO_FEEDAICRIER_2
Qwen 3.6 preserve_thinking flag fails in oMLX
OPEN_SOURCE ↗
REDDIT · REDDIT// 4h agoINFRASTRUCTURE

Qwen 3.6 preserve_thinking flag fails in oMLX

A developer reports the preserve_thinking kwarg for Qwen 3.6 is non-functional in oMLX, preventing visibility into the model's reasoning process. The issue persists even when manually editing the configuration file, despite the model's Jinja template explicitly supporting the feature.

// ANALYSIS

This highlights a common friction point in local LLM deployment: feature mismatches between model templates and inference runner implementations.

  • The `preserve_thinking` feature is critical for observability into reasoning models; its failure limits utility for developers tracking model logic
  • The user correctly identified `chat_template_kwargs` in the configuration file, suggesting the issue lies in how oMLX parses or passes these arguments
  • The Jinja template clearly includes the logic, pointing to a potential bug or missing feature in oMLX's handling of quantized Qwen models
// TAGS
omlxqweninferencellmopen-source

DISCOVERED

4h ago

2026-04-22

PUBLISHED

5h ago

2026-04-22

RELEVANCE

6/ 10

AUTHOR

Longjumping-Sweet818