Qwen 3.6 preserve_thinking flag fails in oMLX

// 90d agoINFRASTRUCTURE

Qwen 3.6 preserve_thinking flag fails in oMLX

A developer reports the preserve_thinking kwarg for Qwen 3.6 is non-functional in oMLX, preventing visibility into the model's reasoning process. The issue persists even when manually editing the configuration file, despite the model's Jinja template explicitly supporting the feature.

// ANALYSIS

This highlights a common friction point in local LLM deployment: feature mismatches between model templates and inference runner implementations.

–The `preserve_thinking` feature is critical for observability into reasoning models; its failure limits utility for developers tracking model logic
–The user correctly identified `chat_template_kwargs` in the configuration file, suggesting the issue lies in how oMLX parses or passes these arguments
–The Jinja template clearly includes the logic, pointing to a potential bug or missing feature in oMLX's handling of quantized Qwen models

// TAGS

omlxqweninferencellmopen-source

DISCOVERED

90d ago

2026-04-22

PUBLISHED

90d ago

2026-04-22

RELEVANCE

6/ 10

AUTHOR

Longjumping-Sweet818

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

OPEN SOURCE44m ago

ayghri/i-have-adhd skill enforces direct AI responses

i-have-adhd is an open-source skill designed by ayghri for AI coding assistants like Claude Code and OpenAI Codex. It embeds ten prompt rules into an agent's context to enforce concise, structured answers without conversational preamble.

OPEN SOURCE48m ago

Microsoft releases Ontology Playground for knowledge graphs

Microsoft Ontology Playground is an open-source web application designed to help developers and data architects learn about ontologies and Microsoft Fabric IQ. Built as a fully static, browser-based TypeScript tool with zero backend dependencies, it features an intuitive visual designer for constructing entity types and relationships, interactive graph exploration using Cytoscape.js, pre-built ontology catalogues, and seamless export capabilities to standard RDF/XML and JSON formats.

OPEN SOURCE48m ago

Outlines enforces structured LLM outputs via constrained generation

Outlines, developed by dottxt, is an open-source Python library that enforces strict structure on Large Language Model outputs during generation. Constraining token sampling at the logit level guarantees compliance with JSON schemas, regular expressions, or Pydantic models without brittle retry loops.