Qwen3.5 reasoning loops force llama.cpp tweaks

// 107d agoTUTORIAL

Qwen3.5 reasoning loops force llama.cpp tweaks

ANNOUNCEMENT PRODUCT GITHUB PRODUCT HUNT

LocalLLaMA users are seeing Qwen3.5 Q4 slip into long reasoning loops under llama.cpp and asking which knobs actually help. The thread points toward explicit thinking-mode control plus the model’s recommended non-thinking sampler settings, not a mysterious hard failure.

// ANALYSIS

This looks mostly like a mode-control problem, not a broken model.

–The official Qwen3.5 recipe for direct-response mode is `enable_thinking=False` plus `temperature=0.7`, `top_p=0.8`, `top_k=20`, `presence_penalty=1.5`, and `repetition_penalty=1.0`.
–Qwen3.5 does not officially support the older `/think` and `/nothink` soft switch, so prompt hacks are less dependable than template-level control.
–If `enable_thinking` still leaks through in llama.cpp, the likely culprit is a template or server-version mismatch rather than sampler settings alone.
–For local deployments, the practical split is simple: force non-thinking for chat and keep thinking mode only for tasks that genuinely benefit from long deliberation.

// TAGS

qwen3-5llama-cppllmreasoninginferenceself-hostedopen-weights

DISCOVERED

107d ago

2026-03-29

PUBLISHED

107d ago

2026-03-29

RELEVANCE

8/ 10

AUTHOR

XiRw

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

OPEN SOURCE39m ago

scroll-world launches scroll-driven 3D flight skill

scroll-world is an open-source, framework-agnostic agent skill that leverages Higgsfield to generate immersive, scroll-driven 3D camera flights through diorama scenes for landing pages. By rendering seamless connection clips between neighboring frames, it allows developers to build interactive 3D narrative websites navigated simply by scrolling, without requiring heavy game engines.

MODEL1h ago

OpenAI GPT-5.6 hits Amazon Bedrock

OpenAI's GPT-5.6 model family—including Sol, Terra, and Luna—is now generally available on Amazon Bedrock. Running on Bedrock's next-generation inference engine, the models support prompt caching with a 90% discount and match OpenAI's first-party pricing.

UPDATE2h ago

OpenRouter splits rankings by model weight

OpenRouter has updated its rankings platform by introducing separate leaderboards for open-weight and closed-weight models. This allows developers to track and compare usage statistics of proprietary, API-exclusive models against downloadable open-weight models.