Qwen 3.5 tool calling needs client-side safety nets
Qwen 3.5 is a top-tier model for agentic tasks, but its native XML tool-calling format often breaks standard local server parsers. Community fixes involving regex fallbacks and custom Jinja templates can restore near-perfect reliability for local LLM users.
Qwen 3.5 is brilliant, but its "plumbing" is currently broken in almost every major local inference engine.
- XML tool calls frequently leak as plain text or get buried inside thinking blocks, causing agent loops to hang or crash.
- Stock Jinja templates fail on basic argument filtering; switching to the "barubary-attuned" or Unsloth templates is required for stability.
- Servers like llama.cpp suffer from "thinking leaks," where internal reasoning tags poison multi-turn context unless the client strips them aggressively.
- LM Studio v0.4.9 currently leads the pack by natively handling the Qwen-specific parsing quirks that vLLM and Ollama still struggle with.
- The model's reasoning capabilities are elite, but developers must treat its output as "raw" text and parse it themselves rather than rely on server-provided JSON fields.
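The client-side safety net described in the points above could look roughly like the following Python sketch. It is an illustration, not the community fix itself. The tag shapes (`<think>`, `<tool_call>`, `<function=NAME>`, `<parameter=KEY>`) are assumptions about the XML format and should be checked against the chat template actually in use. The function names `extract_tool_calls` and `strip_for_history` are hypothetical.

```python
import json
import re
from typing import Any

# Assumed tag shapes; verify them against the chat template you actually run.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def _coerce(value: str) -> Any:
    """Read a parameter as JSON when possible, otherwise keep the raw string.

    Trade-off: a string argument that happens to look like JSON (e.g. "123")
    is returned as a number; validate against the tool schema if that matters.
    """
    text = value.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def extract_tool_calls(raw: str) -> list[dict[str, Any]]:
    """Regex fallback: find tool calls anywhere in the raw completion text.

    The search runs on the unstripped text on purpose, so calls that ended up
    inside a thinking block are still recovered.
    """
    calls = []
    for block in TOOL_CALL_RE.findall(raw):
        for name, body in FUNCTION_RE.findall(block):
            args = {key: _coerce(val) for key, val in PARAM_RE.findall(body)}
            calls.append({"name": name, "arguments": args})
    return calls


def strip_for_history(raw: str) -> str:
    """Remove thinking blocks and tool-call markup before storing the turn."""
    text = THINK_RE.sub("", raw)
    text = TOOL_CALL_RE.sub("", text)
    return text.strip()
```

A caller would run `extract_tool_calls` on the raw completion text first, then append only `strip_for_history(raw)` to the conversation, so reasoning tags never reach the next turn. Ignoring the server's parsed tool-call field entirely is a deliberate choice here, matching the advice to treat the model as a raw output source.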
DISCOVERED: 2026-04-06
PUBLISHED: 2026-04-05
AUTHOR: FigZestyclose7787
