Qwen3.5 stumbles on multi-system chat prompts
A LocalLLaMA user reports that Qwen3.5 works fine in plain text-completion mode but rejects SillyTavern chat-completion payloads containing multiple system messages via an OpenAI-compatible API stack. The issue looks less like a model failure and more like prompt-template and serving-layer friction between roleplay presets, llama.cpp compatibility, and Qwen’s chat formatting expectations.
This is the kind of annoying integration bug that matters more than benchmark wins when people actually try to use local models in real apps.
- –The core problem is API-shape compatibility: roleplay presets that rely on multiple `system` messages do not map cleanly onto every OpenAI-style backend.
- –Qwen’s official ecosystem leans on chat templates and structured message handling, so text completion working while chat completion breaks is consistent with a formatting mismatch, not a raw capability gap.
- –For local AI tooling, this is a reminder that SillyTavern presets, llama.cpp server behavior, and model-specific templates can be just as important as the model weights themselves.
- –The workaround implied by the thread—merging system prompts or rewriting the Jinja template—keeps things running, but it can also subtly change behavior in roleplay-heavy setups.
- –This is useful signal for self-hosters: “OpenAI-compatible” still often means “compatible enough, with caveats,” especially for multi-message system prompting.
DISCOVERED
80d ago
2026-03-08
PUBLISHED
80d ago
2026-03-08
RELEVANCE
AUTHOR
Expensive-Paint-9490