OPEN_SOURCE
REDDIT · 34d ago · INFRASTRUCTURE
Qwen3.5 stumbles on multi-system chat prompts
A LocalLLaMA user reports that Qwen3.5 works fine in plain text-completion mode but rejects SillyTavern chat-completion payloads containing multiple system messages via an OpenAI-compatible API stack. The issue looks less like a model failure and more like prompt-template and serving-layer friction between roleplay presets, llama.cpp compatibility, and Qwen’s chat formatting expectations.
// ANALYSIS
This is the kind of annoying integration bug that matters more than benchmark wins when people actually try to use local models in real apps.
- The core problem is API-shape compatibility: roleplay presets that rely on multiple `system` messages do not map cleanly onto every OpenAI-style backend.
- Qwen’s official ecosystem leans on chat templates and structured message handling, so text completion working while chat completion breaks is consistent with a formatting mismatch, not a raw capability gap.
- For local AI tooling, this is a reminder that SillyTavern presets, llama.cpp server behavior, and model-specific templates can be just as important as the model weights themselves.
- The workaround implied by the thread—merging system prompts or rewriting the Jinja template—keeps things running, but it can also subtly change behavior in roleplay-heavy setups.
- This is useful signal for self-hosters: “OpenAI-compatible” still often means “compatible enough, with caveats,” especially for multi-message system prompting.
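The merging workaround mentioned above can be sketched in a few lines. This is a hypothetical illustration, not code from the thread: `merge_system_messages` is an invented helper, and the exact behavior you want (separator choice, where the merged message lands) will depend on your preset.

```python
# Sketch of the "merge system prompts" workaround for backends that reject
# payloads containing more than one `system` message. All names here are
# illustrative assumptions, not part of SillyTavern or llama.cpp.

def merge_system_messages(messages, separator="\n\n"):
    """Fold every `system` message into a single leading system message,
    preserving the relative order of all non-system messages."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest
    merged = {"role": "system", "content": separator.join(system_parts)}
    return [merged] + rest

# Example payload with two system messages, as a roleplay preset might emit:
payload = [
    {"role": "system", "content": "You are a roleplay narrator."},
    {"role": "user", "content": "Hello!"},
    {"role": "system", "content": "Stay in character at all times."},
]
print(merge_system_messages(payload))
```

Note the caveat from the analysis above: collapsing system messages moves later instructions to the front of the context, which can subtly change model behavior in roleplay-heavy setups.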
// TAGS
qwen3-5 · llm · api · prompt-engineering · self-hosted
DISCOVERED
2026-03-08
PUBLISHED
2026-03-08
RELEVANCE
5/10
AUTHOR
Expensive-Paint-9490