OPEN_SOURCE ↗
REDDIT // 34d ago // INFRASTRUCTURE

Qwen3.5 stumbles on multi-system chat prompts

A LocalLLaMA user reports that Qwen3.5 works fine in plain text-completion mode but rejects SillyTavern chat-completion payloads that contain multiple system messages when served through an OpenAI-compatible API stack. The issue looks less like a model failure and more like prompt-template and serving-layer friction between roleplay presets, llama.cpp compatibility, and Qwen's chat-formatting expectations.

// ANALYSIS

This is the kind of annoying integration bug that matters more than benchmark wins when people actually try to use local models in real apps.

  • The core problem is API-shape compatibility: roleplay presets that rely on multiple `system` messages do not map cleanly onto every OpenAI-style backend.
  • Qwen’s official ecosystem leans on chat templates and structured message handling, so text completion working while chat completion breaks is consistent with a formatting mismatch, not a raw capability gap.
  • For local AI tooling, this is a reminder that SillyTavern presets, llama.cpp server behavior, and model-specific templates can be just as important as the model weights themselves.
  • The workaround implied by the thread—merging system prompts or rewriting the Jinja template—keeps things running, but it can also subtly change behavior in roleplay-heavy setups.
  • This is useful signal for self-hosters: “OpenAI-compatible” still often means “compatible enough, with caveats,” especially for multi-message system prompting.
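The "merge system prompts" workaround mentioned above can be sketched as a small client-side preprocessing step. This is an illustrative sketch, not SillyTavern's actual code: the function name and payload contents are hypothetical, and it assumes the backend accepts at most one leading `system` message.

```python
# Sketch of the "merge system prompts" workaround: collapse every
# `system` message in an OpenAI-style message list into a single
# leading one, keeping the other turns in their original order.

def merge_system_messages(messages):
    """Return a new message list with all system content merged
    into one leading system message."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest


# Example: a roleplay-style payload that interleaves two system messages,
# the shape that some strict chat templates reject.
payload = [
    {"role": "system", "content": "You are the narrator."},
    {"role": "user", "content": "Hello!"},
    {"role": "system", "content": "Stay in character."},
    {"role": "assistant", "content": "Greetings, traveler."},
]

merged = merge_system_messages(payload)
# merged now starts with one combined system message, followed by the
# user and assistant turns in their original relative order.
```

As the analysis notes, this keeps strict templates happy but is not behavior-neutral: roleplay presets that rely on where a system message sits in the conversation lose that positional information when everything is hoisted to the front.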
// TAGS
qwen3-5 · llm · api · prompt-engineering · self-hosted

DISCOVERED

34d ago

2026-03-08

PUBLISHED

34d ago

2026-03-08

RELEVANCE

5 / 10

AUTHOR

Expensive-Paint-9490