Qwen3.5 stumbles on multi-system chat prompts

// 125d agoINFRASTRUCTURE

Qwen3.5 stumbles on multi-system chat prompts

A LocalLLaMA user reports that Qwen3.5 works fine in plain text-completion mode but rejects SillyTavern chat-completion payloads containing multiple system messages via an OpenAI-compatible API stack. The issue looks less like a model failure and more like prompt-template and serving-layer friction between roleplay presets, llama.cpp compatibility, and Qwen’s chat formatting expectations.

// ANALYSIS

This is the kind of annoying integration bug that matters more than benchmark wins when people actually try to use local models in real apps.

–The core problem is API-shape compatibility: roleplay presets that rely on multiple `system` messages do not map cleanly onto every OpenAI-style backend.
–Qwen’s official ecosystem leans on chat templates and structured message handling, so text completion working while chat completion breaks is consistent with a formatting mismatch, not a raw capability gap.
–For local AI tooling, this is a reminder that SillyTavern presets, llama.cpp server behavior, and model-specific templates can be just as important as the model weights themselves.
–The workaround implied by the thread—merging system prompts or rewriting the Jinja template—keeps things running, but it can also subtly change behavior in roleplay-heavy setups.
–This is useful signal for self-hosters: “OpenAI-compatible” still often means “compatible enough, with caveats,” especially for multi-message system prompting.

// TAGS

qwen3-5llmapiprompt-engineeringself-hosted

DISCOVERED

125d ago

2026-03-08

PUBLISHED

125d ago

2026-03-08

RELEVANCE

5/ 10

AUTHOR

Expensive-Paint-9490

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

UPDATE1h ago

Lightpanda merges IndexedDB support for automation

Lightpanda, the open-source headless browser engine written in Zig for web automation and AI agents, has added base implementation support for IndexedDB to its main branch. This update allows scripts that depend on IndexedDB for client-side storage to execute successfully, removing a significant barrier for automation and scraping workflows on modern web applications.

OPEN SOURCE1h ago

LangChain-Chatchat builds local private RAG pipelines

LangChain-Chatchat is an open-source, local knowledge-based QA application and RAG framework built on LangChain, FastAPI, and Streamlit. It provides a private, offline pipeline that integrates with Ollama and Xinference to support open-source models like Llama3 and Qwen2.

OPEN SOURCE2h ago

prose stylesheet forces clean AI writing

prose is a lightweight, single-file Markdown prompt configuration that guides AI coding agents to communicate like a direct, confident senior engineer. Appended directly to local agent instruction files, it establishes clear rules to eliminate common AI patterns like cheesy setups, over-bulleted reasoning, and theatrical language.