OPEN_SOURCE
REDDIT // INFRASTRUCTURE
Qwen3.5 agent setups still break on llama.cpp
A LocalLLaMA thread asks why Qwen3.5-35B becomes unreliable as an agent when run through llama-server with `--jinja` and ZeroClaw on older dual-GPU hardware, with tool calls intermittently returning 400 and 500 errors. The strongest clue from the post is that the failures seem tied to streaming and tool-call parsing rather than raw generation speed, and the lone reply points to a working DreamServer setup using a similar Qwen stack.
// ANALYSIS
This is less a product announcement than a useful snapshot of where local agent stacks still crack: open models are advancing faster than the surrounding tool-calling infrastructure.
- The poster reports better stability when streaming is disabled, which lines up with broader community reports of Qwen3.5 parser and tool-call edge cases.
- The hardware note matters: Qwen3.5 is being pushed on a mixed RTX 3070 plus RTX 5060 Ti setup, showing that local agent experiments are reaching aging consumer rigs, not just cloud boxes.
- The only concrete answer in-thread points to DreamServer and says Qwen3-Coder-Next works out of the box on llama-server, suggesting template or runtime compatibility issues rather than a hard model limitation.
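The workaround the thread converges on is easy to test from any OpenAI-compatible client: send the tool-call request with streaming disabled so the server returns one complete response instead of parsing tool calls out of a token stream. A minimal sketch of such a payload, aimed at llama-server's `/v1/chat/completions` endpoint; the model id and tool definition are illustrative, not taken from the post:

```python
import json

# OpenAI-style chat-completions payload for llama-server.
# Key detail from the thread: "stream": False sidesteps the streamed
# tool-call parsing path that intermittently returned 400/500 errors.
payload = {
    "model": "qwen3.5-35b",  # hypothetical model id
    "stream": False,          # the reported workaround: disable streaming
    "messages": [
        {"role": "user", "content": "List the files in the repo"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "list_files",  # hypothetical tool for illustration
            "description": "List files in a directory",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
}

body = json.dumps(payload)
```

POSTing `body` with `Content-Type: application/json` to `http://localhost:8080/v1/chat/completions` (llama-server's default port) exercises the non-streaming tool-call path; flipping `"stream"` back to `True` is the quickest way to reproduce whether the failures are really streaming-specific.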
// TAGS
qwen3-5 · llama-cpp · llm · agent · inference
DISCOVERED
34d ago
2026-03-08
PUBLISHED
34d ago
2026-03-08
RELEVANCE
6/10
AUTHOR
QKVfan