REDDIT · OPEN_SOURCE · INFRASTRUCTURE

Qwen3.5 agent setups still break on llama.cpp

A LocalLLaMA thread asks why Qwen3.5-35B becomes unreliable as an agent when run through llama-server with `--jinja` and ZeroClaw on older dual-GPU hardware, with tool calls intermittently failing with HTTP 400 and 500 errors. The strongest clue in the post is that the failures appear tied to streaming and tool-call parsing rather than raw generation speed, and the lone reply points to a working DreamServer setup built on a similar Qwen stack.
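The thread does not include the poster's client code, but the failure mode it describes can be sketched. llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint; a minimal agent request attaches a tool schema and, on error, has to decide whether a status code is worth retrying. The endpoint URL, model name, and `read_file` tool below are illustrative assumptions, not details from the thread:

```python
# Sketch of an OpenAI-compatible tool-call request against a local
# llama-server. Endpoint, model name, and tool schema are assumptions
# for illustration, not taken from the thread.
import json

LLAMA_SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server default port

def build_payload(prompt: str, stream: bool = True) -> dict:
    """Build a chat request with one example tool attached."""
    return {
        "model": "qwen3.5-35b",  # placeholder; llama-server serves whatever model it loaded
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",  # hypothetical agent tool
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "stream": stream,
    }

def should_retry(status: int) -> bool:
    """5xx errors are plausibly transient and worth retrying. A 400 usually
    means the request or chat template is malformed, so resending the same
    payload just reproduces the failure — which matches the intermittent
    400-vs-500 split the poster reports."""
    return 500 <= status < 600

payload = build_payload("List the files in the repo", stream=False)
body = json.dumps(payload)  # send via e.g. requests.post(LLAMA_SERVER_URL, data=body)
```

The 400/500 distinction is the practical point: if disabling streaming turns 500s into clean responses, the bug sits in stream-time tool-call parsing, not in the request itself.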

// ANALYSIS

This is less a product announcement than a useful snapshot of where local agent stacks still crack: open models are advancing faster than the surrounding tool-calling infrastructure.

  • The poster reports better stability when streaming is disabled, which lines up with broader community chatter around Qwen3.5 parser and tool-call edge cases.
  • The hardware note matters: Qwen3.5 is being pushed on a mixed RTX 3070 plus RTX 5060 Ti setup, a sign that local agent experiments are reaching older consumer rigs, not just cloud boxes.
  • The only concrete answer in-thread points to DreamServer and says Qwen3-Coder-Next works out of the box on llama-server, suggesting template or runtime compatibility issues more than a hard model limitation.
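The streaming-off workaround in the first bullet has a simple mechanical explanation: with `"stream": false` the server returns one complete JSON body, and the client reads structured tool calls directly instead of reassembling them from SSE deltas, which is where template and parser edge cases tend to bite. The sketch below parses a non-streamed reply in the OpenAI chat format that llama-server emulates; the sample values are made up:

```python
# Parsing tool calls from a non-streamed chat completion. The response
# shape follows the OpenAI-compatible format llama-server emulates;
# the concrete values here are invented for illustration.
import json

sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "read_file",
                    "arguments": "{\"path\": \"README.md\"}",  # arguments arrive as a JSON string
                },
            }],
        },
        "finish_reason": "tool_calls",
    }],
}

def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, parsed_arguments) pairs from a complete reply."""
    calls = []
    for call in response["choices"][0]["message"].get("tool_calls", []):
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

In streaming mode the same `arguments` string is drip-fed across chunks and must be concatenated before it parses as JSON, so any server-side slip in chunking the template output surfaces as exactly the kind of intermittent error the thread describes.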
// TAGS
qwen3-5 · llama-cpp · llm · agent · inference

DISCOVERED

2026-03-08 (34d ago)

PUBLISHED

2026-03-08 (34d ago)

RELEVANCE

6 / 10

AUTHOR

QKVfan