Qwen3.5 Re-encodes Images Across Turns
OPEN_SOURCE
REDDIT · 23d ago · NEWS

A Reddit user reports that in llama.cpp, Qwen3.5 re-runs the vision encoder on prior images every time the chat continues, unlike Qwen3-VL. They tried both Unsloth's GGUF chat template and the original template with the same result, so the issue looks broader than any one formatter.

// ANALYSIS

This smells more like multimodal session handling than a pure chat-template bug. If the same image is getting replayed in later turns, the runtime is probably reserializing or re-encoding vision inputs instead of preserving them as cached context.
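The distinction can be sketched with a toy client-side cache keyed by image content hash. This is illustrative only: llama.cpp manages vision state internally, and `VisionFeatureCache` and its `encode_fn` are hypothetical names, not part of any real API. The point is that if features are cached by content, replaying the same image across turns costs one encode, not one per turn.

```python
import hashlib

class VisionFeatureCache:
    """Toy cache of vision-encoder outputs keyed by image content hash.

    A sketch of the 'preserve vision features as cached context' idea,
    not how llama.cpp actually stores multimodal state.
    """

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn   # the (expensive) vision encoder
        self.cache = {}              # sha256 digest -> encoded features
        self.encode_calls = 0        # counts real encoder invocations

    def features_for(self, image_bytes: bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self.cache:
            self.encode_calls += 1
            self.cache[key] = self.encode_fn(image_bytes)
        return self.cache[key]

# Dummy "encoder" standing in for the real vision tower.
cache = VisionFeatureCache(encode_fn=lambda b: len(b))
img = b"\x89PNG...fake image bytes"
for _turn in range(3):           # three chat turns replaying the same image
    cache.features_for(img)
print(cache.encode_calls)        # -> 1: the encoder ran only once
```

The reported Qwen3.5 behavior corresponds to the uncached path: every continuation pays the full encode cost again.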

  • Qwen3-VL’s official docs show image processing is handled through the processor on each request, so persistent image reuse is not something you should assume by default.
  • The fact that both Unsloth’s template and the original template behave the same points away from template syntax and toward llama.cpp’s multimodal cache/state management.
  • Recent llama.cpp and Ollama issues around Qwen3-VL multi-turn failures, KV-cache corruption, and `num_ctx` overflow suggest this stack still has rough edges for repeated image turns.
  • Practical workaround: keep later turns text-only and refer back to the image in prose, or upgrade runtimes and test whether they cache vision features more cleanly.
  • If you need durable “image memory,” store a textual summary after the first image rather than resending the image block every turn.
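The last two workarounds can be combined: after the first image turn, rewrite the history so image blocks are replaced by a stored one-line summary, keeping all later requests text-only. A minimal sketch, assuming the common OpenAI-style multimodal message shape (a list of `text` / `image_url` content parts); `summarize_then_drop_image` is a hypothetical helper, and your runtime's message format may differ.

```python
def summarize_then_drop_image(history, summary):
    """Replace image content blocks in prior turns with a textual summary,
    so follow-up requests stay text-only (workaround sketch)."""
    out = []
    for msg in history:
        content = msg["content"]
        if isinstance(content, list):  # multimodal turn: list of parts
            text_parts = [p["text"] for p in content if p.get("type") == "text"]
            had_image = any(p.get("type") == "image_url" for p in content)
            merged = " ".join(text_parts)
            if had_image:
                merged += f" [image omitted; summary: {summary}]"
            out.append({"role": msg["role"], "content": merged})
        else:                          # plain text turn: pass through
            out.append(msg)
    return out

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is in this photo?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
    ]},
    {"role": "assistant", "content": "A tabby cat on a windowsill."},
]
followup = summarize_then_drop_image(history, "a tabby cat on a windowsill")
print(followup[0]["content"])
# -> "What is in this photo? [image omitted; summary: a tabby cat on a windowsill]"
```

With the history flattened like this, later turns never resend the image block, so a runtime that re-encodes replayed images has nothing to re-encode.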
// TAGS
qwen3-5 · multimodal · llm · inference · open-source

DISCOVERED

2026-03-19

PUBLISHED

2026-03-19

RELEVANCE

9/10

AUTHOR

erazortt