OPEN_SOURCE ↗
REDDIT · NEWS · 23d ago
Qwen3.5 Re-encodes Images Across Turns
A Reddit user reports that Qwen3.5 re-runs the vision encoder on prior images every time the chat continues in llama.cpp, unlike Qwen3-VL. They tried both Unsloth's GGUF chat template and the original template, so the issue appears broader than any single formatter.
// ANALYSIS
This smells more like multimodal session handling than a pure chat-template bug. If the same image is getting replayed in later turns, the runtime is probably reserializing or re-encoding vision inputs instead of preserving them as cached context.
- Qwen3-VL’s official docs show image processing is handled through the processor on each request, so persistent image reuse is not something you should assume by default.
- The fact that both Unsloth’s template and the original template behave the same points away from template syntax and toward llama.cpp’s multimodal cache/state management.
- Recent llama.cpp and Ollama issues around Qwen3-VL multi-turn failures, KV-cache corruption, and `num_ctx` overflow suggest this stack still has rough edges for repeated image turns.
- Practical workaround: keep later turns text-only and refer back to the image in prose, or upgrade runtimes and test whether they cache vision features more cleanly.
- If you need durable “image memory,” store a textual summary after the first image rather than resending the image block every turn.
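The last two workarounds can be sketched in a few lines. This is a hedged illustration, not a real llama.cpp or Ollama API: the message dicts and the `type` field are hypothetical stand-ins for whatever schema your runtime uses. The idea is simply to strip raw image blocks from history after the first turn and substitute a cached textual summary, so the runtime never sees the image again and cannot re-encode it.

```python
# Sketch of the "text-only follow-up" workaround. The message format
# here is hypothetical -- adapt it to your runtime's actual schema.

def strip_image_blocks(history):
    """Drop raw image blocks from prior turns; these are the parts a
    runtime may re-encode on every chat continuation."""
    return [m for m in history if m.get("type") != "image"]

def followup_messages(history, image_summary, user_text):
    """Build a text-only continuation that refers back to the image
    via its cached summary instead of resending the image block."""
    msgs = [{"role": "system", "type": "text",
             "content": f"Context from earlier image: {image_summary}"}]
    msgs += strip_image_blocks(history)
    msgs.append({"role": "user", "type": "text", "content": user_text})
    return msgs

# Example: first turn contained an image; later turns stay text-only.
history = [
    {"role": "user", "type": "image", "content": "<raw image bytes>"},
    {"role": "user", "type": "text", "content": "What is in this photo?"},
    {"role": "assistant", "type": "text",
     "content": "A tabby cat on a windowsill."},
]
msgs = followup_messages(history, "a tabby cat on a windowsill",
                         "What color was the cat?")
```

Whether this helps depends on whether the summary carries enough detail for the follow-up questions; for dense visual tasks (OCR, counting) it will lose information, and resending the image is unavoidable.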
// TAGS
qwen3.5 · multimodal · llm · inference · open-source
DISCOVERED
2026-03-19 (23d ago)
PUBLISHED
2026-03-19 (24d ago)
RELEVANCE
9/10
AUTHOR
erazortt