Qwen3-VL hits Vulkan inference friction
A LocalLLaMA user reports empty image descriptions when running Qwen3-VL and Qwen2.5-VL through a Vulkan-compiled llama.cpp build. The thread points to the still-fragile state of local multimodal inference, where matching GGUF and mmproj files, fresh llama.cpp builds, and backend-specific vision support all matter.
Qwen3-VL may be broadly supported in llama.cpp now, but “supported” still does not mean painless across every GPU backend.
- Vulkan remains a rougher path than CUDA or Metal for multimodal workloads, especially on edge cases involving vision encoders.
- Empty captions usually suggest the vision side is not actually being wired in, often because the mmproj file is missing, mismatched, or not loaded correctly.
- Qwen2.5-VL failing too makes this look less like a single-model issue and more like a local setup, prompt format, or backend support problem.
- For developers, the practical test is simple: verify the same model and mmproj on CPU or CUDA first, then isolate Vulkan-specific failures.
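The isolation test in the last bullet could be scripted roughly as follows. This is a minimal sketch, not the thread author's method: it assumes two separate llama.cpp builds (one reference CPU or CUDA build, one Vulkan build), each providing the `llama-mtmd-cli` binary, and it assumes the generated caption arrives on stdout. The helper names (`build_command`, `missing_files`, `diagnose`, `run`) and all file paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(cli, model, mmproj, image,
                  prompt="Describe this image.", gpu_layers=0):
    """Build an argument list for llama-mtmd-cli (hypothetical paths).

    Uses the same model, mmproj, image, and prompt for every backend so
    that the binary is the only variable that changes between runs.
    """
    return [
        str(cli),
        "-m", str(model),         # main GGUF weights
        "--mmproj", str(mmproj),  # vision projector, must match the model
        "--image", str(image),
        "-p", prompt,
        "-ngl", str(gpu_layers),  # layers to offload to the GPU
    ]


def missing_files(*paths):
    """Return the paths that do not exist, e.g. a forgotten mmproj file."""
    return [str(p) for p in paths if not Path(p).is_file()]


def diagnose(reference_output, vulkan_output):
    """Classify the outcome from the two captions (empty means failure)."""
    ref_ok = bool(reference_output.strip())
    vk_ok = bool(vulkan_output.strip())
    if ref_ok and vk_ok:
        return "both backends produced a description"
    if ref_ok:
        return "likely Vulkan-specific: reference works, Vulkan is empty"
    if vk_ok:
        return "reference output is empty: check the reference build"
    return "both empty: check GGUF/mmproj pairing, prompt format, or build age"


def run(cmd, timeout=600):
    """Run one command and return stdout (assumed to hold the caption)."""
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout)
    return result.stdout
```

In use, one would call `missing_files(model, mmproj, image)` first, then pass `run(build_command(...))` from each build into `diagnose`. If both outputs come back empty, the bullets above point at the mmproj pairing or prompt format rather than at Vulkan itself.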
DISCOVERED
45d ago
2026-04-22
PUBLISHED
45d ago
2026-04-22
AUTHOR
WorldlinessTime634