OPEN_SOURCE
REDDIT · 7h ago · INFRASTRUCTURE
Qwen3-VL hits Vulkan inference friction
A LocalLLaMA user reports empty image descriptions when running Qwen3-VL and Qwen2.5-VL through a Vulkan-compiled llama.cpp build. The thread points to the still-fragile state of local multimodal inference, where matching GGUF and mmproj files, fresh llama.cpp builds, and backend-specific vision support all matter.
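One part of the matching problem can be sanity-checked before launching inference at all: every valid GGUF file, model weights and mmproj alike, begins with the 4-byte magic `GGUF`. A minimal sketch of that check (the file paths are hypothetical):

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # 4-byte magic at the start of every valid GGUF file

def is_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC

# Hypothetical paths: both the model and its matching mmproj
# must be present and valid before vision input can work.
for name in ("qwen3-vl.gguf", "mmproj-qwen3-vl.gguf"):
    print(name, "->", "ok" if is_gguf(name) else "missing or not GGUF")
```

This catches the silent failure mode where a truncated download or a wrong file passed to `--mmproj` makes the vision side quietly drop out.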
// ANALYSIS
Qwen3-VL may be broadly supported in llama.cpp now, but “supported” still does not mean painless across every GPU backend.
- Vulkan remains a rougher path than CUDA or Metal for multimodal workloads, especially on edge cases involving vision encoders.
- Empty captions usually suggest the vision side is not actually being wired in, often because the mmproj file is missing, mismatched, or not loaded correctly.
- Qwen2.5-VL failing too makes this look less like a single-model issue and more like a local setup, prompt format, or backend support problem.
- For developers, the practical test is simple: verify the same model and mmproj on CPU or CUDA first, then isolate Vulkan-specific failures.
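The CPU-first isolation test above can be sketched with llama.cpp's multimodal CLI. Binary and flag names vary between llama.cpp versions, and the file paths here are hypothetical, so treat this as a workflow outline rather than exact commands:

```shell
# Step 1: rule out model/mmproj problems by keeping all layers on CPU.
./llama-mtmd-cli -m qwen3-vl.gguf --mmproj mmproj-qwen3-vl.gguf \
    --image test.png -p "Describe this image." -ngl 0

# Step 2: identical invocation, but offload layers to the Vulkan device.
./llama-mtmd-cli -m qwen3-vl.gguf --mmproj mmproj-qwen3-vl.gguf \
    --image test.png -p "Describe this image." -ngl 99

# If step 1 produces a caption and step 2 returns empty output,
# the failure is isolated to the Vulkan backend, not the files or prompt.
```

Running the same prompt and image through both steps is what separates a broken file pairing from a genuine backend bug worth reporting upstream.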
// TAGS
qwen3-vl · qwen2.5-vl · llama.cpp · multimodal · inference · gpu · open-weights
DISCOVERED
7h ago
2026-04-22
PUBLISHED
10h ago
2026-04-22
RELEVANCE
7/10
AUTHOR
WorldlinessTime634