llama.cpp Vulkan stumbles on Arrow Lake
A Reddit user reports that llama.cpp's Vulkan backend performs poorly on an Arrow Lake Arc 130T iGPU: prompt processing is decent, but token generation stays below 4 tok/s on Gemma 3n E4B. The thread frames Intel-native backends such as SYCL, not Vulkan, as the real alternative.
This looks more like backend maturity and memory-bandwidth limits than a hardware surprise. Intel iGPUs are supported, but the post shows why Vulkan still feels like the fallback path rather than the preferred Intel stack.
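The memory-bandwidth point can be made concrete with a back-of-envelope estimate: token generation is typically bandwidth-bound, so the weights read per token divided into available bandwidth gives a hard ceiling. The figures below are illustrative assumptions (weight size, shared DDR5 bandwidth), not measurements from the thread.

```python
# Rough ceiling on tok/s for a bandwidth-bound decoder:
#   ceiling ~= memory_bandwidth / bytes_read_per_token (~= weight size).
# Both numbers below are assumptions for illustration only.
weights_gb = 3.0      # assumed ~3 GB for a 4-bit-quantized model of this class
bandwidth_gbs = 90.0  # assumed dual-channel DDR5 bandwidth, shared with the CPU
ceiling = bandwidth_gbs / weights_gb
print(f"theoretical ceiling: {ceiling:.0f} tok/s")
```

Real backends land well under this bound, which is why a few tok/s on a shared-memory iGPU is unsurprising rather than a hardware defect.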
- Intel’s llama.cpp docs position SYCL as the primary backend for Intel GPUs, and explicitly list Arrow Lake’s built-in Arc graphics as supported.
- The numbers fit a familiar pattern: prompt processing can look acceptable while token generation falls apart on integrated graphics.
- OpenVINO is the other Intel-specific lane worth watching; Vulkan is easier to set up, but not the obvious choice for throughput.
- For users who want predictable local LLM performance today, a tuned CPU build or a discrete GPU still looks safer than betting on an Intel iGPU backend.
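For readers who want to try the SYCL path the bullets describe, a minimal build sketch follows llama.cpp's documented SYCL flow. It assumes the Intel oneAPI Base Toolkit is installed at the default path; adjust paths and the model filename for your setup.

```shell
# Build llama.cpp with the SYCL backend for Intel GPUs (sketch, per
# llama.cpp's SYCL docs; assumes oneAPI is installed at /opt/intel/oneapi).
source /opt/intel/oneapi/setvars.sh   # load the icx/icpx compilers

cmake -B build \
  -DGGML_SYCL=ON \
  -DCMAKE_C_COMPILER=icx \
  -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j

# Check that the iGPU is visible to the SYCL runtime, then run inference
# with all layers offloaded (-ngl 99); model.gguf is a placeholder name.
./build/bin/llama-ls-sycl-device
./build/bin/llama-cli -m model.gguf -ngl 99
```

For comparison, the Vulkan backend discussed in the thread is enabled with `-DGGML_VULKAN=ON` instead of the SYCL flags.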
DISCOVERED: 2026-05-11
PUBLISHED: 2026-05-11
AUTHOR: TuskNaPrezydenta2020