OPEN_SOURCE
REDDIT · BENCHMARK RESULT
llama.cpp Vulkan tops ROCm on Strix Halo
A Strix Halo user benchmarked llama.cpp on the AMD Radeon 8060S (gfx1151) and found the Vulkan backend outpacing ROCm on Qwen3.6-35B-A3B. Prompt processing was roughly tied, but token generation was about 21% faster, and more consistent run to run, on Vulkan.
// ANALYSIS
Hot take: this looks less like a fluke and more like ROCm still having rough edges on gfx1151, especially for MoE inference paths.
- The gap is concentrated in generation, which is the metric users feel most in interactive and agentic workloads.
- The same binary, model, and hardware stack reduce the chance this is just setup noise.
- The lower std dev on Vulkan suggests a more consistent kernel path, not just a one-off speed burst.
- This lines up with broader Strix Halo chatter that ROCm support is improving but still uneven versus Vulkan on some workloads.
- It is still a single benchmark on one model; dense models and longer contexts may show different results.
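For readers who want to check the result on their own hardware, a minimal sketch using llama.cpp's bundled `llama-bench` tool. The model path, GGUF file name, and gfx target are assumptions; the post does not include its exact invocation, and the CMake flags shown are llama.cpp's standard backend options rather than the poster's confirmed build.

```shell
# Build llama.cpp twice from the same checkout, once per backend,
# so the only variable is the compute path (mirrors the post's setup).
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan -j

cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
cmake --build build-rocm -j

# llama-bench reports pp (prompt processing) and tg (token generation)
# in tokens/s with a std dev; -r 5 repeats each test five times.
# model.gguf is a placeholder for whatever quant you are testing.
./build-vulkan/bin/llama-bench -m model.gguf -ngl 99 -p 512 -n 128 -r 5
./build-rocm/bin/llama-bench  -m model.gguf -ngl 99 -p 512 -n 128 -r 5
```

Comparing the `tg` rows (and their std dev columns) between the two builds is the apples-to-apples check the post relies on.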
// TAGS
benchmark · gpu · inference · moe · open-source · llama-cpp
DISCOVERED
2026-05-05 (3h ago)
PUBLISHED
2026-05-05 (6h ago)
RELEVANCE
8/10
AUTHOR
FeiX7