llama.cpp Vulkan tops ROCm on Strix Halo
OPEN_SOURCE
REDDIT · 3h ago · BENCHMARK RESULT


A Strix Halo user benchmarked llama.cpp on the AMD Radeon 8060S (gfx1151) and found the Vulkan backend outpacing ROCm on Qwen3-30B-A3B. Prompt processing was roughly tied, but token generation was about 21% faster on Vulkan, with lower run-to-run variance.
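A comparison like this can be reproduced with llama.cpp's bundled `llama-bench` tool by building the same source tree twice, once per backend, and running identical flags against both binaries. A minimal sketch; the model filename and the pp/tg sizes here are illustrative, not taken from the post:

```shell
# Build once with the Vulkan backend ...
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j

# ... and once with the ROCm/HIP backend
# (gfx1151 is the Strix Halo Radeon 8060S)
cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
cmake --build build-rocm --config Release -j

# Same model, same flags, both binaries: -p measures prompt
# processing, -n measures token generation, -ngl 99 offloads
# all layers to the GPU. Model path is a placeholder.
./build-vulkan/bin/llama-bench -m qwen3-30b-a3b.gguf -p 512 -n 128 -ngl 99
./build-rocm/bin/llama-bench   -m qwen3-30b-a3b.gguf -p 512 -n 128 -ngl 99
```

Because both binaries come from the same source revision and run the same model on the same machine, any remaining gap points at the backend kernels rather than at setup differences.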

// ANALYSIS

Hot take: this looks less like a fluke and more like a sign that ROCm still has rough edges on gfx1151, especially on MoE inference paths.

  • The gap is concentrated in generation, which is the metric users feel most in interactive and agentic workloads.
  • The same binary, model, and hardware stack reduce the chance this is just setup noise.
  • The lower std dev on Vulkan suggests a more consistent kernel path, not just a one-off speed burst.
  • This lines up with broader Strix Halo chatter that ROCm support is improving but still uneven versus Vulkan on some workloads.
  • It is still a single benchmark on one model; dense models and longer contexts may show different results.
// TAGS
benchmark · gpu · inference · moe · open-source · llama-cpp

DISCOVERED

3h ago

2026-05-05

PUBLISHED

6h ago

2026-05-05

RELEVANCE

8 / 10

AUTHOR

FeiX7