llama.cpp Vulkan tops ROCm on Strix Halo
OPEN_SOURCE
REDDIT · 3h ago · BENCHMARK RESULT


A Strix Halo user benchmarked llama.cpp on the AMD Radeon 8060S (gfx1151) and found the Vulkan backend outpacing ROCm on Qwen3-30B-A3B. Prompt processing was roughly tied, but token generation was about 21% faster on Vulkan, with lower run-to-run variance.
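A comparison like this can be reproduced with llama.cpp's bundled `llama-bench` tool by building the same source tree twice, once per backend, and running identical flags against both binaries. A minimal sketch; the model filename and the pp/tg sizes here are illustrative, not taken from the post:

```shell
# Build once with the Vulkan backend ...
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j

# ... and once with the ROCm/HIP backend
# (gfx1151 is the Strix Halo Radeon 8060S)
cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
cmake --build build-rocm --config Release -j

# Same model, same flags, both binaries: -p measures prompt
# processing, -n measures token generation, -ngl 99 offloads
# all layers to the GPU. Model path is a placeholder.
./build-vulkan/bin/llama-bench -m qwen3-30b-a3b.gguf -p 512 -n 128 -ngl 99
./build-rocm/bin/llama-bench   -m qwen3-30b-a3b.gguf -p 512 -n 128 -ngl 99
```

Because both binaries come from the same source revision and run the same model on the same machine, any remaining gap points at the backend kernels rather than at setup differences.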

// ANALYSIS

Hot take: this looks less like a fluke and more like a sign that ROCm still has rough edges on gfx1151, especially on MoE inference paths.

  • The gap is concentrated in generation, which is the metric users feel most in interactive and agentic workloads.
  • The same binary, model, and hardware stack reduce the chance this is just setup noise.
  • The lower std dev on Vulkan suggests a more consistent kernel path, not just a one-off speed burst.
  • This lines up with broader Strix Halo chatter that ROCm support is improving but still uneven versus Vulkan on some workloads.
  • It is still a single benchmark on one model; dense models and longer contexts may show different results.
// TAGS
benchmark · gpu · inference · moe · open-source · llama-cpp

DISCOVERED

3h ago

2026-05-05

PUBLISHED

6h ago

2026-05-05

RELEVANCE

8 / 10

AUTHOR

FeiX7