llama.cpp Vulkan tops ROCm on RDNA3
OPEN_SOURCE
REDDIT · 32d ago · BENCHMARK RESULT


A LocalLLaMA benchmark post claims that recent llama.cpp build b8262 has flipped the usual AMD pecking order, with Vulkan beating ROCm in prompt processing, and in some token-generation tests, on RDNA3 cards like the RX 7900 XTX and Radeon Pro W7800. If those results hold across more setups, Vulkan is no longer just the fallback backend for AMD local inference on Linux.

// ANALYSIS

This is less a definitive backend victory lap than a reminder that AMD inference performance in llama.cpp is now highly sensitive to backend, driver, and build details. The bigger story is that Vulkan has moved from “good enough” to “worth testing first” on some modern Radeon setups.

  • The posted results show Vulkan clearly ahead on pp512 for Qwen 3.5 and GLM-4.7 Flash, with especially large gains on the W7800 run
  • ROCm still wins one split-GPU tg128 case for Qwen 3.5, which suggests backend choice may now depend on model architecture and multi-GPU sharding strategy rather than a simple universal ranking
  • On gpt-oss-20b, Vulkan also beats ROCm on tg128, which is notable because token generation has often been ROCm’s stronger side on AMD
  • Recent llama.cpp community discussion already showed Vulkan outperforming ROCm on some 7900 XTX systems, so this post looks like part of a broader pattern rather than a one-off anomaly
  • Developers should treat this as a tuning signal, not a law of nature: Mesa/RADV changes, ROCm compiler quirks, coopmat support, and exact llama.cpp commits can all swing the result hard
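For anyone who wants to run the same tuning check locally, a minimal A/B sketch: build both backends from one llama.cpp checkout (GGML_VULKAN and GGML_HIP are llama.cpp's CMake switches), then run llama-bench with matched settings. The model path here is a placeholder, and exact flags may vary by llama.cpp commit.

```shell
# Build both backends from the same llama.cpp checkout so the comparison is fair
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j
cmake -B build-rocm -DGGML_HIP=ON
cmake --build build-rocm --config Release -j

# The pp512/tg128 columns correspond to -p 512 (prompt tokens) and -n 128
# (generated tokens); -ngl 99 offloads all layers to the GPU.
# model.gguf is a placeholder -- substitute the model under test.
./build-vulkan/bin/llama-bench -m model.gguf -p 512 -n 128 -ngl 99
./build-rocm/bin/llama-bench -m model.gguf -p 512 -n 128 -ngl 99
```

Per the analysis above, pin the llama.cpp commit and note the Mesa/RADV and ROCm versions alongside any numbers, since all of these can swing the result.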
// TAGS
llama-cpp · llm · inference · gpu · benchmark · open-source

DISCOVERED

32d ago

2026-03-10

PUBLISHED

33d ago

2026-03-09

RELEVANCE

8/10

AUTHOR

XccesSv2