llama.cpp Mi50 benchmarks pit ROCm, Vulkan
OPEN_SOURCE
REDDIT · 20d ago · BENCHMARK RESULT


On an Mi50 32GB, llama.cpp's ROCm backend (built against a TheRock ROCm 7.13 nightly) and its Vulkan backend split the wins by workload. Vulkan is quicker for short-context dense-model chat, while ROCm takes over once context stretches or MoE and CPU-offload paths enter the mix.
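Numbers like these are typically gathered with llama.cpp's `llama-bench` tool, once against each backend build. A minimal sketch follows; the model path and depth values are illustrative rather than the poster's exact invocation, and older llama.cpp trees may spell the HIP flag differently:

```shell
# Vulkan build
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan -j

# ROCm (HIP) build -- the post paired this with a TheRock ROCm 7.13 nightly toolchain
cmake -B build-rocm -DGGML_HIP=ON
cmake --build build-rocm -j

# Prompt processing (-p) and generation (-n) rates, measured at an empty
# context and again at a 32k-token depth (-d), fully offloaded (-ngl 99)
./build-vulkan/bin/llama-bench -m model.gguf -ngl 99 -p 512 -n 128 -d 0,32768
./build-rocm/bin/llama-bench   -m model.gguf -ngl 99 -p 512 -n 128 -d 0,32768
```

`llama-bench` reports pp and tg throughput in tokens per second per configuration, which is the shape of the figures quoted below.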

// ANALYSIS

This looks less like a universal backend verdict than a reminder that local inference performance is shaped by context length and model topology.

  • For dense models, Vulkan wins the interactive end: Qwen 3.5 9B prompt processing is 871.17 t/s on Vulkan at 512 tokens vs 708.58 on ROCm, and 27B is 252.68 vs 209.06.
  • The crossover shows up fast: at 32k context, 9B prompt processing flips to 593.8 t/s on ROCm vs 447.76 on Vulkan, and 27B flips to 176.69 vs 128.72.
  • Generation stays more Vulkan-friendly on dense models, so the ROCm win mostly comes from prompt processing amortizing the whole session rather than raw token-by-token speed.
  • The 122B run with 28 layers offloaded to CPU is where ROCm really earns its keep: at 32k, tg is 24.65 t/s on ROCm vs 18.41 on Vulkan, while pp is 153.16 vs 113.16.
  • Nightly ROCm is still a risk tradeoff: the reported llama-server prompt-cache OOM and earlier leak-like behavior make these results useful, but not production-safe.
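The amortization point above can be made concrete with a back-of-envelope session-time calculation. The pp rates are the 27B-at-32k figures from the post; the tg rates are hypothetical placeholders (the post only says generation stays Vulkan-friendly on dense models, without numbers):

```python
def session_seconds(prompt_tokens: int, gen_tokens: int,
                    pp_tps: float, tg_tps: float) -> float:
    """Approximate wall time: prompt-processing time plus generation time."""
    return prompt_tokens / pp_tps + gen_tokens / tg_tps

# 27B dense at 32k context. pp rates are from the benchmark;
# tg rates are ASSUMED for illustration, with Vulkan slightly ahead.
rocm = session_seconds(32_768, 512, pp_tps=176.69, tg_tps=12.0)
vulkan = session_seconds(32_768, 512, pp_tps=128.72, tg_tps=13.5)

# Even with the slower (assumed) tg rate, ROCm finishes the session
# sooner: the 32k prompt dominates total time, so the pp advantage
# outweighs a modest generation deficit.
print(f"ROCm: {rocm:.0f}s, Vulkan: {vulkan:.0f}s")
```

This is why the crossover tracks context length: at short prompts the pp term is negligible and per-token generation speed decides the winner.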
// TAGS
llama-cpp · llm · gpu · inference · benchmark · open-source

DISCOVERED

2026-03-22 (20d ago)

PUBLISHED

2026-03-22 (20d ago)

RELEVANCE

8/10

AUTHOR

JaredsBored