OPEN_SOURCE
REDDIT · 20d ago · BENCHMARK RESULT
llama.cpp Mi50 benchmarks pit ROCm against Vulkan
On a 32GB Mi50, llama.cpp's TheRock ROCm 7.13 nightly and its Vulkan backend split the wins by workload: Vulkan is quicker for short-context dense-model chat, while ROCm takes over once context stretches or MoE and CPU-offload paths enter the mix.
// ANALYSIS
This looks less like a universal backend verdict than a reminder that local inference performance is shaped by context length and model topology.
- For dense models, Vulkan wins the interactive end: Qwen 3.5 9B prompt processing hits 871.17 t/s on Vulkan vs 708.58 on ROCm at 512 tokens, and the 27B runs at 252.68 vs 209.06.
- The crossover shows up fast: at 32k context, 9B prompt processing flips to 593.8 t/s on ROCm vs 447.76 on Vulkan, and the 27B flips to 176.69 vs 128.72.
- Generation stays more Vulkan-friendly on dense models, so the ROCm win comes mostly from faster prompt processing amortized over the whole session rather than raw token-by-token speed.
- The 122B run with 28 layers offloaded to CPU is where ROCm really earns its keep: at 32k, generation runs at 24.65 t/s on ROCm vs 18.41 on Vulkan, and prompt processing at 153.16 vs 113.16.
- Nightly ROCm is still a risk tradeoff: the reported llama-server prompt-cache OOM and earlier leak-like behavior make these results useful but not production-safe.
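The amortization point in the bullets can be sanity-checked with quick arithmetic. In the sketch below, the prompt-processing rates are the 27B figures at 32k context from the post; the generation rates and the 32k-prompt/512-token session shape are hypothetical, chosen so that Vulkan deliberately leads on raw generation speed:

```python
# Rough session-time model: time = prompt_tokens / pp_rate + gen_tokens / tg_rate.

def session_seconds(prompt_tokens: int, gen_tokens: int,
                    pp_tps: float, tg_tps: float) -> float:
    """Wall time to ingest the prompt plus generate the reply."""
    return prompt_tokens / pp_tps + gen_tokens / tg_tps

PROMPT, GEN = 32_000, 512  # hypothetical long-context chat turn

# pp rates from the post (27B dense at 32k); tg rates are assumed,
# with Vulkan faster at generation to match the bullet's premise.
rocm   = session_seconds(PROMPT, GEN, pp_tps=176.69, tg_tps=10.0)
vulkan = session_seconds(PROMPT, GEN, pp_tps=128.72, tg_tps=11.0)

print(f"ROCm:   {rocm:6.1f} s")    # prompt processing dominates the total
print(f"Vulkan: {vulkan:6.1f} s")  # slower overall despite faster generation
```

At these assumed rates, the faster prompt side buys back far more wall time than the slower generation side costs, which is exactly the crossover dynamic the 32k numbers describe.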
// TAGS
llama-cpp · llm · gpu · inference · benchmark · open-source
DISCOVERED
2026-03-22
PUBLISHED
2026-03-22
RELEVANCE
8/10
AUTHOR
JaredsBored