llama.cpp Vulkan tops ROCm on RDNA3
OPEN_SOURCE
REDDIT · 32d ago · BENCHMARK RESULT


A LocalLLaMA benchmark post claims that recent llama.cpp build b8262 has flipped the usual AMD pecking order, with Vulkan beating ROCm in prompt processing, and in some token-generation tests, on RDNA3 cards like the RX 7900 XTX and Radeon Pro W7800. If those results hold across more setups, Vulkan is no longer just the fallback backend for AMD local inference on Linux.

// ANALYSIS

This is less a definitive backend victory lap than a reminder that AMD inference performance in llama.cpp is now highly sensitive to backend, driver, and build details. The bigger story is that Vulkan has moved from “good enough” to “worth testing first” on some modern Radeon setups.

  • The posted results show Vulkan clearly ahead on pp512 for Qwen 3.5 and GLM-4.7 Flash, with especially large gains on the W7800 run
  • ROCm still wins one split-GPU tg128 case for Qwen 3.5, which suggests backend choice may now depend on model architecture and multi-GPU sharding strategy rather than a simple universal ranking
  • On gpt-oss-20b, Vulkan also beats ROCm on tg128, which is notable because token generation has often been ROCm’s stronger side on AMD
  • Recent llama.cpp community discussion already showed Vulkan outperforming ROCm on some 7900 XTX systems, so this post looks like part of a broader pattern rather than a one-off anomaly
  • Developers should treat this as a tuning signal, not a law of nature: Mesa/RADV changes, ROCm compiler quirks, coopmat support, and exact llama.cpp commits can all swing the result hard
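For anyone who wants to run the same tuning check locally, a minimal A/B sketch: build both backends from one llama.cpp checkout (GGML_VULKAN and GGML_HIP are llama.cpp's CMake switches), then run llama-bench with matched settings. The model path here is a placeholder, and exact flags may vary by llama.cpp commit.

```shell
# Build both backends from the same llama.cpp checkout so the comparison is fair
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j
cmake -B build-rocm -DGGML_HIP=ON
cmake --build build-rocm --config Release -j

# The pp512/tg128 columns correspond to -p 512 (prompt tokens) and -n 128
# (generated tokens); -ngl 99 offloads all layers to the GPU.
# model.gguf is a placeholder -- substitute the model under test.
./build-vulkan/bin/llama-bench -m model.gguf -p 512 -n 128 -ngl 99
./build-rocm/bin/llama-bench -m model.gguf -p 512 -n 128 -ngl 99
```

Per the analysis above, pin the llama.cpp commit and note the Mesa/RADV and ROCm versions alongside any numbers, since all of these can swing the result.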
// TAGS
llama-cpp · llm · inference · gpu · benchmark · open-source

DISCOVERED

32d ago

2026-03-10

PUBLISHED

33d ago

2026-03-09

RELEVANCE

8/10

AUTHOR

XccesSv2