RX 9070 XT trails MI50 in llama.cpp
A Reddit user benchmarked llama.cpp on an RX 9070 XT under ROCm 7.2.3 and found it only matched an older MI50 on generation speed, despite the newer card’s better prompt throughput. The comparison is noisy because the two cards ran different quantizations on different VM hosts, but it still raises questions about ROCm performance on RDNA 4 for local LLMs.
The hot take is that this looks less like a raw GPU disappointment and more like a memory-bandwidth-and-tuning story: token generation is largely bound by how fast weights can be read from VRAM, so old datacenter HBM can still hang with newer gaming silicon on LLM workloads.
- The comparison is not apples-to-apples: Q3_K_M on the 9070 XT versus Q6_K on the MI50, plus different VM setups and CPUs, makes direct tokens/s conclusions shaky.
- The MI50’s HBM bandwidth is a major advantage for generation-heavy workloads, which can offset its age against the RX 9070 XT’s GDDR6.
- The RX 9070 XT does show stronger prompt processing in the posted numbers, so the card is not universally slow; the bottleneck is likely the workload mix and memory behavior.
- ROCm on RDNA 4 is still young enough that driver and runtime tuning can swing results materially, especially in llama.cpp with speculative decoding and flash attention enabled.
- For buyers optimizing for local AI rather than gaming, the result argues for benchmarking the actual workload before assuming any newer Radeon will beat an older Instinct card.
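The bandwidth argument in the takeaways above can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes generation is purely bandwidth-bound, meaning each token requires reading all weights from VRAM once, and it ignores KV-cache traffic, speculative decoding, and kernel efficiency. None of the numbers come from the Reddit post: the bandwidth values are nominal spec-sheet figures, the bits-per-weight values are approximate averages for each quantization, and the 8B parameter count is purely hypothetical.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billion: float,
                         bits_per_weight: float) -> float:
    """Rough upper bound on generated tokens/s for a bandwidth-bound decoder.

    Assumes every token streams the full set of weights from VRAM exactly once.
    """
    model_gb = params_billion * bits_per_weight / 8  # weight size in GB
    return bandwidth_gb_s / model_gb


# All values below are assumptions for illustration, not data from the post.
PARAMS_B = 8.0          # hypothetical model size (billions of parameters)
RX_9070_XT_GBS = 640.0  # assumed nominal GDDR6 bandwidth
MI50_GBS = 1024.0       # assumed nominal HBM2 bandwidth
Q3_K_M_BPW = 3.9        # approximate average bits per weight
Q6_K_BPW = 6.56         # approximate average bits per weight

rx_ceiling = decode_ceiling_tok_s(RX_9070_XT_GBS, PARAMS_B, Q3_K_M_BPW)
mi50_ceiling = decode_ceiling_tok_s(MI50_GBS, PARAMS_B, Q6_K_BPW)

print(f"RX 9070 XT, Q3_K_M ceiling: {rx_ceiling:.0f} tok/s")
print(f"MI50, Q6_K ceiling:         {mi50_ceiling:.0f} tok/s")
print(f"ratio (RX / MI50):          {rx_ceiling / mi50_ceiling:.2f}")
```

Under these assumed figures the two ceilings land within about 10% of each other, and the ratio does not depend on the parameter count. That is at least consistent with the cards roughly matching on generation speed even though the MI50 ran the heavier quantization, but it is an estimate of the ceiling, not a measurement.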
DISCOVERED
2h ago
2026-05-26
PUBLISHED
6h ago
2026-05-26
AUTHOR
WhatererBlah555