OPEN_SOURCE
REDDIT · 2d ago · BENCHMARK RESULT
Gemma 4 26B posts strong numbers on the R9700
This Reddit benchmark rerun shows Gemma 4 26B quantized GGUF running well on an AMD Radeon AI Pro R9700, with Vulkan hitting about 2,949 tok/s on prompt processing and 92.9 tok/s on generation. The author corrected an earlier batch-size mistake, so these numbers are closer to a fair default-config comparison.
// ANALYSIS
Local Gemma 4 inference on AMD looks backend-sensitive: on this card, Vulkan materially outpaced ROCm in prefill, while decode stayed strong but closer together. That makes the result useful less as a universal Gemma score and more as a signal that the runtime stack can dominate real-world throughput.
- Vulkan beat ROCm on this setup by a wide margin in prompt processing: 2,949 vs 1,422 tok/s at `pp1000`, and 1,450 vs 681 tok/s at `pp1000 @ d50000`.
- Generation speed was also higher under Vulkan, but by a smaller gap: 92.9 vs 70.9 tok/s at `tg2500`, narrowing to 78.2 vs 61.5 tok/s at the longest context.
- The test was run with a 210 W power cap on ROCm 7.2, so the result reflects both software maturity and power-policy constraints, not just raw GPU capability.
- For people trying to run 26B-class open models locally, this is a reminder to benchmark the whole stack: driver, backend, quantization format, and batch settings all matter.
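The `pp1000`/`tg2500`/`d50000` labels above match llama-bench output, so a whole-stack comparison like this one can be reproduced with two builds of llama.cpp. A minimal sketch, assuming hypothetical build directories and model filename (the post does not give exact paths or the quantization used):

```shell
# Sketch: compare Vulkan vs ROCm backends of llama.cpp on the same GGUF model.
# Backends are chosen at build time (e.g. -DGGML_VULKAN=ON vs -DGGML_HIP=ON),
# so each backend gets its own binary. Paths and filename are assumptions.

MODEL=gemma-4-26b-Q4_K_M.gguf   # hypothetical quantized GGUF

# Vulkan build: 1000-token prompt processing (pp1000) and 2500-token
# generation (tg2500), at zero depth and at 50k-token depth (d50000).
./build-vulkan/bin/llama-bench -m "$MODEL" -p 1000 -n 2500 -d 0,50000 -ngl 99

# ROCm (HIP) build: identical settings, so only the backend differs.
./build-rocm/bin/llama-bench -m "$MODEL" -p 1000 -n 2500 -d 0,50000 -ngl 99
```

Keeping flags identical across the two runs (and leaving batch size at the default, or pinning it explicitly in both) is the point: the post's original numbers were skewed by exactly this kind of batch-size mismatch.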
// TAGS
gemma-4 · llm · benchmark · gpu · inference · open-weights
DISCOVERED
2d ago
2026-04-10
PUBLISHED
2d ago
2026-04-09
RELEVANCE
9/10
AUTHOR
ProfessionalSpend589