RX 580 Vulkan hits 16 t/s ceiling on llama.cpp

// 76d ago · INFRASTRUCTURE

A LocalLLaMA user running llama.cpp with the Vulkan backend on an AMD RX 580 (Polaris, gfx803) reports a hard performance ceiling of ~16 t/s on Qwen3.5-4B Q4_K_M, even with all layers offloaded to the GPU and ample VRAM headroom. The post attributes the bottleneck to Polaris exposing no hardware matrix acceleration through RADV, which forces all matmul ops through generic fp32 shaders.

// ANALYSIS

The RX 580 Vulkan experiment exposes a real gap between theoretical memory bandwidth (256 GB/s) and the roughly 15% of it that is actually used, which shows how much LLM inference throughput depends on hardware matrix ops.

– Polaris (gfx803) has no fp16, bf16, or integer dot-product acceleration under RADV, so every matrix multiply runs as a generic fp32 compute shader, which is highly inefficient for transformer workloads
– The gap between the theoretical ~100 t/s (bandwidth-bound) and the measured ~16 t/s is the real cost of lacking tensor-core equivalents on older AMD hardware
– ROCm with HIP (-DGGML_HIPBLAS=ON, targeting gfx803) is the realistic path forward: Vulkan lacks the low-level primitives to close this gap on Polaris
– llama.cpp's Vulkan backend is solid on supported hardware but cannot compensate for missing ISA features; no amount of flag tuning helps
– This is a useful data point for anyone evaluating old AMD GPUs for local inference: Vulkan is not a universal fallback that extracts full hardware performance
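
The bandwidth-bound figure in the analysis can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes that decoding one token streams every weight from VRAM once, so the ceiling is roughly bandwidth divided by model size. The 2.56 GB weight size is an assumption picked to be consistent with the ~100 t/s figure quoted above, not a measured file size, and the function names are illustrative only.

```python
def bandwidth_bound_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough decode ceiling: each token reads all weights from VRAM once."""
    return bandwidth_gb_s / weights_gb


def utilization(measured_tps: float, ceiling_tps: float) -> float:
    """Fraction of the bandwidth-bound ceiling actually achieved."""
    return measured_tps / ceiling_tps


BANDWIDTH_GB_S = 256.0  # RX 580 theoretical memory bandwidth (from the post)
WEIGHTS_GB = 2.56       # ASSUMPTION: approximate size of a 4B Q4_K_M model
MEASURED_TPS = 16.0     # reported generation speed (from the post)

ceiling = bandwidth_bound_tps(BANDWIDTH_GB_S, WEIGHTS_GB)
util = utilization(MEASURED_TPS, ceiling)
print(f"ceiling ~{ceiling:.0f} t/s, utilization ~{util:.0%}")
```

Under these assumptions the script reports a ceiling of about 100 t/s and utilization of about 16%, in line with the "~15%" quoted in the analysis. The estimate ignores KV-cache reads and compute limits, so it is an upper bound rather than a prediction.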

// TAGS

llama.cpp, inference, gpu, open-source, edge-ai

DISCOVERED: 2026-03-14 (76d ago)

PUBLISHED: 2026-03-14 (76d ago)

RELEVANCE: 5/10

AUTHOR: Numerous_Sandwich_62
