Hipfire hits ~1,260 tok/s prefill on AMD Strix Halo
Hipfire, a Rust-native inference engine for AMD hardware, has introduced an experimental MMQ path that speeds up prefill by more than 3x on RDNA3 GPUs. Benchmarks on Strix Halo systems show prefill throughput rising to ~1,260 tok/s, on par with established implementations such as llama.cpp.
AMD APU users are seeing significant performance gains without waiting for official ROCm support. The update reaches up to a 3.8x speedup in prefill throughput on Strix Halo hardware by targeting the RDNA3/3.5 instruction set with i8 WMMA and tiled matrix-matrix kernels. The new path is opt-in via HIPFIRE_MMQ=1, so the default path is unchanged, and the acceleration has been validated across multiple KV-cache modes.
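To make the two ideas above concrete, here is a minimal CPU-side sketch in Rust of an opt-in environment toggle gating a tiled int8 matrix multiply with 32-bit accumulation. It is not Hipfire's code. The function names, the "exactly 1 enables it" parsing rule, and the 16-wide tile are all assumptions made for illustration. Only the HIPFIRE_MMQ variable name comes from the summary above. A real GPU kernel would hand each tile to WMMA instructions instead of the scalar inner loops shown here.

```rust
use std::env;

/// Tile edge length. 16 is an assumption chosen to resemble WMMA-style tiles.
const TILE: usize = 16;

/// Hypothetical opt-in check: enabled only when the value is exactly "1".
/// How Hipfire actually parses HIPFIRE_MMQ is not stated in the source.
fn mmq_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Naive reference: C = A * B, i8 inputs, i32 accumulation.
/// A is m x k and B is k x n, both row-major.
fn matmul_i8_naive(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0i32;
            for p in 0..k {
                acc += a[i * k + p] as i32 * b[p * n + j] as i32;
            }
            c[i * n + j] = acc;
        }
    }
    c
}

/// Tiled variant: same result, but the work is split into TILE-sized blocks
/// so each block of A and B is reused while it is "hot".
fn matmul_i8_tiled(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    let mut c = vec![0i32; m * n];
    for i0 in (0..m).step_by(TILE) {
        for j0 in (0..n).step_by(TILE) {
            for p0 in (0..k).step_by(TILE) {
                // Clamp tile bounds so dimensions need not be multiples of TILE.
                let i_end = (i0 + TILE).min(m);
                let j_end = (j0 + TILE).min(n);
                let p_end = (p0 + TILE).min(k);
                for i in i0..i_end {
                    for j in j0..j_end {
                        let mut acc = 0i32;
                        for p in p0..p_end {
                            acc += a[i * k + p] as i32 * b[p * n + j] as i32;
                        }
                        // Accumulate this k-slice's partial sum into the output.
                        c[i * n + j] += acc;
                    }
                }
            }
        }
    }
    c
}

fn main() {
    let (m, k, n) = (20, 33, 18);
    let a: Vec<i8> = (0..m * k).map(|x| (x % 7) as i8 - 3).collect();
    let b: Vec<i8> = (0..k * n).map(|x| (x % 5) as i8 - 2).collect();

    // The default path stays in use unless the user explicitly opts in.
    let use_tiled = mmq_enabled(env::var("HIPFIRE_MMQ").ok().as_deref());
    let c = if use_tiled {
        matmul_i8_tiled(&a, &b, m, k, n)
    } else {
        matmul_i8_naive(&a, &b, m, k, n)
    };
    println!("tiled path: {}, c[0] = {}", use_tiled, c[0]);
}
```

Both paths compute the same product, so the toggle only changes how the work is scheduled. This is why an opt-in flag can be checked against the default path before it is made the default.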
DISCOVERED
1h ago
2026-04-28
PUBLISHED
3h ago
2026-04-28
AUTHOR
Own_Suspect5343