Hipfire hits 1,200 tok/s on AMD Strix Halo

// 90d ago · OPEN SOURCE RELEASE

Hipfire, a Rust-native inference engine for AMD hardware, has introduced an experimental MMQ path that boosts prefill speed by more than 3x on RDNA3-family GPUs. Benchmarks on Strix Halo systems show prefill throughput climbing to ~1,260 tok/s, matching the performance of established implementations such as llama.cpp.
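The announcement does not describe the MMQ path's internals. As a rough illustration of the general technique that "MMQ" usually refers to in llama.cpp (matrix multiplication on quantized data), the Python sketch below quantizes two vectors to int8 with one scale per block, multiplies them as integers, and rescales to float once at the end. It is a CPU reference that assumes Hipfire's kernels follow a broadly similar scheme; the function names are hypothetical and not part of Hipfire's API.

```python
# Minimal sketch of the idea behind MMQ-style kernels.
# Assumption: Hipfire's MMQ path uses a broadly similar int8 scheme; the
# names below are hypothetical and only illustrate the arithmetic.

def quantize_int8(values):
    """Symmetric int8 quantization of one block: returns (int list, scale)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(a, b):
    """Approximate dot product via int8 quantization and integer accumulation."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate (i32 on a GPU)
    return acc * sa * sb                       # a single float rescale at the end
```

The expensive inner loop touches only small integers, which is what lets hardware int8 instructions do the work; precision loss comes solely from the rounding in `quantize_int8`.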

// ANALYSIS

AMD APU users are seeing significant performance gains without waiting for official ROCm support. On Strix Halo hardware, the update delivers up to a 3.8x prefill speedup by targeting the RDNA3/3.5 instruction set with i8 WMMA instructions and tiled matrix-matrix kernels. Because the new path is opt-in through the HIPFIRE_MMQ=1 toggle, the default path is left unchanged, and the acceleration has been validated across multiple KV-cache modes.
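Prefill processes the whole prompt at once, so each weight multiply is matrix-matrix rather than the matrix-vector work of single-token decoding, which is where tiled kernels pay off. The plain-Python reference below sketches a tiled int8 matrix multiply with wide integer accumulation, using 16x16x16 tiles to mirror the tile shape of RDNA3's WMMA instructions, plus an opt-in check modeled on the HIPFIRE_MMQ=1 toggle. Everything here (function names, the exact "equals 1" parsing rule, the fallback kernel) is a hypothetical illustration, not Hipfire's implementation.

```python
import os

TILE = 16  # assumption: mirrors the 16x16x16 tile shape of RDNA3 WMMA instructions

def mmq_enabled(env=None):
    """Hypothetical opt-in check modeled on HIPFIRE_MMQ=1 (parsing rule assumed)."""
    env = os.environ if env is None else env
    return env.get("HIPFIRE_MMQ") == "1"

def tiled_int8_matmul(a, b, tile=TILE):
    """C = A @ B for int8 matrices (lists of lists), accumulating in wide ints.

    Each output tile (i0, j0) is built up by walking the k0 tiles, the way a
    GPU workgroup keeps one accumulator tile resident while streaming tiles
    of A and B through fast on-chip memory.
    """
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    assert k == k2, "inner dimensions must match"
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One tile-by-tile multiply-accumulate step
                # (roughly what a single WMMA instruction does on the GPU).
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c

def naive_matmul(a, b):
    """Untiled reference kernel used as the default (non-MMQ) fallback."""
    return [[sum(a[i][kk] * b[kk][j] for kk in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

A caller could then pick a kernel with `matmul = tiled_int8_matmul if mmq_enabled() else naive_matmul`, keeping the untiled path as the default unless the variable is set. Both kernels produce identical integer results; tiling only changes the order of work so that each loaded tile is reused many times.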

// TAGS

llm · amd · rdna3 · inference · rust · gpu · benchmark · quantization · strix-halo · open-source

DISCOVERED

90d ago

2026-04-28

PUBLISHED

90d ago

2026-04-28

RELEVANCE

8/10

AUTHOR

Own_Suspect5343
