llama.cpp build b8464 hits 10k tokens/sec on R9700
A major update to llama.cpp (build b8464) pushes prompt-processing speed to 10,907 tokens per second on AMD's Radeon AI PRO R9700. The RDNA4-optimized build roughly triples throughput for Qwen 3.5, bringing fast evaluation of full 128k-token contexts within reach of local developers.
The R9700's leap from roughly 4,000 to over 10,000 tokens per second makes advanced techniques like Multi-Token Prediction and speculative decoding dramatically cheaper on consumer hardware. Build b8464 introduces fused Gated Delta Network kernels for Qwen 3.5, reducing graph splits and keeping the entire computation in the GPU's high-bandwidth memory. Flash Attention is the primary driver here, allowing developers to sustain this throughput even at 128k context windows. This performance tier sharply cuts cold-start latency for RAG applications: at these rates, typical retrieval chunks can be pre-processed in well under a second, and even a full 128k-token context in roughly twelve seconds. For AI developers, the R9700 is emerging as a cost-effective alternative to datacenter silicon for iterative model prototyping and high-throughput agent loops.
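A quick sanity check on what these throughput numbers mean for prefill latency. This is a back-of-the-envelope sketch using only the figures reported above (the ~10,907 tok/s build b8464 rate and the ~4,000 tok/s prior baseline); actual latency will also depend on model size, quantization, and batch settings.

```python
# Back-of-the-envelope prefill time from the reported prompt-processing rates.
# Rates taken from the article: ~10,907 tok/s on the R9700 with build b8464,
# versus the ~4,000 tok/s baseline it improved on.

def prefill_seconds(context_tokens: int, tokens_per_sec: float) -> float:
    """Time to pre-process a prompt of `context_tokens` at a given rate."""
    return context_tokens / tokens_per_sec

full_context = 128 * 1024  # a full 128k-token context window

old = prefill_seconds(full_context, 4_000)    # prior baseline
new = prefill_seconds(full_context, 10_907)   # build b8464

print(f"old: {old:.1f}s  new: {new:.1f}s  speedup: {old / new:.1f}x")
# -> old: 32.8s  new: 12.0s  speedup: 2.7x
```

In other words, the update moves a worst-case 128k prefill from about half a minute to about twelve seconds, while small retrieval chunks (a few thousand tokens) drop to sub-second territory.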
DISCOVERED: 2026-03-22
PUBLISHED: 2026-03-22
AUTHOR: greenail