OPEN_SOURCE
REDDIT // 4d ago // BENCHMARK RESULT
Gemma 4 on CPU: 8-11 t/s, 15 t/s With Speculative Decoding
A LocalLLaMA thread asks how fast Gemma 4 runs on CPU, and the clearest numbers shared are about 8-11 tokens/sec on a Ryzen 5 3600 with a q4_k_m quant. The fastest reported figure in-thread is roughly 15 tokens/sec, but that setup used speculative decoding and was not CPU-only.
// ANALYSIS
The takeaway is that Gemma 4 looks usable on CPU, but the real speed unlock in this thread comes from speculative decoding, not just picking a smaller quant.
- The only concrete CPU-only result shared is 8-11 tok/s on a Ryzen 5 3600 with 64 GB DDR4, using the official q4_k_m quant in llama.cpp server.
- The top number, about 15 tok/s, came from a 128 GB Strix Halo system using speculative decoding with a draft model, so it is not a clean CPU-only baseline.
- The thread does not establish a clear "best" quant for both speed and quality, but q4_k_m looks like the practical starting point people are reaching for.
- For local Gemma 4 inference, memory bandwidth and decoding strategy appear to matter as much as raw CPU core count; memory bandwidth is usually the real bottleneck on consumer hardware.
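The last two points can be put into rough numbers. The sketch below is a back-of-envelope model, not something measured in the thread: it assumes decoding is purely memory-bound (every generated token streams the active weights from RAM once) and uses the commonly cited expected-acceptance estimate for speculative decoding, which assumes each drafted token is accepted independently. All function names and input values are hypothetical placeholders; in particular, the weight size is illustrative and is not Gemma 4's actual footprint.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound on tokens/sec if each token must stream the weights from RAM once."""
    if bandwidth_gb_s <= 0 or weights_read_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_read_gb


def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model pass when draft_len tokens are drafted.

    Assumes each drafted token is accepted independently with probability accept_rate.
    """
    if not 0.0 <= accept_rate <= 1.0 or draft_len < 0:
        raise ValueError("accept_rate must be in [0, 1] and draft_len >= 0")
    if accept_rate == 1.0:
        # Every drafted token is accepted, plus the one token the target adds itself.
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def speculative_speedup(accept_rate: float, draft_len: int, draft_cost_ratio: float) -> float:
    """Estimated speedup over plain decoding.

    draft_cost_ratio is the time of one draft-model pass divided by one target-model pass.
    """
    if draft_cost_ratio < 0:
        raise ValueError("draft_cost_ratio must be non-negative")
    step_cost = draft_len * draft_cost_ratio + 1.0  # draft passes plus one verify pass
    return expected_tokens_per_step(accept_rate, draft_len) / step_cost


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not figures from the thread):
    # 51.2 GB/s is the theoretical peak of dual-channel DDR4-3200 (2 x 8 bytes x 3200 MT/s).
    # 5.0 GB is a hypothetical amount of quantized weights read per token.
    ceiling = decode_ceiling_tok_s(bandwidth_gb_s=51.2, weights_read_gb=5.0)
    speedup = speculative_speedup(accept_rate=0.8, draft_len=4, draft_cost_ratio=0.1)
    print(f"memory-bound ceiling: {ceiling:.2f} tok/s")
    print(f"estimated speculative speedup: {speedup:.2f}x")
```

Under this model, adding CPU cores does not raise the ceiling once memory bandwidth is saturated, whereas a well-matched draft model raises tokens emitted per weight pass. Real throughput will sit below the ceiling, and real acceptance rates vary with the prompt and the draft model.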
// TAGS
gemma-4 llm benchmark inference open-source
DISCOVERED
4d ago
2026-04-08
PUBLISHED
4d ago
2026-04-08
RELEVANCE
9/10
AUTHOR
last_llm_standing