Gemma 4 CPU Throughput Hits 15 t/s
REDDIT · 4d ago · BENCHMARK RESULT

A LocalLLaMA thread asks how fast Gemma 4 runs on CPU, and the clearest numbers shared are about 8-11 tokens/sec on a Ryzen 5 3600 with a q4_k_m quant. The fastest reported figure in-thread is roughly 15 tokens/sec, but that setup used speculative decoding and was not CPU-only.

// ANALYSIS

The takeaway is that Gemma 4 looks usable on CPU, but the real speed unlock in this thread comes from speculative decoding, not just picking a smaller quant.
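A rough way to see why a draft model can buy more speed than a smaller quant: in the usual analysis of speculative decoding, the main model accepts each drafted token with probability α (treated as independent across positions, which is a simplification) and the draft proposes γ tokens per step. Under those assumptions, each main-model forward pass yields (1 - α^(γ+1)) / (1 - α) tokens on average. The Python sketch below turns that into an expected wall-clock speedup. The values of α, γ, and the draft-to-main cost ratio are hypothetical inputs for illustration, not figures reported in the thread, and the function names are made up for this example.

```python
def expected_tokens_per_target_pass(alpha: float, gamma: int) -> float:
    """Expected tokens produced per forward pass of the main (target) model.

    alpha: assumed per-token probability that a drafted token is accepted,
           treated as independent across positions (a simplifying assumption).
    gamma: number of tokens the draft model proposes per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the target pass.
        return gamma + 1.0
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Expected wall-clock speedup over plain one-token-per-pass decoding.

    cost_ratio: hypothetical time of one draft pass divided by one target pass.
    Each step costs gamma draft passes plus one target pass.
    """
    return expected_tokens_per_target_pass(alpha, gamma) / (gamma * cost_ratio + 1.0)


if __name__ == "__main__":
    # Illustrative inputs only; none of these were measured in the thread.
    for alpha in (0.6, 0.8):
        speedup = expected_speedup(alpha, gamma=4, cost_ratio=0.1)
        print(f"alpha={alpha}: ~{speedup:.2f}x")
```

With those illustrative inputs (γ = 4, a draft pass costing a tenth of a main pass), α = 0.8 works out to roughly 2.4× and α = 0.6 to roughly 1.6×. Real gains depend on how often the draft agrees with the main model and on the hardware, which is why the 15 tok/s figure is not comparable to a plain CPU run.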

  • The only concrete CPU-only result shared is 8-11 tok/s on a Ryzen 5 3600 with 64 GB of DDR4, using the official q4_k_m quant in the llama.cpp server.
  • The top number, about 15 tok/s, came from a 128 GB Strix Halo system using speculative decoding with a draft model, so it is not a clean CPU-only baseline.
  • The thread does not establish a clear “best” quant for both speed and quality, but q4_k_m looks like the practical starting point people are reaching for.
  • For local Gemma 4 inference, memory bandwidth (usually the real bottleneck on consumer hardware) and decoding strategy appear to matter as much as raw CPU core count.
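The bandwidth point can be made concrete with a back-of-the-envelope ceiling: during decoding, each new token has to stream the active model weights from RAM, so tokens/sec cannot exceed memory bandwidth divided by the bytes read per token. The Python sketch below assumes dual-channel DDR4-3200 (the thread does not state the memory speed) and a hypothetical per-token weight footprint; both are placeholders, and measured throughput will land below the ceiling because of compute, KV-cache reads, and imperfect bandwidth utilization. The function names are made up for this example.

```python
def ddr_peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Each DDR4 channel is 64 bits (8 bytes) wide; mt_per_s is the transfer
    rate in mega-transfers per second (e.g. 3200 for DDR4-3200).
    """
    return channels * bus_bytes * mt_per_s * 1e6 / 1e9


def decode_ceiling_tok_s(bandwidth_gbs: float, bytes_per_token_gb: float) -> float:
    """Upper bound on decode speed if every token streams the active weights once.

    bytes_per_token_gb: hypothetical GB of weights read per generated token,
    roughly the quantized size of the active parameters (KV cache ignored).
    """
    if bytes_per_token_gb <= 0:
        raise ValueError("bytes_per_token_gb must be positive")
    return bandwidth_gbs / bytes_per_token_gb


if __name__ == "__main__":
    # Assumption: dual-channel DDR4-3200; the thread does not give memory speed.
    bandwidth = ddr_peak_bandwidth_gbs(channels=2, mt_per_s=3200)
    # Hypothetical footprint, not the actual size of any Gemma 4 quant.
    weights_gb = 4.0
    print(f"peak bandwidth: {bandwidth:.1f} GB/s")
    print(f"decode ceiling: {decode_ceiling_tok_s(bandwidth, weights_gb):.1f} tok/s")
```

Under these assumptions the ceiling is simply 51.2 GB/s divided by the per-token footprint, which is why extra CPU cores tend to help little once memory bandwidth is saturated.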
// TAGS
gemma-4 · llm · benchmark · inference · open-source

DISCOVERED

2026-04-08

PUBLISHED

2026-04-08

RELEVANCE

9/10

AUTHOR

last_llm_standing