Gemma 4 E2B hits 20 tokens/sec on phone
A Reddit user reports running Google’s newly launched Gemma 4 E2B fully offline on a phone and measuring 20.3 tokens/sec with GPU inference. It is a small but useful real-world datapoint for Gemma 4’s edge positioning, suggesting the model is not just mobile-friendly in theory but can feel practical on-device.
Strong signal for on-device AI, but still an anecdotal benchmark from one setup.
- The result supports Google’s claim that Gemma 4 E2B is designed for mobile-first, offline use.
- 20.3 tok/s on a phone is a credible interactive speed, not just a lab curiosity.
- The post does not specify device model, quantization, runtime, prompt length, or thermal conditions, so the number is not broadly comparable yet.
- As a community datapoint, it matters more for feasibility than for leaderboard-style benchmarking.
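
To make the points above concrete, here is a minimal Python sketch of how such a datapoint could be recorded. It converts the reported 20.3 tok/s into wait time for a reply and lists the context fields the post leaves out. The `DecodeBenchmark` class, its field names, and the 200-token reply length are hypothetical choices for illustration; they do not come from the Reddit post or from any Gemma tooling.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DecodeBenchmark:
    """Hypothetical record for one on-device throughput measurement."""
    tokens_per_sec: float
    # Context the post does not specify; None means "not reported".
    device_model: Optional[str] = None
    quantization: Optional[str] = None
    runtime: Optional[str] = None
    prompt_tokens: Optional[int] = None
    thermal_state: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of context fields that were not reported."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def seconds_for(self, n_tokens: int) -> float:
        """Time to generate n_tokens at the measured decode rate."""
        return n_tokens / self.tokens_per_sec


reported = DecodeBenchmark(tokens_per_sec=20.3)

# A 200-token reply is an assumed length, chosen only for illustration.
print(f"200-token reply: {reported.seconds_for(200):.1f} s")
print("Not reported:", ", ".join(reported.missing_fields()))
```

At the reported rate, a reply of that assumed length takes roughly ten seconds to generate. The list of unreported fields is what would need to be filled in before the number could be compared with other devices or runtimes.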
DISCOVERED
2026-04-03
PUBLISHED
2026-04-03
AUTHOR
EthanJohnson01