OPEN_SOURCE
REDDIT // 9d ago // BENCHMARK RESULT
Gemma 4 E2B hits 20 tokens/sec on phone
A Reddit user reports running Google’s newly launched Gemma 4 E2B fully offline on a phone, measuring 20.3 tokens/sec with GPU inference. It is a small but useful real-world datapoint for Gemma 4’s edge positioning, suggesting the model is not just theoretically mobile-friendly but can feel practical on-device.
// ANALYSIS
Strong signal for on-device AI, but still an anecdotal benchmark from one setup.
- The result supports Google’s claim that Gemma 4 E2B is designed for mobile-first, offline use.
- 20.3 tok/s on a phone is a credible interactive speed, not just a lab curiosity.
- The post does not specify device model, quantization, runtime, prompt length, or thermal conditions, so the number is not broadly comparable yet.
- As a community datapoint, it matters more for feasibility than for leaderboard-style benchmarking.
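For context on how a figure like 20.3 tok/s is typically produced, here is a minimal measurement sketch. The `fake_generate` stub and all names are hypothetical; the post does not describe its method, and real benchmarks must also decide whether prompt prefill time is included.

```python
import time

def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Throughput = generated tokens divided by wall-clock time."""
    return num_tokens / elapsed_s

def fake_generate(prompt: str, max_tokens: int) -> list[str]:
    # Stand-in for a real on-device inference call.
    return ["tok"] * max_tokens

start = time.perf_counter()
tokens = fake_generate("Hello", max_tokens=203)
elapsed = time.perf_counter() - start

# Example: 203 tokens over 10.0 s of decode time reports 20.3 tok/s.
print(f"{tokens_per_second(203, 10.0):.1f} tok/s")
```

Because prefill, sampling settings, and thermal throttling all shift the denominator, single-run numbers like this one are indicative rather than comparable across setups.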
// TAGS
gemma · gemma-4 · e2b · on-device-ai · mobile-ai · offline-inference · benchmark · google-deepmind
DISCOVERED
2026-04-03
PUBLISHED
2026-04-03
RELEVANCE
8 / 10
AUTHOR
EthanJohnson01