Gemma 4 E2B hits 20 tokens/sec on phone
OPEN_SOURCE ↗
REDDIT · 9d ago · BENCHMARK RESULT

A Reddit user reports running Google’s newly launched Gemma 4 E2B fully offline on a phone and measuring 20.3 tokens/sec with GPU inference. It is a small but useful real-world datapoint for Gemma 4’s positioning as an edge model. It suggests the model is not just mobile-friendly on paper but fast enough to be practical on-device.
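
The post gives only the final rate. The Python sketch below shows what such a figure usually means and what it implies for a user. It assumes the number is decode throughput, meaning generated tokens divided by generation time with prompt processing excluded, which the post does not confirm. The sketch then converts 20.3 tok/s into streaming time for a few hypothetical reply lengths. The function names are illustrative and do not come from any specific runtime.

```python
# Illustrative arithmetic only: the Reddit post reports a single figure
# (20.3 tokens/sec) and does not say how it was measured.


def decode_tokens_per_sec(generated_tokens: int, decode_seconds: float) -> float:
    """Decode throughput: tokens generated divided by time spent generating them.

    Assumes prefill (prompt processing) time is excluded, which is one common
    convention; some tools divide by total wall-clock time instead.
    """
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return generated_tokens / decode_seconds


def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady decode rate (ignores prefill)."""
    return num_tokens / tokens_per_sec


REPORTED_RATE = 20.3  # tokens/sec, as reported in the post

for reply_len in (50, 200, 500):  # hypothetical reply lengths, in tokens
    t = seconds_to_generate(reply_len, REPORTED_RATE)
    print(f"{reply_len:>4} tokens -> {t:5.1f} s")
```

At the reported rate, a 200-token reply streams in just under 10 seconds. Tokens are not words, so the same rate can feel faster or slower depending on the tokenizer and the language of the output.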

// ANALYSIS

Strong signal for on-device AI, but still an anecdotal benchmark from one setup.

  • The result supports Google’s claim that Gemma 4 E2B is designed for mobile-first, offline use.
  • 20.3 tok/s on a phone is a credible interactive speed, not just a lab curiosity.
  • The post does not specify device model, quantization, runtime, prompt length, or thermal conditions, so the number is not broadly comparable yet.
  • As a community datapoint, it matters more for feasibility than for leaderboard-style benchmarking.
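
One way to make the missing-context point concrete is a hypothetical record that flags which parts of a phone benchmark were never reported. The field names are assumptions chosen to mirror the details listed above. When the record is filled in with what the post actually states, which is a rate and GPU inference, every other field comes back missing.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OnDeviceBenchmark:
    """Hypothetical record of the context needed to compare phone benchmarks.

    Field names are illustrative, not a standard schema.
    """
    tokens_per_sec: float
    device_model: Optional[str] = None
    accelerator: Optional[str] = None   # e.g. "GPU" or "CPU"
    quantization: Optional[str] = None
    runtime: Optional[str] = None
    prompt_tokens: Optional[int] = None
    thermal_state: Optional[str] = None

    def missing_context(self) -> list[str]:
        """Names of context fields left unreported (None), in declaration order."""
        return [
            f.name
            for f in fields(self)
            if f.name != "tokens_per_sec" and getattr(self, f.name) is None
        ]

    def is_comparable(self) -> bool:
        """True only when every context field has been reported."""
        return not self.missing_context()


# The Reddit datapoint as described: a rate and GPU inference, nothing else.
reddit_post = OnDeviceBenchmark(tokens_per_sec=20.3, accelerator="GPU")
print(reddit_post.missing_context())
print(reddit_post.is_comparable())
```
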
// TAGS
gemma · gemma-4 · e2b · on-device-ai · mobile-ai · offline-inference · benchmark · google-deepmind

DISCOVERED: 2026-04-03 (9d ago)
PUBLISHED: 2026-04-03 (9d ago)
RELEVANCE: 8/10
AUTHOR: EthanJohnson01