Gemma 4 E2B hits 20 tokens/sec on phone

// 54d ago · BENCHMARK RESULT

A Reddit user says they ran Google’s newly launched Gemma 4 E2B fully offline on a phone and measured 20.3 tokens/sec on GPU inference. It is a small but useful real-world datapoint for Gemma 4’s edge positioning, suggesting the model is not just theoretically mobile-friendly but can feel practical on-device.

// ANALYSIS

Strong signal for on-device AI, but still an anecdotal benchmark from one setup.

  • The result supports Google’s claim that Gemma 4 E2B is designed for mobile-first, offline use.
  • 20.3 tok/s on a phone is a credible interactive speed, not just a lab curiosity.
  • The post does not specify device model, quantization, runtime, prompt length, or thermal conditions, so the number is not broadly comparable yet.
  • As a community datapoint, it matters more for feasibility than for leaderboard-style benchmarking.
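
To make the "interactive speed" point concrete, the reported throughput can be converted into per-token latency and time per reply. This is a minimal sketch, not a benchmark: it assumes the 20.3 tok/s figure is a sustained decode rate and ignores prompt processing, which the post does not break out. The 200-token reply length and the function names are illustrative assumptions, not values from the post.

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Average generation latency per token, in milliseconds."""
    return 1000.0 / tokens_per_sec


def seconds_for_reply(tokens_per_sec: float, reply_tokens: int) -> float:
    """Time to generate a reply, ignoring prompt processing (prefill)."""
    return reply_tokens / tokens_per_sec


REPORTED_TOK_S = 20.3  # throughput reported in the Reddit post
REPLY_TOKENS = 200     # hypothetical reply length, chosen for illustration

print(f"{ms_per_token(REPORTED_TOK_S):.1f} ms/token")  # 49.3 ms/token
print(f"{seconds_for_reply(REPORTED_TOK_S, REPLY_TOKENS):.1f} s "
      f"for {REPLY_TOKENS} tokens")                    # 9.9 s for 200 tokens
```

Under these assumptions a short answer streams in roughly ten seconds. Because device, quantization, and runtime are unreported, the arithmetic shows feasibility only and does not predict performance on other phones.
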
// TAGS
gemma, gemma-4, e2b, on-device-ai, mobile-ai, offline-inference, benchmark, google-deepmind

DISCOVERED: 2026-04-03 (54d ago)
PUBLISHED: 2026-04-03 (54d ago)
RELEVANCE: 8/10
AUTHOR: EthanJohnson01