iOS App Runs Gemma 4 Fully On-Device
OPEN_SOURCE · REDDIT · 2h ago · PRODUCT LAUNCH


A developer shipped an iPhone app that rewrites transcripts of spoken audio into polished paragraphs using Gemma 4 E2B, running entirely on-device. The post doubles as a production report on MLX Swift and MLXLLM, covering model selection, custom architecture wiring, and iOS lifecycle pitfalls.

// ANALYSIS

This is the kind of local-AI launch that matters: the real work is not “can the model run,” but “can it survive iOS constraints, memory ceilings, and backgrounding in production.”

  • E2B looks like the practical sweet spot here: E4B exceeded memory limits, while Qwen3.5-4B brought unwanted thinking-token behavior for pure generation
  • The custom Gemma 4 registration and prompt formatting required in MLXLLM suggest ecosystem support is still immature for newer architectures
  • The 128K context window matters less than it sounds for this constrained use case; short, bounded rewrite jobs are a better fit than trying to stuff an entire transcript into context
  • The `.scenePhase` gate is the most production-real detail in the post: mobile inference success depends on app lifecycle discipline as much as model quality
  • Offline transcript rewriting is a strong on-device use case because privacy, latency, and cost all align with the product value proposition
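The `.scenePhase` gate called out above can be sketched in SwiftUI. This is a minimal illustration, not the app's actual code: `InferenceEngine`, `cancelGeneration()`, `unloadModel()`, `loadModelIfNeeded()`, and `TranscriptEditor` are hypothetical names standing in for whatever the developer's MLX-backed engine exposes; only the `scenePhase` / `onChange` plumbing is real SwiftUI API.

```swift
import SwiftUI

// Sketch of lifecycle-gated inference. `InferenceEngine` and its methods
// are illustrative placeholders, not MLXLLM API.
struct RewriteView: View {
    @Environment(\.scenePhase) private var scenePhase
    @StateObject private var engine = InferenceEngine()

    var body: some View {
        TranscriptEditor(engine: engine)
            .onChange(of: scenePhase) { _, newPhase in
                switch newPhase {
                case .background:
                    // iOS aggressively reclaims memory from backgrounded
                    // apps; an in-flight multi-GB inference is a prime
                    // jetsam target. Cancel and drop the weights.
                    engine.cancelGeneration()
                    engine.unloadModel()
                case .active:
                    // Reload lazily when the user returns.
                    engine.loadModelIfNeeded()
                default:
                    break
                }
            }
    }
}
```

The design point is the one the post makes: on mobile, the model load/unload cycle has to follow the app lifecycle, not the other way around.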
// TAGS
gemma-4 · llm · edge-ai · inference · sdk

DISCOVERED
2h ago · 2026-04-16

PUBLISHED
3h ago · 2026-04-16

RELEVANCE
8 / 10

AUTHOR
Ok-Taste3787