OPEN_SOURCE
REDDIT // 2h ago · PRODUCT LAUNCH
iOS App Runs Gemma 4 Fully On-Device
A developer shipped an iPhone app that rewrites spoken transcripts into polished paragraphs using Gemma 4 E2B entirely on-device. The post doubles as a production report on MLX Swift and MLXLLM, covering model selection, custom architecture wiring, and iOS lifecycle pitfalls.
// ANALYSIS
This is the kind of local-AI launch that matters: the real work is not “can the model run,” but “can it survive iOS constraints, memory ceilings, and backgrounding in production.”
- E2B looks like the practical sweet spot here: E4B exceeded memory limits, while Qwen3.5-4B brought unwanted thinking-token behavior for pure generation
- The custom Gemma 4 registration and prompt formatting in MLXLLM suggest ecosystem support is still immature for newer architectures
- The 128K context window matters less than the app's constrained use case; short, bounded rewrite jobs are a better fit than trying to stuff the whole app into context
- The `.scenePhase` gate is the most production-real detail in the post: mobile inference success depends on app lifecycle discipline as much as model quality
- Offline transcript rewriting is a strong on-device use case because privacy, latency, and cost all align with the product value proposition
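The post does not include the author's code, but a minimal sketch of what a `.scenePhase` gate might look like in SwiftUI is below. `InferenceEngine` and its `pause()`/`resume()` methods are hypothetical stand-ins for whatever wraps the MLX generation task; the key idea is cancelling heavy Metal/GPU work before iOS suspends the app.

```swift
import SwiftUI

// Hypothetical wrapper around the on-device generation task.
// In a real app this would own the MLXLLM session and a running Task.
final class InferenceEngine: ObservableObject {
    @Published private(set) var isRunning = false
    func pause()  { isRunning = false /* cancel the generation Task here */ }
    func resume() { /* safe point to restart or continue generation */ }
}

struct RootView: View {
    @Environment(\.scenePhase) private var scenePhase
    @StateObject private var engine = InferenceEngine()

    var body: some View {
        ContentView()
            .environmentObject(engine)
            .onChange(of: scenePhase) { _, newPhase in
                switch newPhase {
                case .background, .inactive:
                    // iOS can terminate apps doing sustained GPU work
                    // off-screen; stop inference before that happens.
                    engine.pause()
                case .active:
                    engine.resume()
                @unknown default:
                    break
                }
            }
    }
}
```

The two-parameter `.onChange(of:)` form shown here requires iOS 17+; on earlier targets the single-parameter closure works the same way.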
// TAGS
gemma-4 · llm · edge-ai · inference · sdk
DISCOVERED
2h ago
2026-04-16
PUBLISHED
3h ago
2026-04-16
RELEVANCE
8/10
AUTHOR
Ok-Taste3787