Swift engine runs Gemma 4 on iPhone
Swift-gemma4-core is a newly open-sourced Swift inference engine for running Gemma 4 natively on Apple Silicon and iOS, built after the author hit compatibility issues with existing MLX-based libraries. The project focuses on an offline, on-device experience and claims support for Gemma 4’s newer quirks, including partial rotary embeddings, cross-layer KV cache behavior, and prompt/template handling that previously broke decoding. The author says it already runs on a real iPhone with a relatively small memory footprint, but prefill latency is still high and they are asking the community to help optimize the bridge, tensor mapping, and allocations.
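Of the quirks listed, partial rotary embeddings are the easiest to pin down concretely: instead of rotating the full head dimension, RoPE is applied only to a leading slice of each query/key vector, and the remaining dims pass through untouched. The sketch below is an illustrative Python version under that assumption (the function name, vector layout, and `base` default are hypothetical, not taken from Swift-gemma4-core):

```python
import math

def apply_partial_rope(x, pos, rotary_dim, base=10000.0):
    """Rotate only the first `rotary_dim` components of one attention
    head's query/key vector at sequence position `pos`; the remaining
    dims pass through unchanged (the "partial" in partial RoPE)."""
    out = list(x)
    for i in range(rotary_dim // 2):
        # Per-pair rotation angle, decreasing with pair index.
        theta = pos * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

# Position 0 is the identity rotation; the non-rotary tail never changes.
vec = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert apply_partial_rope(vec, pos=0, rotary_dim=4) == vec
assert apply_partial_rope(vec, pos=3, rotary_dim=4)[4:] == [5.0, 6.0]
```

A runtime that assumes full-dimension RoPE will silently rotate those tail dims too, which is exactly the kind of mismatch that breaks decoding without crashing.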
The real story here is not “Gemma 4 on iPhone” so much as “someone filled a missing runtime gap for a model that existing Swift/MLX paths couldn’t handle cleanly.”
- Strong open-source signal: this is a practical infra contribution, not a demo wrapper.
- The technical pain points are plausible for a newer model family, but the post reads like an engineering progress update more than a polished launch.
- The biggest credibility hook is the claimed real-device runtime and low RAM usage; the biggest risk is the still-slow prefill path.
- Best-fit audience: Swift/Metal/MLX people, offline AI app builders, and anyone trying to ship local-first Gemma on iOS.
DISCOVERED: 2026-04-09 (3d ago)
PUBLISHED: 2026-04-09 (3d ago)
RELEVANCE:
AUTHOR: AgreeableNewspaper29