Swift engine runs Gemma 4 on iPhone
Swift-gemma4-core is a newly open-sourced Swift inference engine for running Gemma 4 natively on Apple Silicon and iOS, built after the author hit compatibility issues with existing MLX-based libraries. The project focuses on an offline, on-device experience and claims support for Gemma 4’s newer quirks, including partial rotary embeddings, cross-layer KV cache behavior, and prompt/template handling that previously broke decoding. The author says it already runs on a real iPhone with a relatively small memory footprint, but prefill latency is still high and they are asking the community to help optimize the bridge, tensor mapping, and allocations.
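Of the quirks listed, partial rotary embeddings are the easiest to pin down concretely: instead of rotating the full head dimension, RoPE is applied only to a leading slice of each query/key vector, and the remaining dims pass through untouched. The sketch below is an illustrative Python version under that assumption (the function name, vector layout, and `base` default are hypothetical, not taken from Swift-gemma4-core):

```python
import math

def apply_partial_rope(x, pos, rotary_dim, base=10000.0):
    """Rotate only the first `rotary_dim` components of one attention
    head's query/key vector at sequence position `pos`; the remaining
    dims pass through unchanged (the "partial" in partial RoPE)."""
    out = list(x)
    for i in range(rotary_dim // 2):
        # Per-pair rotation angle, decreasing with pair index.
        theta = pos * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

# Position 0 is the identity rotation; the non-rotary tail never changes.
vec = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert apply_partial_rope(vec, pos=0, rotary_dim=4) == vec
assert apply_partial_rope(vec, pos=3, rotary_dim=4)[4:] == [5.0, 6.0]
```

A runtime that assumes full-dimension RoPE will silently rotate those tail dims too, which is exactly the kind of mismatch that breaks decoding without crashing.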
The real story here is not “Gemma 4 on iPhone” so much as “someone filled a missing runtime gap for a model that existing Swift/MLX paths couldn’t handle cleanly.”
- Strong open-source signal: this is a practical infra contribution, not a demo wrapper.
- The technical pain points are plausible for a newer model family, but the post reads like an engineering progress update more than a polished launch.
- The biggest credibility hook is the claimed real-device runtime and low RAM usage; the biggest risk is the still-slow prefill path.
- Best-fit audience: Swift/Metal/MLX people, offline AI app builders, and anyone trying to ship local-first Gemma on iOS.
DISCOVERED: 2026-04-09 (3d ago)
PUBLISHED: 2026-04-09 (3d ago)
RELEVANCE:
AUTHOR: AgreeableNewspaper29