OPEN_SOURCE ↗
REDDIT // 3d ago · TUTORIAL
Gemma 4 Integration Exposes Flutter Limits
A Memex developer tried to add Gemma 4 E2B/E4B on-device inference to a Flutter local-first PKM app and hit stability issues with third-party wrappers, multimodal edge cases, and unreliable structured output. The workaround was to call LiteRT-LM directly from Kotlin, serialize all requests, and add fallbacks for malformed JSON and hallucinated IDs.
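The serialization part of the workaround can be sketched with plain JDK concurrency. This is a minimal illustration, not the post's actual code: `nativeInference` is a hypothetical stand-in for the real LiteRT-LM call (the post does not show that API), and a single-threaded executor plays the role of the global lock, guaranteeing one in-flight request at a time.

```kotlin
import java.util.concurrent.Executors

// Hypothetical stand-in for the native LiteRT-LM inference call;
// the real API is not shown in the post and will differ.
fun nativeInference(prompt: String): String = "echo: $prompt"

// All requests funnel through one thread, matching the engine's
// one-Conversation-at-a-time constraint: callers can be concurrent,
// but inference itself is strictly serialized.
object InferenceGateway {
    private val executor = Executors.newSingleThreadExecutor()

    fun infer(prompt: String): String =
        executor.submit<String> { nativeInference(prompt) }.get()

    fun shutdown() = executor.shutdown()
}

fun main() {
    println(InferenceGateway.infer("hello")) // prints "echo: hello"
    InferenceGateway.shutdown()
}
```

A `Mutex` from kotlinx.coroutines would serve the same purpose in a coroutine-based app; the executor version is shown only because it needs no dependencies.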
// ANALYSIS
This reads less like a model demo and more like a field report on what it actually takes to ship on-device AI: Gemma 4 is useful, but the integration surface is still brittle enough that architecture matters as much as model quality.
- Direct Kotlin access beat the Flutter wrapper once crashes and even device reboots showed up; abstraction layers are a liability when native inference gets unstable.
- The single-Engine, one-Conversation-at-a-time constraint is the real concurrency bottleneck, so a global lock is not optional if multiple agents share the model.
- Multimodal support is real but narrow in practice: JPEG/PNG only, WAV/PCM only, image-size limits, and thinking-mode conflicts all need guardrails.
- Tool calling and structured output are still soft spots, so production systems need validation, retries, and ground-truth fallbacks for IDs and paths.
- –Thermal throttling turns sustained on-device inference into a systems problem, not just an ML one, which is the main reason this still feels early.
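The ground-truth-fallback point above can be sketched as a small guardrail. Everything here is illustrative: `knownIds`, the `note-\d+` ID shape, and the loose regex extraction are assumptions, not the post's implementation. The idea is that even when the model returns malformed JSON, candidate IDs can still be salvaged and then filtered against what actually exists, so hallucinated IDs never reach the app.

```kotlin
// Hypothetical ground truth: the set of note IDs that actually exist.
val knownIds = setOf("note-1", "note-2", "note-3")

// Loose extraction: pull anything ID-shaped out of the raw output, so a
// truncated or malformed JSON payload still yields usable candidates.
fun extractIds(raw: String): List<String> =
    Regex("note-\\d+").findAll(raw).map { it.value }.toList()

// Validation: drop IDs the model invented. If nothing survives, the caller
// should retry or degrade gracefully rather than trust the model output.
fun validateIds(raw: String): List<String> =
    extractIds(raw).filter { it in knownIds }

fun main() {
    // Malformed JSON containing one real ID and one hallucinated ID.
    val raw = """{"ids": ["note-2", "note-99"""
    println(validateIds(raw)) // prints "[note-2]"
}
```

A real pipeline would try strict JSON parsing first and fall back to this kind of extraction only on failure, with a bounded retry before giving up.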
// TAGS
gemma-4 · llm · multimodal · inference · agent · sdk
DISCOVERED
2026-04-08
PUBLISHED
2026-04-08
RELEVANCE
8 / 10
AUTHOR
SparkleMing