Gemma 4 Integration Exposes Flutter Limits
OPEN_SOURCE ↗
REDDIT · 3d ago · TUTORIAL

A Memex developer tried to add Gemma 4 E2B/E4B on-device inference to a Flutter local-first PKM app and hit stability issues with third-party wrappers, multimodal edge cases, and unreliable structured output. The workaround was to call LiteRT-LM directly from Kotlin, serialize all requests, and add fallbacks for malformed JSON and hallucinated IDs.
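The malformed-JSON fallback can be sketched without any JSON library: accept the model's output only if it looks structurally sound, otherwise return a caller-supplied default. The helper name `jsonOrFallback` and the brace-balance heuristic are assumptions for illustration, not the post author's actual code.

```kotlin
// Minimal guard against malformed structured output (a sketch, assuming no
// JSON parser is on hand). It only checks that the string starts with '{'
// and that braces balance -- braces inside string values would fool it, so
// a real implementation should use a proper parser with a try/catch.
fun jsonOrFallback(raw: String, fallback: String): String {
    val trimmed = raw.trim()
    if (!trimmed.startsWith("{")) return fallback
    var depth = 0
    for (c in trimmed) {
        if (c == '{') depth++
        if (c == '}') depth--
        if (depth < 0) return fallback  // closing brace before any opener
    }
    return if (depth == 0) trimmed else fallback
}

fun main() {
    println(jsonOrFallback("{\"title\": \"ok\"}", "{}"))   // well-formed: passed through
    println(jsonOrFallback("{\"title\": \"trunc", "{}"))   // truncated: fallback "{}"
}
```

A production version would also bound retries: re-prompt the model once or twice on failure, then fall back rather than loop.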

// ANALYSIS

This reads less like a model demo and more like a field report on what it actually takes to ship on-device AI: Gemma 4 is useful, but the integration surface is still brittle enough that architecture matters as much as model quality.

  • Direct Kotlin access beat the Flutter wrapper once crashes and even device reboots showed up; abstraction layers are a liability when native inference gets unstable.
  • The single-Engine, one-Conversation-at-a-time constraint is the real concurrency bottleneck, so a global lock is not optional if multiple agents share the model.
  • Multimodal support is real but narrow in practice: JPEG/PNG only, WAV/PCM only, image-size limits, and thinking-mode conflicts all need guardrails.
  • Tool calling and structured output are still soft spots, so production systems need validation, retries, and ground-truth fallbacks for IDs and paths.
  • Thermal throttling turns sustained on-device inference into a systems problem, not just an ML one, which is the main reason this still feels early.
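The two defensive patterns above — a global lock around the single Engine and ground-truth validation of model-emitted IDs — can be sketched as follows. The names `InferenceGate` and `resolveNoteId` are hypothetical; the actual LiteRT-LM API is not reproduced here.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// One Engine, one Conversation at a time: every caller funnels through a
// single process-wide lock so agents cannot issue concurrent requests.
object InferenceGate {
    private val lock = ReentrantLock()
    fun <T> run(block: () -> T): T = lock.withLock { block() }
}

// Ground-truth fallback for hallucinated IDs: accept an ID the model emits
// only if it exists in the app's own database (an in-memory set here).
fun resolveNoteId(candidate: String, knownIds: Set<String>): String? =
    if (candidate in knownIds) candidate else null

fun main() {
    val known = setOf("note-001", "note-002")
    // Simulated model output: one valid ID, one hallucinated one.
    val modelIds = listOf("note-001", "note-999")
    val resolved = InferenceGate.run {
        modelIds.mapNotNull { id -> resolveNoteId(id, known) }
    }
    println(resolved)  // [note-001]
}
```

Dropping unknown IDs silently is one policy; another is to surface them to the agent as a correction prompt so it can retry with valid references.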

// TAGS

gemma-4 · llm · multimodal · inference · agent · sdk

DISCOVERED
3d ago · 2026-04-08

PUBLISHED
3d ago · 2026-04-08

RELEVANCE
8 / 10

AUTHOR
SparkleMing