OPEN_SOURCE
REDDIT · 19d ago · INFRASTRUCTURE
Ministral 3 3B seeks easier Android path
A Computer Engineering student wants to turn document or receipt photos into structured JSON using an on-device multimodal model in a native Android/Kotlin app. The hard part is finding a Kotlin-friendly path that avoids custom JNI/C++ glue while keeping memory use low enough for a standard phone.
// ANALYSIS
This is doable, but the model choice is not the main risk; integration complexity is. If the goal is a reliable final project within 300 hours, the safest path is an OCR-first pipeline, reaching for a VLM only if the SDK already handles image ingestion and packaging cleanly.
- Kotlin-first Android options are getting better: modern wrappers now advertise on-device VLM support, image-file inputs, and model registration without hand-rolled JNI.
- Context shrinkage helps KV-cache RAM, but it does not erase the cost of image encoding, projector layers, or model packaging on mobile.
- MLC-style runtimes do expose context-window and prefill limits, which is good for memory control, but the compile/package workflow is still a real tax.
- For ticket and document extraction, OCR plus a compact local LLM is usually the most dependable demo path and easier to benchmark in a report.
- If you want the VLM route, start with a tiny multimodal model and treat the app as a pipeline prototype, not a general-purpose assistant.
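The KV-cache point above is easy to make concrete with back-of-envelope arithmetic. The sketch below is plain Kotlin; the architecture numbers in the comment (layers, KV heads, head dim) are illustrative guesses for a 3B-class model, not the published Ministral 3 3B configuration.

```kotlin
// Back-of-envelope KV-cache footprint: K and V tensors for every layer,
// each [kvHeads x headDim] per token, at the given element width.
fun kvCacheBytes(
    layers: Int,
    kvHeads: Int,
    headDim: Int,
    contextLen: Int,
    bytesPerElement: Int = 2, // fp16/bf16
): Long = 2L * layers * kvHeads * headDim * contextLen * bytesPerElement

// Hypothetical 3B-class shape: 28 layers, 8 KV heads (GQA), head dim 128.
// An 8192-token context costs ~896 MiB of cache; shrinking to 2048 tokens
// drops that to ~224 MiB -- real savings, but note this says nothing about
// the image encoder, projector weights, or the model weights themselves.
```

Shrinking the context window scales this term linearly, which is why runtimes that expose a context-window knob matter on phones.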
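The OCR-first path from the last two bullets can be sketched without committing to any inference SDK: OCR text goes into a structured-output prompt, and the model's reply is scanned for the first balanced JSON object rather than trusted raw. The names `buildReceiptPrompt` and `extractFirstJson` are illustrative, and the OCR and LLM calls themselves (e.g. ML Kit text recognition, a local runtime's chat API) are assumed to sit on either side of this code.

```kotlin
// Prompt the local LLM with OCR output and ask for JSON only.
fun buildReceiptPrompt(ocrText: String): String = """
    Extract merchant, date, and total from this receipt text.
    Reply with a single JSON object and nothing else.

    Receipt text:
    $ocrText
""".trimIndent()

// Small models often wrap JSON in prose or code fences; pull out the first
// balanced {...} span by brace counting. (Ignores braces inside JSON string
// literals -- acceptable for a prototype, not for production parsing.)
fun extractFirstJson(reply: String): String? {
    val start = reply.indexOf('{')
    if (start == -1) return null
    var depth = 0
    for (i in start until reply.length) {
        when (reply[i]) {
            '{' -> depth++
            '}' -> {
                depth--
                if (depth == 0) return reply.substring(start, i + 1)
            }
        }
    }
    return null // unbalanced braces: model output was likely truncated
}
```

Keeping this glue logic SDK-agnostic also makes it easy to benchmark OCR-plus-LLM against a VLM in the project report: only the middle call changes.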
// TAGS
ministral-3-3b · multimodal · llm · inference · sdk · edge-ai · open-source
DISCOVERED
2026-03-23 (19d ago)
PUBLISHED
2026-03-23 (19d ago)
RELEVANCE
7/10
AUTHOR
Due-Savings-670