Ministral 3 3B seeks easier Android path
A Computer Engineering student wants to turn document or receipt photos into structured JSON using an on-device multimodal model in a native Android/Kotlin app. The hard part is finding a Kotlin-friendly path that avoids custom JNI/C++ glue while keeping memory use low enough for a standard phone.
This is doable, and the model choice is not the main risk: integration complexity is. If the goal is a reliable final project within 300 hours, the safest path is probably an OCR-first pipeline, adding a VLM only if the SDK already handles image ingestion and model packaging cleanly.
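As a rough illustration of the OCR-first shape, here is a minimal Kotlin sketch. The `TextRecognizer` and `TextModel` interfaces, the `ReceiptExtractor` class, the prompt wording, and the field names are all hypothetical placeholders invented for this example; they are not types from any real OCR or LLM SDK. A real app would back them with whatever on-device OCR and local model runtime it ends up using.

```kotlin
// Hypothetical seams: neither interface belongs to a real SDK.
// They exist so the OCR engine and the local LLM can be swapped or faked.
fun interface TextRecognizer { fun recognize(imagePath: String): String }
fun interface TextModel { fun complete(prompt: String): String }

class ReceiptExtractor(
    private val ocr: TextRecognizer,
    private val llm: TextModel,
    private val maxOcrChars: Int = 2000 // assumed cap to keep the prompt short
) {
    // Builds the instruction prompt around the recognized text.
    fun buildPrompt(ocrText: String): String =
        "Extract merchant, date and total from this receipt text. " +
            "Reply with one JSON object only.\n---\n" + ocrText

    // Runs OCR, asks the model for JSON, and applies a cheap shape check.
    fun extract(imagePath: String): String {
        val text = ocr.recognize(imagePath).trim().take(maxOcrChars)
        require(text.isNotEmpty()) { "OCR returned no text for $imagePath" }
        val reply = llm.complete(buildPrompt(text)).trim()
        // Not a full JSON parse: only rejects replies that are obviously not an object.
        check(reply.startsWith("{") && reply.endsWith("}")) { "Model reply is not a JSON object" }
        return reply
    }
}

fun main() {
    // Fakes stand in for the real engines so the pipeline runs on a desktop JVM.
    val fakeOcr = TextRecognizer { "CAFE LUNA\n2026-03-01\nTOTAL 12.50" }
    val fakeLlm = TextModel { """{"merchant":"CAFE LUNA","date":"2026-03-01","total":12.50}""" }
    println(ReceiptExtractor(fakeOcr, fakeLlm).extract("receipt.jpg"))
}
```

Keeping both stages behind small interfaces means the OCR step and the model step can be benchmarked separately, and a VLM-backed implementation could later replace both without touching the calling code.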
- Kotlin-first Android options are getting better: modern wrappers now advertise on-device VLM support, image-file inputs, and model registration without hand-rolled JNI.
- Shrinking the context window reduces KV-cache RAM, but it does not remove the cost of image encoding, projector layers, or model packaging on mobile.
- MLC-style runtimes do expose context-window and prefill limits, which helps with memory control, but the compile/package workflow is still a real tax.
- For ticket and document extraction, OCR plus a compact local LLM is usually the most dependable demo path and is easier to benchmark in a report.
- If you want the VLM route, start with a tiny multimodal model and treat the app as a pipeline prototype, not a general-purpose assistant.
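To make the KV-cache point concrete, the sketch below applies the standard estimate for a decoder-only transformer: keys plus values, per layer, per KV head, per head dimension, per token. The `AttentionShape` numbers (26 layers, 8 KV heads, head dimension 128, 2 bytes per value) are illustrative placeholders, not the published configuration of Ministral 3 3B or any other model, and the estimate covers the cache only, not weights, the image encoder, or the projector.

```kotlin
// Hypothetical attention shape used only to illustrate the arithmetic.
data class AttentionShape(
    val layers: Int,
    val kvHeads: Int,
    val headDim: Int,
    val bytesPerValue: Int // 2 for fp16 cache entries
)

// KV cache size = 2 (keys and values) * layers * kvHeads * headDim * bytes * tokens.
fun kvCacheBytes(shape: AttentionShape, contextTokens: Int): Long =
    2L * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerValue * contextTokens

fun main() {
    val shape = AttentionShape(layers = 26, kvHeads = 8, headDim = 128, bytesPerValue = 2)
    for (ctx in listOf(1024, 2048, 4096)) {
        val mib = kvCacheBytes(shape, ctx) / (1024.0 * 1024.0)
        println("context=$ctx tokens -> KV cache of about $mib MiB")
    }
}
```

With these placeholder numbers the cache costs 104 KiB per token, so cutting the context from 4096 to 1024 tokens drops it from 416 MiB to 104 MiB. The saving scales linearly with context length, while the fixed costs named in the list above stay the same.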
DISCOVERED
2026-03-23
PUBLISHED
2026-03-23
AUTHOR
Due-Savings-670