Gemini Robotics data recipe stays opaque
Google’s model card says Gemini Robotics-ER 1.6 is trained on Gemini 3.0 datasets plus additional embodied-reasoning datasets, but it does not disclose whether that extra data consists of real robot trajectories, simulation runs, human demonstrations, or labeled perception corpora. The clearest public clue is that the model is a VLM for high-level robot reasoning, not a low-level VLA controller: it takes text, images, audio, and video as input and emits text for planning, pointing, success detection, and instrument reading.
The honest answer is “not enough is public,” and that opacity matters because robotics progress is increasingly bottlenecked by data provenance, embodiment coverage, and evaluation setup rather than by model branding alone.
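To make that text-in, text-out interface concrete, here is a minimal pointing-query sketch using the google-genai Python SDK. The model ID string, prompt wording, and JSON point schema are illustrative assumptions, not details confirmed by the model card.

```python
# Hedged sketch: asking an ER-style Gemini model to point at an object.
# The model ID and output schema below are assumptions for illustration.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

with open("workbench.png", "rb") as f:
    frame = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # assumed ID; check the official docs
    contents=[
        types.Part.from_bytes(data=frame, mime_type="image/png"),
        "Point to the red mug. Answer as JSON: "
        '[{"point": [y, x], "label": "<name>"}] '
        "with coordinates normalized to 0-1000.",
    ],
)
print(response.text)  # the model answers in text, here JSON-formatted points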
- Google confirms extra embodied-reasoning data, but not the exact schema, collection pipeline, or mixture weights.
- ER 1.6 likely emphasizes annotated visual/spatial tasks, multi-view perception, safety scenarios, success detection, and instrument-reading examples over raw motor-command trajectories.
- For action execution, Google’s own architecture separates ER reasoning from VLA/action models, so ER 1.6 should not be treated as a policy trained only on robot interaction logs (see the orchestration sketch after this list).
- The “submit 10-50 labeled images” feedback path hints that targeted labeled failure cases are part of the improvement loop.
- Developers evaluating it should ask for task-specific evals and deployment constraints rather than assume broad real-world robot competence from the training-data label alone (a minimal eval sketch follows below).
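The ER/VLA split in the third bullet amounts to a two-layer loop: the ER model plans and judges in text, and a separate action policy turns each step into motor commands. A minimal sketch, assuming a stubbed policy; every class name and plan string here is hypothetical.

```python
# Hedged sketch of the ER/VLA split: the ER model only plans and checks
# success in text; a separate action policy (stubbed below) executes.
from dataclasses import dataclass


@dataclass
class StepResult:
    success: bool
    notes: str


class VlaPolicy:
    """Stand-in for a low-level VLA controller; not part of ER 1.6."""

    def execute(self, instruction: str) -> StepResult:
        # A real system would stream camera frames and emit joint commands.
        return StepResult(success=True, notes=f"executed: {instruction}")


def run_task(er_plan_steps: list[str], policy: VlaPolicy) -> bool:
    for step in er_plan_steps:
        result = policy.execute(step)
        if not result.success:
            # The ER model can be re-queried here for replanning or
            # success detection on a fresh image; it never outputs actions.
            return False
    return True


# An ER model would produce a plan like this as plain text:
plan = ["locate the red mug", "grasp the mug", "place it on the tray"]
print(run_task(plan, VlaPolicy()))
```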
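And for the last bullet, a sketch of what a task-specific eval could look like: scoring success detection against a small labeled episode set. The JSONL label format and the caller-supplied `detect_success` function are assumptions, not a published Google artifact.

```python
# Hedged sketch of a task-specific eval: accuracy of predicted vs.
# labeled success over a JSONL file of episodes. Format is assumed.
import json
from typing import Callable


def eval_success_detection(
    labels_path: str,
    detect_success: Callable[[str], bool],
) -> float:
    """Each JSONL line: {"image": "<path>", "success": true/false}."""
    correct = total = 0
    with open(labels_path) as f:
        for line in f:
            episode = json.loads(line)
            correct += detect_success(episode["image"]) == episode["success"]
            total += 1
    return correct / total if total else 0.0
```

Even a harness this small forces the right questions: which embodiments and scenes the labels cover, and whether the eval matches the deployment setup.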