OPEN_SOURCE
YT · YOUTUBE // RESEARCH PAPER
LaST-VLA tops NAVSIM with latent reasoning
LaST-VLA is a new autonomous-driving framework from Xiaomi EV and Tsinghua researchers that replaces explicit textual chain-of-thought with physically grounded latent spatio-temporal reasoning. The paper reports state-of-the-art results on NAVSIM v1 and v2, plus strong gains on SURDS and NuDynamics for spatial and motion reasoning.
// ANALYSIS
This is a strong signal that autonomous driving VLA work is moving past “explain in text, then act” toward latent planning that is closer to the physics of the scene. The interesting part is not just the benchmark bump, but the claim that better grounding cuts both hallucinations and inference overhead.
- LaST-VLA distills geometric priors from VGGT and dynamic foresight from the Cosmos world model into a latent reasoning space instead of forcing the model to verbalize every intermediate step
- The paper claims 91.3 PDMS on NAVSIM v1 and 87.1 EPDMS on NAVSIM v2, beating prior vision-only baselines and suggesting latent supervision is materially helping planning quality
- Gains on SURDS and NuDynamics matter because they point to better 3D spatial understanding and motion-state reasoning, not just benchmark overfitting on a single driving stack
- The training recipe is notable for AI researchers: progressive SFT followed by GRPO reinforcement learning, with explicit safety and rule-compliance objectives baked into the refinement stage
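The GRPO stage mentioned above can be sketched at a high level. This is a hypothetical illustration, not LaST-VLA's actual code: GRPO samples a group of rollouts per prompt, scores each with a reward (which here could fold in safety and rule-compliance terms), and normalizes rewards within the group to obtain advantages, avoiding a learned value critic.

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / group std.

    `rewards` is one group of scalar rewards, one per rollout sampled
    from the same prompt/scene. Rollouts above the group mean receive
    positive advantage; those below receive negative advantage.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts of the same driving scene, scored by an assumed
# reward mixing progress with safety/rule-compliance penalties.
advs = grpo_advantages([0.9, 0.4, 0.7, 0.2])
# The per-token policy loss would then weight log-prob ratios by these
# advantages, typically with a PPO-style clip; that part is omitted here.
```

The appeal of this formulation for a refinement stage is that any rule-based driving score can serve as the reward, since only relative ranking within a group matters.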
// TAGS
last-vla · robotics · multimodal · reasoning · research · safety
DISCOVERED
2026-03-05
PUBLISHED
2026-03-05
RELEVANCE
8/10
AUTHOR
Discover AI