Physical Intelligence adds dual-memory stack to VLAs
Physical Intelligence published its MEM architecture and paper, adding short-term video memory plus long-term language memory to π0.6 vision-language-action models. The update targets long-horizon robotic tasks, claiming memory windows up to 15 minutes while keeping inference latency practical.
This is a meaningful step from flashy robot demos toward systems that can actually track multi-stage work over time.
- MEM splits memory into two channels: dense recent observations and compressed textual task history.
- The approach is designed to reduce token load, which matters for real-time robotic control loops.
- The blog and paper frame gains around long-horizon kitchen-style tasks and in-context adaptation after failed attempts.
- If these results transfer broadly, VLA progress may hinge more on memory design than on scaling raw policy size.
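
To make the two-channel idea concrete, here is a minimal Python sketch of a bounded buffer of recent observations next to an append-only text history. It is a toy illustration, not Physical Intelligence's implementation: the class name `DualMemory`, its methods, and the `max_frames` parameter are all hypothetical, and the frames are plain strings standing in for video observations.

```python
from collections import deque


class DualMemory:
    """Toy two-channel memory (hypothetical, not the MEM code)."""

    def __init__(self, max_frames=4):
        # Short-term channel: only the most recent observations are kept dense.
        self.frames = deque(maxlen=max_frames)
        # Long-term channel: compact text notes about what has happened so far.
        self.notes = []

    def observe(self, frame):
        # Appending past maxlen silently drops the oldest frame.
        self.frames.append(frame)

    def summarize(self, note):
        # Older history survives only as short text, which keeps token load low.
        self.notes.append(note)

    def context(self):
        # What a policy would be conditioned on at each control step.
        return {
            "recent_frames": list(self.frames),
            "task_history": " ".join(self.notes),
        }


mem = DualMemory(max_frames=2)
for frame in ["frame_0", "frame_1", "frame_2"]:
    mem.observe(frame)
mem.summarize("Opened the fridge.")
mem.summarize("First grasp of the milk failed.")
ctx = mem.context()
```

The point of the split is that the dense channel stays a fixed size no matter how long the task runs, while the text channel grows slowly; how MEM actually compresses history into language is described in the paper, not here.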
DISCOVERED: 2026-03-05
PUBLISHED: 2026-03-04
AUTHOR: Worldly_Evidence9113