PhysisForcing injects physics into video simulation
Generative video models such as diffusion transformers (DiTs) are increasingly used as embodied world simulators, but they often produce physically implausible artifacts such as discontinuous trajectories and object deformation. PhysisForcing addresses this by concentrating supervision on physics-informative regions through trajectory and relational alignment losses, improving both physical consistency and closed-loop robotic planning success.
PhysisForcing tackles a critical bottleneck in deploying visual generative models for robotics: it pushes the models to respect physical constraints rather than merely look plausible.
- Addresses the gap between purely visual video generation and the physical accuracy required for robotic training.
- Leverages a frozen video understanding encoder to extract inter-region relations, improving contact and interaction dynamics.
- Demonstrates tangible downstream benefits, increasing closed-loop success rates from 16% to 24% when used as a world model.
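
To make the idea of a relational alignment loss concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary does not specify the loss, so the cosine-similarity relation matrix, the weighted squared error, and the names `pairwise_cosine`, `relational_alignment_loss`, and `region_weights` are all assumptions chosen for illustration. The sketch compares how regions relate to each other in the generator's features against how they relate in a frozen encoder's features, with more weight on regions assumed to be physics-informative.

```python
import numpy as np


def pairwise_cosine(feats: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of region features.

    feats has shape (R, D): R regions, D feature dimensions.
    Returns an (R, R) relation matrix.
    """
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-8, None)  # avoid division by zero
    return unit @ unit.T


def relational_alignment_loss(
    gen_feats: np.ndarray,
    enc_feats: np.ndarray,
    region_weights: np.ndarray,
) -> float:
    """Weighted mean squared error between two relation matrices.

    gen_feats:      (R, D1) region features from the video generator.
    enc_feats:      (R, D2) region features from a frozen encoder,
                    treated here as a fixed target.
    region_weights: (R,) non-negative weights; larger values mark regions
                    assumed to be physics-informative (hypothetical input).
    """
    rel_gen = pairwise_cosine(gen_feats)
    rel_enc = pairwise_cosine(enc_feats)
    # A pair of regions matters only as much as both of its members do.
    pair_w = np.outer(region_weights, region_weights)
    weighted_sq_err = np.sum(pair_w * (rel_gen - rel_enc) ** 2)
    return float(weighted_sq_err / np.clip(np.sum(pair_w), 1e-8, None))
```

Because the loss compares (R, R) relation matrices rather than raw features, the two feature sets may have different dimensionality, which is one plausible reason to align relations instead of features directly. In an actual training setup this would be written in an autodiff framework so that gradients flow only into the generator.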
Discovered: 2026-06-29
Published: 2026-06-29
Author: _akhaliq