MultiWorld drops multi-agent, multi-view video world model
MultiWorld is a scalable framework for generating coherent video environments with multiple interacting agents and synchronized camera views. It enables precise control and spatial consistency for complex scenarios like multi-player gaming and robotic manipulation.
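MultiWorld addresses the "identity crisis" in multi-agent video generation, where a model cannot reliably tell agents apart or bind each control signal to the right one. The result is a shift from simple scene synthesis to functional, consistent world modeling.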
- Agent Identity Embedding (AIE) uses rotary position embeddings (RoPE) to uniquely identify and control multiple agents simultaneously without ambiguity
- Global State Encoder uses cross-attention to maintain 3D-aware spatial consistency across variable viewpoints
- A 1.5x speedup from parallel view generation makes high-fidelity world modeling more computationally feasible
- Success on high-motion datasets such as It Takes Two sets a new benchmark for generative video coherence
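The summary does not say how AIE applies RoPE. One plausible reading, assumed here rather than taken from MultiWorld, is to treat the agent index as an extra rotary "position" axis next to the frame index, the way multi-axis RoPE handles video tokens. This minimal sketch splits each token's channels in half, rotates the first half by frame index and the second half by agent ID, and checks the property that makes RoPE useful for identity: the attention score between two tokens depends only on their relative frame and agent offsets. The helper names (`rope_rotate`, `agent_rope`, `dot`) are hypothetical.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Standard rotary embedding: rotate each feature pair (x[2i], x[2i+1])
    by the angle position * base**(-2i/d), where d = len(vec)."""
    d = len(vec)
    if d % 2:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend((x * c - y * s, x * s + y * c))
    return out


def agent_rope(token, frame_idx, agent_id):
    """Hypothetical agent-identity embedding: the first half of the channels
    encodes the frame index, the second half encodes the agent index.
    len(token) must be a multiple of 4 so each half holds whole pairs."""
    half = len(token) // 2
    return rope_rotate(token[:half], frame_idx) + rope_rotate(token[half:], agent_id)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 0.8, 0.1, -0.6, 0.9, 0.2]
k = [0.2, -1.0, 0.7, 0.4, -0.5, 0.3, 0.0, 1.1]

# Same relative offsets (0 frames apart, agent IDs 1 apart) give the same score,
# even though the absolute frame and agent indices differ.
s1 = dot(agent_rope(q, 2, 0), agent_rope(k, 2, 1))
s2 = dot(agent_rope(q, 7, 2), agent_rope(k, 7, 3))
print(abs(s1 - s2) < 1e-9)  # True

# The same content tagged with a different agent ID gets a different embedding.
print(agent_rope(k, 2, 0) != agent_rope(k, 2, 1))  # True
```

Because rotations preserve vector norms, this kind of tagging changes which tokens attend strongly to each other without rescaling their content. Whether MultiWorld combines agent IDs with frame or spatial axes in this way is not stated in the summary.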
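The summary only says the Global State Encoder keeps views consistent "via cross-attention". A minimal sketch of that general pattern, assuming the encoder has already produced a set of shared global-state tokens: each camera view's tokens act as queries and read from the same keys and values through scaled dot-product attention, so identical queries from different views retrieve identical state. It is single-head and pure Python, and the names `cross_attention` and `global_state` are illustrative, not MultiWorld's API.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: per-view tokens, each a list of d floats
    keys:    global-state tokens, each a list of d floats
    values:  global-state tokens, one per key
    Returns one output vector (a weighted mix of values) per query.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs


# Hypothetical shared state produced by a global state encoder (two tokens).
global_state = [[1.0, 0.0], [0.0, 1.0]]

# Two camera views with different token sets; their first tokens are identical.
view_a = [[2.0, 0.0]]
view_b = [[2.0, 0.0], [0.0, 2.0]]

out_a = cross_attention(view_a, global_state, global_state)
out_b = cross_attention(view_b, global_state, global_state)
print(out_a[0] == out_b[0])  # True: same query, same shared state, same result
```

Because every view queries the same state tokens, views cannot disagree about what they retrieve for the same query. In a real model the queries, keys, and values would first pass through learned projections, which this sketch omits.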
DISCOVERED: 2026-04-26
PUBLISHED: 2026-04-26
AUTHOR: AI Search