OPEN_SOURCE
REDDIT · 8d ago · RESEARCH PAPER
VOID removes objects, fixes scene interactions
VOID is a Netflix-backed video object removal model that tries to erase not just an object’s pixels, but the physical effects it causes in the scene. It uses counterfactual training data, VLM-guided affected-region masks, and a two-pass refinement pipeline to keep motion temporally and physically plausible.
// ANALYSIS
This is a meaningful step beyond standard video inpainting: the hard problem is causality, and VOID is explicitly modeling that instead of hoping a fill-in model will guess right.
- Counterfactual paired data from Kubric and HUMOTO gives the model supervision for downstream effects, which is the right signal for collisions, falls, and chain reactions.
- VLM-guided masks are a practical way to expand the edit beyond the clicked object to everything the object influences.
- The second-pass flow-warped refinement reads like a targeted fix for the most obvious failure mode in video diffusion: shape drift and morphing over time.
- A 64.8% human-preference win over Runway, ProPainter, and Gen-Omnimatte suggests this is not just a clever paper idea; it is already competitive in messy real-world edits.
- The strongest near-term use cases are likely robotics simulation, ADAS scenario editing, and dataset generation, where preserving physics matters more than perfect texture fill.
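The flow-warped refinement idea can be illustrated with a minimal sketch. This is not VOID's actual pipeline; the function names (`warp_with_flow`, `refine_frame`) and the blend formulation are illustrative assumptions showing the general technique: warp the previously refined frame along dense optical flow and blend it with the fresh per-frame inpainting inside the edited region, so shapes cannot drift freely from frame to frame.

```python
import numpy as np

def warp_with_flow(prev_frame, flow):
    """Warp the previous frame toward the current one using a dense
    optical-flow field (flow[y, x] = displacement into prev_frame).
    Nearest-neighbour sampling keeps the sketch dependency-free."""
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return prev_frame[src_y, src_x]

def refine_frame(inpainted, prev_refined, flow, mask, alpha=0.5):
    """Second-pass refinement (illustrative): inside the edited region
    (mask == 1), blend the fresh inpainting with a flow-warped copy of
    the previous refined frame to damp frame-to-frame shape drift.
    Outside the mask the frame passes through untouched."""
    warped = warp_with_flow(prev_refined, flow)
    blended = alpha * inpainted + (1.0 - alpha) * warped
    m = mask[..., None].astype(float)
    return m * blended + (1.0 - m) * inpainted
```

In a real system the flow would come from an estimator such as RAFT and the blend would typically feed a learned refinement network rather than a fixed average; the temporal anchoring principle is the same.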
// TAGS
void · video-gen · research · multimodal · open-source
DISCOVERED
2026-04-03 (8d ago)
PUBLISHED
2026-04-03 (9d ago)
RELEVANCE
9/10
AUTHOR
Least_Light6037