OPEN_SOURCE
YT · YOUTUBE // 34d ago // RESEARCH PAPER
ZipMap zips 3D reconstruction to linear time
ZipMap is a CVPR 2026 paper from Google DeepMind, Cornell, and MIT that uses test-time training to compress long image sequences into a compact scene state for linear-time 3D reconstruction. The model reconstructs 750 frames in under 10 seconds on one H100 while matching or beating quadratic baselines like VGGT and π³ across several pose, depth, and point-map benchmarks.
// ANALYSIS
ZipMap is interesting because it does more than speed up a benchmark: it reframes large-scene reconstruction as stateful memory compression instead of ever-growing global attention. That makes it one of the clearest signs that test-time training can be useful outside language modeling.
- The core trick is swapping quadratic global attention for local window attention plus large-chunk test-time-training layers, cutting runtime from O(N²) to O(N)
- Unlike many efficient vision papers, it does not simply trade quality for speed; the reported results stay competitive with top quadratic systems on camera pose, depth, and dense geometry
- The compact scene state is queryable in real time, which gives the model a practical path from offline reconstruction to interactive and streaming use cases
- The paper still flags real limits: very long out-of-distribution scenes degrade quality, and queried RGB views remain blurry in high-frequency regions
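The linear-time recipe in the first bullet can be sketched in a few lines. This is a minimal illustrative toy, not ZipMap's actual architecture: all names, dimensions, and the gradient-like state update are assumptions. The point it shows is structural: each frame only attends to a fixed-size local window, and each chunk is compressed into a fixed-size state, so both compute and memory grow linearly with the number of frames instead of quadratically.

```python
import numpy as np

# Hypothetical sketch of chunked, stateful processing (illustrative only).
STATE_DIM = 64   # size of the compact scene state (assumed)
FEAT_DIM = 64    # per-frame feature size (assumed, equal to STATE_DIM here)
CHUNK = 50       # frames folded into the state per test-time-training step (assumed)

def local_window_attention(feats, window=5):
    """Each frame attends only to neighbors within `window`: O(N * window), not O(N^2)."""
    n = len(feats)
    out = np.empty_like(feats)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = feats[lo:hi] @ feats[i]           # similarity to local neighbors
        weights = np.exp(scores - scores.max())    # stable softmax
        weights /= weights.sum()
        out[i] = weights @ feats[lo:hi]            # weighted mix of the window
    return out

def ttt_update(state, chunk_feats, lr=0.1):
    """Stand-in for a test-time-training step: compress a whole chunk into the
    fixed-size state, so memory never grows with sequence length."""
    summary = chunk_feats.mean(axis=0)
    return state + lr * (summary - state)

def reconstruct(frames):
    state = np.zeros(STATE_DIM)
    for start in range(0, len(frames), CHUNK):
        chunk = local_window_attention(frames[start:start + CHUNK])
        state = ttt_update(state, chunk)           # large-chunk compression
    return state  # compact, queryable scene state

frames = np.random.default_rng(0).standard_normal((750, FEAT_DIM))
state = reconstruct(frames)
print(state.shape)  # fixed size regardless of how many frames were seen
```

Because the state has constant size, querying it (for a pose, depth map, or novel view in the real system) costs the same whether 50 or 750 frames were processed, which is what makes the streaming use case in the third bullet plausible.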
// TAGS
zipmap · research · inference · benchmark
DISCOVERED
2026-03-08
PUBLISHED
2026-03-08
RELEVANCE
8/10
AUTHOR
Discover AI