DenseVault dedupes training checkpoints over WebDAV
DenseVault is a zero-dependency Python tool: a single-file, write-once-read-many archive that uses content-defined chunking, delta encoding, and entropy-aware compression to store versioned files efficiently over WebDAV. The author built it for AI training checkpoints and other large binaries, and reports one checkpoint set shrinking from 9.1 GB to 5.1 GB.
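Content-defined chunking is what makes the checkpoint dedup work: chunk boundaries are chosen from the data itself rather than at fixed offsets, so an insertion early in a file shifts byte positions without changing the chunks downstream of the edit. The sketch below is illustrative only (a toy rolling hash, not DenseVault's actual chunker or parameters):

```python
import hashlib
import random

def cdc_chunks(data: bytes, mask: int = 0x1FFF,
               min_size: int = 2048, max_size: int = 65536) -> list[bytes]:
    """Split data at content-defined boundaries using a toy rolling hash.
    The hash effectively 'sees' only the most recent ~32 bytes, so after an
    insertion the same boundaries reappear once the window slides past the
    edit, and unchanged regions yield identical chunks that dedupe by hash."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # bytes older than ~32 shift out
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Two "checkpoint versions": v2 inserts a few bytes near the front of v1,
# shifting every byte offset after the patch.
random.seed(0)
v1 = bytes(random.getrandbits(8) for _ in range(200_000))
v2 = b"PATCH" + v1
keys1 = {hashlib.sha256(c).hexdigest() for c in cdc_chunks(v1)}
keys2 = {hashlib.sha256(c).hexdigest() for c in cdc_chunks(v2)}
shared = len(keys1 & keys2)
print(f"{shared}/{len(keys2)} chunks shared between versions")
```

With fixed-size chunking the 5-byte insertion would invalidate every chunk; here only the chunk containing the edit changes, which is the property that lets near-identical checkpoints share storage.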
This feels like a genuinely useful MLOps storage layer, not just a compression demo: the big win is keeping checkpoint sprawl mounted and reusable instead of turning it into dead cold storage. Its sweet spot is versioned, partially redundant artifacts; once data is already compressed or needs random access, the gains narrow fast.
- The 9.1 GB to 5.1 GB checkpoint result is the right benchmark because it matches the exact workload DenseVault targets.
- WebDAV plus range reads is the killer workflow win: existing tools can mount the vault, and even `llamafile` can stream GGUF models straight from it.
- Entropy-aware compression is a smart guardrail, and the Arch ISO test shows why: already-compressed blobs barely benefit.
- Delta mode fits model checkpoints that are read whole, but it is a bad fit for live inference files because reconstruction gets in the way of range reads.
- The single-file SQLite/WORM design is portable and low-friction, but it will need serious durability and concurrency testing if it grows beyond a thesis project.
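The entropy guardrail mentioned above is cheap to implement: estimate Shannon entropy on a sample of the blob and skip compression when it is already near-random. A minimal sketch, assuming a sampled-entropy threshold (the function names and the 7.5 bits/byte cutoff are illustrative, not DenseVault's actual values):

```python
import math
import os
import zlib
from collections import Counter

def byte_entropy(sample: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8.0)."""
    n = len(sample)
    return -sum((c / n) * math.log2(c / n) for c in Counter(sample).values())

def store(blob: bytes, threshold: float = 7.5) -> tuple[str, bytes]:
    """Compress only when a sampled entropy estimate suggests it will pay off.
    Already-compressed data (e.g. an ISO full of compressed filesystems) sits
    near 8.0 bits/byte; recompressing it burns CPU for ~0% savings."""
    if byte_entropy(blob[:65536]) >= threshold:
        return "raw", blob
    return "zlib", zlib.compress(blob)

print(store(b"layer.weight " * 5000)[0])   # low-entropy, text-like  -> zlib
print(store(os.urandom(200_000))[0])       # near-random, ISO-like   -> raw
```

This is exactly the Arch ISO behavior the review points at: the guardrail's value is not better ratios but refusing to waste work where no ratio is available.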
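The delta-mode tradeoff in the bullets is structural, and a tiny model makes it concrete: a delta-encoded version stores only its differences against a base, so serving a byte range of the new version requires reconstructing it first. The helpers below are hypothetical (a naive equal-length byte diff, nothing like DenseVault's real delta format):

```python
def make_delta(base: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Naive delta for equal-length blobs: (offset, replacement) runs
    covering every region where new differs from base."""
    delta, i, n = [], 0, len(new)
    while i < n:
        if base[i] != new[i]:
            j = i
            while j < n and base[j] != new[j]:
                j += 1
            delta.append((i, new[i:j]))
            i = j
        else:
            i += 1
    return delta

def apply_delta(base: bytes, delta: list[tuple[int, bytes]]) -> bytes:
    buf = bytearray(base)
    for off, patch in delta:
        buf[off:off + len(patch)] = patch
    return bytes(buf)

def read_range(base: bytes, delta, start: int, end: int) -> bytes:
    # The delta alone cannot answer "bytes [start:end)": a range read must
    # first materialize the full version, which is why delta mode clashes
    # with streaming a live inference file over HTTP range requests.
    return apply_delta(base, delta)[start:end]

base = b"A" * 100
new = b"A" * 50 + b"B" * 10 + b"A" * 40   # one small edit mid-file
delta = make_delta(base, new)
print(len(delta), "patch run(s);", "round-trip ok:", apply_delta(base, delta) == new)
```

For checkpoints restored whole, paying reconstruction once is fine; for a GGUF being range-read by `llamafile`, it is on the hot path for every read.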
DISCOVERED: 2026-03-25
PUBLISHED: 2026-03-25
AUTHOR: FiddleSmol