Cartridges and STILL simplify KV-cache benchmarking
A public code release, runnable on a single GPU, reproduces two recent long-context inference methods: Cartridges, which builds corpus-specific compressed KV caches, and STILL, which performs reusable neural KV-cache compaction. The repos emphasize runnable benchmarks, readable implementations, and direct comparisons against full-context inference, context truncation, and Cartridges.
A strong open-source systems contribution: it makes KV-cache compression something you can benchmark on one GPU, and its standardized data layouts, inspectable code, and aligned comparisons make the tradeoffs much easier to study than paper-only summaries do.
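To make the truncation baseline concrete, here is a minimal sketch of what a truncated KV cache might look like, assuming "truncation" means keeping only the most recent token positions in each layer. The class `LayerKV` and the functions `truncate_kv_cache` and `cache_bytes` are hypothetical names for illustration, not the release's API, and the release's actual truncation policy may differ. Per-layer keys and values are plain Python lists so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerKV:
    # Hypothetical container: keys[t] and values[t] are the vectors for token position t.
    keys: List[List[float]]
    values: List[List[float]]


def truncate_kv_cache(cache: List[LayerKV], budget: int) -> List[LayerKV]:
    """Keep only the most recent `budget` token positions in every layer.

    Assumption: truncation drops the oldest tokens; this is an illustrative
    baseline, not necessarily the policy used in the release.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    truncated = []
    for layer in cache:
        start = max(0, len(layer.keys) - budget)  # index of the oldest kept token
        truncated.append(LayerKV(keys=layer.keys[start:], values=layer.values[start:]))
    return truncated


def cache_bytes(cache: List[LayerKV], bytes_per_value: int = 2) -> int:
    # Default of 2 bytes per scalar assumes fp16/bf16 storage.
    total = 0
    for layer in cache:
        for vec in layer.keys + layer.values:
            total += len(vec) * bytes_per_value
    return total


if __name__ == "__main__":
    # Toy cache: 2 layers, 10 tokens, head dimension 4.
    cache = [
        LayerKV(keys=[[float(t)] * 4 for t in range(10)],
                values=[[float(-t)] * 4 for t in range(10)])
        for _ in range(2)
    ]
    small = truncate_kv_cache(cache, budget=3)
    print(cache_bytes(cache), cache_bytes(small))
```

Truncation shrinks memory linearly in the kept budget but discards older context outright, which is the tradeoff that learned approaches such as Cartridges and STILL are compared against.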
Discovered: 2026-04-21
Published: 2026-04-20
Author: shreyansh26