OPEN_SOURCE
REDDIT · 37d ago · RESEARCH PAPER
DWARF claims O(1) KV-cache attention
DWARF is a new attention architecture that replaces most full-history attention lookups with a fixed set of 44 dyadic offsets, keeping the DSQG-layer KV cache roughly constant (about 1.5 GB) even as context length grows. The public GitHub repo ships code, ablations, benchmarks, and a paper arguing that this sparse, physics-derived layout preserves much of the useful long-range signal without paying standard long-context memory costs.
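The core memory argument can be sketched in a few lines. This is a hypothetical reconstruction, not the repo's code: it assumes "dyadic offsets" means power-of-two lookback distances (1, 2, 4, ...) and that the 44-offset figure caps the set, so each query attends to at most 44 cached positions no matter how long the context gets.

```python
# Hedged sketch of a dyadic attention pattern (assumed power-of-two layout;
# the paper's exact 44-offset schedule may differ).

def dyadic_offsets(n_offsets: int = 44) -> list[int]:
    # 44 lookback distances: 1, 2, 4, ..., 2^43 (assumption).
    return [2 ** k for k in range(n_offsets)]

def visible_positions(t: int, offsets: list[int]) -> list[int]:
    # Past positions a query at position t may attend to.
    # The count is bounded by len(offsets), i.e. O(1) in context length,
    # versus t positions for full causal attention.
    return [t - d for d in offsets if t - d >= 0]
```

Because the visible set never exceeds 44 entries per query, the per-layer KV cache for these sparse layers stays fixed while a standard causal layer's cache grows linearly with context.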
// ANALYSIS
This is a genuinely interesting efficiency idea, but it reads more like an ambitious research drop than a settled breakthrough.
- The big claim is memory, not raw benchmark domination: fixed-cache inference matters most when context windows get large enough to make standard KV storage painful.
- DWARF is a hybrid design, not full replacement attention, because it still keeps one standard causal attention layer for global context binding.
- The repo is unusually thorough for an early project, with training scripts, Triton kernels, ablation tables, and Rust-based verification code already published.
- The work is still self-published and pre-peer-review, so the next milestone is independent replication rather than bigger README numbers.
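The hybrid-design point above has a concrete memory consequence: with one standard causal layer retained, total KV cache is constant for the sparse layers plus a linear term for that single global layer. A back-of-the-envelope estimator, with all model dimensions (layer count, hidden size, dtype width) as illustrative assumptions rather than DWARF's actual configuration:

```python
def kv_cache_bytes(context_len: int,
                   n_layers: int = 32,        # assumed layer count
                   n_dyadic_slots: int = 44,  # fixed cache slots per sparse layer
                   d_model: int = 4096,       # assumed hidden size
                   bytes_per_elem: int = 2):  # fp16/bf16
    # K and V each store d_model values per cached position.
    per_pos = 2 * d_model * bytes_per_elem
    # Sparse layers: constant in context_len.
    dyadic = (n_layers - 1) * n_dyadic_slots * per_pos
    # The one retained causal layer: grows linearly with context_len.
    global_layer = context_len * per_pos
    return dyadic + global_layer
```

Under these assumed numbers, growing the context only inflates the single global layer's cache, which is the trade the hybrid design makes for keeping global context binding.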
// TAGS
dwarf · llm · inference · research · open-source · benchmark
DISCOVERED
37d ago
2026-03-06
PUBLISHED
37d ago
2026-03-05
RELEVANCE
8/10
AUTHOR
MariusNocturnum