DWARF claims O(1) KV-cache attention

DWARF is a new attention architecture that replaces most full-history attention lookups with a fixed set of 44 dyadic offsets, keeping the DSQG-layer KV cache roughly constant at about 1.5 GB even as context length grows. The public GitHub repo ships code, ablations, benchmarks, and a paper arguing that this sparse, physics-derived layout preserves much of the useful long-range signal without paying standard long-context memory costs.
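To make the core idea concrete, here is a minimal sketch of attention restricted to dyadic (power-of-two) offsets behind the current position. This is an illustration of the general technique only: the offset construction, naming, and scoring here are assumptions, not DWARF's actual kernels, and the real design reportedly uses 44 offsets plus a residual full-attention layer.

```python
import numpy as np

def dyadic_offsets(n):
    # Offsets 1, 2, 4, ..., 2^(n-1). DWARF reportedly uses 44 offsets in
    # total, likely with a richer layout -- this is a simplification.
    return [2 ** k for k in range(n)]

def dyadic_attention(q, K, V, offsets):
    """Hypothetical sketch: the query at the latest position t attends
    only to keys at positions t - d for each dyadic offset d, so the
    number of attended positions is fixed regardless of context length."""
    t = len(K) - 1                         # current position
    idx = [t - d for d in offsets if t - d >= 0]
    if not idx:
        return V[t]                        # nothing behind us yet
    Ks, Vs = K[idx], V[idx]                # gather the sparse window
    scores = Ks @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())      # stable softmax
    w /= w.sum()
    return w @ Vs
```

Because the gathered index set has constant size, the per-step cost and the working set a cache must serve stay flat as the sequence grows, which is the memory argument the summary describes.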

// ANALYSIS

This is a genuinely interesting efficiency idea, but it reads more like an ambitious research drop than a settled breakthrough.

  • The big claim is memory, not raw benchmark domination: fixed-cache inference matters most when context windows get large enough to make standard KV storage painful.
  • DWARF is a hybrid design, not a full replacement for attention: it still keeps one standard causal attention layer for global context binding.
  • The repo is unusually thorough for an early project, with training scripts, Triton kernels, ablation tables, and Rust-based verification code already published.
  • The work is still self-published and pre-peer-review, so the next milestone is independent replication rather than bigger README numbers.
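The memory point in the first bullet is easy to quantify. The sketch below compares a standard KV cache, which grows linearly with context, against a fixed-offset cache holding a constant 44 positions. The model dimensions are placeholder assumptions for a generic transformer, not DWARF's published configuration.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128,
                   bytes_per=2, cached_positions=None):
    """Rough KV-cache size: keys + values (factor 2), fp16 (2 bytes).
    All dimensions are illustrative assumptions, not DWARF's config."""
    positions = seq_len if cached_positions is None else cached_positions
    return 2 * n_layers * n_heads * head_dim * positions * bytes_per

# Standard attention: cache grows linearly with context length.
full_128k = kv_cache_bytes(128_000)                      # tens of GB
# Fixed-offset scheme: only a constant number of positions cached.
fixed = kv_cache_bytes(128_000, cached_positions=44)     # tens of MB
```

Under these assumed dimensions the full cache at 128k context is around 67 GB while the fixed variant stays in the tens of megabytes, which is why the claim matters most once context windows get large.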
// TAGS
dwarf · llm · inference · research · open-source · benchmark

DISCOVERED

2026-03-06

PUBLISHED

2026-03-05

RELEVANCE

8/10

AUTHOR

MariusNocturnum