Sessa targets long-context memory retention
Sessa is a new long-context decoder architecture, single-authored by Liubomyr Horbatko, that moves self-attention into a recurrent feedback pathway: attention becomes part of the model's memory dynamics rather than a one-shot read over prior tokens. The paper argues this design retains long-range influence better than matched Transformer and Mamba-style baselines, and an Apache-2.0 PyTorch implementation is already available on GitHub.
This stands out because the paper does not just claim better long-context performance empirically; it argues for a different memory mechanism with explicit theoretical advantages. The core differentiator is that attention is embedded inside the recurrence, so past information can flow through many attention-mediated paths instead of a single attention read or a single recurrent chain. The main caveat is that the results are framed under explicit assumptions and matched regimes, so the real test will be whether the theory holds up under scaling, training stability, and broader benchmark coverage.
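To make the "attention inside recurrence" idea concrete, here is a toy, dependency-free sketch of the general pattern the summary describes: at each step, the recurrent state attends over a buffer of its own past states before being updated, so old tokens can re-enter the state through many attention-mediated paths. This is an illustrative assumption, not the paper's actual Sessa architecture; all function names and the update rule are hypothetical.

```python
# Toy sketch (NOT the Sessa implementation): a recurrent step whose state
# update is driven by an attention read over the buffer of past states.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def step(x_t, h, memory, alpha=0.5):
    # Attention read: the current state h queries all past states in memory.
    scores = softmax([dot(h, m) / math.sqrt(len(h)) for m in memory])
    read = [sum(w * m[i] for w, m in zip(scores, memory)) for i in range(len(h))]
    # Recurrent update blends the new input, the attended memory, and the
    # previous state, so past tokens influence the future through both the
    # recurrent chain and every later attention read (hypothetical rule).
    return [math.tanh(x + r + alpha * hv) for x, r, hv in zip(x_t, read, h)]

# Roll the cell over a short sequence, appending each new state to memory.
d = 4
h = [0.0] * d
memory = [h]
seq = [[0.1 * (t + i) for i in range(d)] for t in range(6)]
for x_t in seq:
    h = step(x_t, h, memory)
    memory.append(h)
print(len(memory), len(h))  # 7 4
```

The point of the sketch is the contrast the summary draws: in a vanilla Transformer, a past token is read once per layer; in a plain RNN, it survives only through one recurrent chain; here, every stored state remains addressable by every later attention read.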
DISCOVERED: 2026-04-23 (5h ago)
PUBLISHED: 2026-04-23 (6h ago)
AUTHOR: WittyAtmosphere8171