REDDIT · 3d ago · RESEARCH PAPER
Palimpsa boosts transformer memory via Bayesian metaplasticity
A new research paper introduces Palimpsa, an attention model that treats in-context learning as a continual learning problem. By using Bayesian metaplasticity to dynamically adjust how states remember and forget, it significantly expands effective memory capacity for processing long sequences.
// ANALYSIS
Palimpsa brings a biological "synaptic memory" approach to transformer architecture, offering a mathematically grounded way to beat the stability-plasticity dilemma in long contexts.
- By dynamically tying plasticity to an "importance state," the model avoids overwriting crucial early context when processing very long sequences
- The paper proves that Mamba2 is a special case of Palimpsa in which forgetting dynamics dominate
- It introduces a framework for converting standard non-metaplastic linear attention models into metaplastic ones without a large parameter penalty
- The implementation uses a fused Triton kernel, so these memory gains don't sacrifice the throughput expected from standard linear attention
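The paper's exact update rule isn't reproduced here, but the core idea — a linear-attention memory whose per-slot forgetting rate is gated by an accumulated importance state — can be sketched as follows. All names (`S`, `m`, `alpha`, `beta`) and the specific importance/gating formulas are illustrative assumptions, not Palimpsa's actual equations:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v = 4, 4            # toy key / value dimensions

S = np.zeros((d_k, d_v))   # associative memory state, as in linear attention
m = np.zeros(d_k)          # per-slot importance trace (metaplasticity state)

def step(S, m, k, v, eta=0.5, tau=0.1):
    # Hypothetical importance proxy: slots that keys write to strongly
    # accumulate importance (stands in for the Bayesian estimate).
    m = m + tau * k**2
    # Metaplastic gates: high-importance slots forget slowly (alpha -> 1)
    # and accept new writes weakly (beta -> 0), protecting early context.
    alpha = np.exp(-eta / (1.0 + m))   # per-slot retention in (0, 1)
    beta = 1.0 / (1.0 + m)             # per-slot write strength
    # Gated outer-product update of the memory state.
    S = alpha[:, None] * S + (beta * k)[:, None] * v[None, :]
    return S, m

for _ in range(16):
    k = rng.standard_normal(d_k)
    v = rng.standard_normal(d_v)
    S, m = step(S, m, k, v)

q = rng.standard_normal(d_k)
out = q @ S   # standard linear-attention readout for a query
```

With `m` frozen at zero, `alpha` and `beta` reduce to constants and the update collapses to an ordinary gated linear-attention / SSM-style recurrence, which is the intuition behind Mamba2 appearing as a special case where forgetting dynamics dominate.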
// TAGS
palimpsa, llm, research, inference
DISCOVERED
3d ago
2026-04-08
PUBLISHED
3d ago
2026-04-08
RELEVANCE
8/10
AUTHOR
kawasaki001