Palimpsa boosts transformer memory via Bayesian metaplasticity
A new research paper introduces Palimpsa, an attention model that treats in-context learning as a continual learning problem. By using Bayesian metaplasticity to dynamically adjust how states remember and forget, it significantly expands effective memory capacity for processing long sequences.
Palimpsa brings a biologically inspired "synaptic memory" approach to the transformer architecture, offering a mathematically grounded way to address the stability-plasticity dilemma in long contexts.
- By dynamically tying plasticity to an "importance state," the model avoids overwriting crucial early context when processing very long sequences.
- The paper shows theoretically that Mamba2 is a special case of Palimpsa in which forgetting dynamics dominate.
- It introduces a framework for converting standard non-metaplastic linear attention models into metaplastic ones without a large parameter penalty.
- The implementation uses a fused Triton kernel, so these memory gains do not sacrifice the throughput expected from standard linear attention.
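The idea of tying plasticity to an importance state can be illustrated with a toy example. The sketch below is not Palimpsa's update rule, which this summary does not give; it is a scalar Kalman-style analogy in which each memory entry is a Gaussian belief and its precision plays the role of importance. The names `metaplastic_step`, `run`, `forget`, and `obs_precision` are hypothetical. With `forget=1.0`, importance accumulates and the step size shrinks, so early context is protected. With `forget < 1`, the step size settles at the constant `1 - forget`, and the update reduces to a fixed-decay moving average, loosely mirroring the claim that a decay-gated model appears as the forgetting-dominated special case.

```python
def metaplastic_step(mean, precision, x, forget=1.0, obs_precision=1.0):
    """Update one memory entry, modeled as a Gaussian belief (toy assumption)."""
    # Forgetting: shrink the accumulated precision ("importance") first.
    precision = forget * precision
    # Plasticity is the Kalman-style gain: high importance gives a small step.
    gain = obs_precision / (precision + obs_precision)
    mean = mean + gain * (x - mean)
    # Each observation adds evidence, raising importance.
    precision = precision + obs_precision
    return mean, precision, gain


def run(xs, forget):
    """Feed a sequence through one entry, starting from mean 0 and precision 1."""
    mean, precision = 0.0, 1.0
    gains = []
    for x in xs:
        mean, precision, gain = metaplastic_step(mean, precision, x, forget)
        gains.append(gain)
    return mean, precision, gains


if __name__ == "__main__":
    # Twenty consistent early observations, then one conflicting late one.
    stream = [1.0] * 20 + [-5.0]
    for forget in (1.0, 0.5):
        mean, _, gains = run(stream, forget)
        print(f"forget={forget}: final mean={mean:.3f}, last gain={gains[-1]:.3f}")
```

In this toy, the metaplastic setting (`forget=1.0`) keeps the final value close to the early evidence, while the forgetting-dominated setting lets the single late observation pull the memory most of the way toward itself.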
DISCOVERED
2026-04-08
PUBLISHED
2026-04-08
AUTHOR
kawasaki001