OPEN_SOURCE
REDDIT · 26d ago · RESEARCH PAPER
Kimi Attention Residuals rethink Transformer skip connections
Moonshot AI’s Kimi team proposes Attention Residuals, replacing fixed residual accumulation with learned depth-wise softmax attention so each layer can selectively pull from earlier representations. In repo-reported results, Block AttnRes acts as a practical drop-in variant, matching baseline loss with about 1.25x compute efficiency gains and improving downstream scores when integrated into Kimi Linear (48B total, 3B activated).
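The core idea can be sketched as a depth-wise softmax mix over earlier-layer outputs, replacing the fixed residual sum. This is an illustrative NumPy sketch under stated assumptions, not the paper's exact parameterization: the per-depth logits `scores` and the convex-combination form are hypothetical simplifications of the learned attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn_residual(history, scores):
    """Mix earlier-layer representations with depth-wise softmax weights.

    history: list of L arrays, each (seq, d) -- outputs of layers 0..L-1
    scores:  (L,) logits over depth (hypothetical learned parameters)

    A standard residual stream carries a fixed accumulation of all
    earlier layers; here the layer instead pulls a learned convex
    combination across depth, so it can emphasize or suppress
    individual earlier representations.
    """
    H = np.stack(history)              # (L, seq, d)
    w = softmax(scores)                # (L,) convex weights over depth
    return np.tensordot(w, H, axes=1)  # (seq, d) weighted mix

# toy check: with uniform logits the mix reduces to a plain average
# over depth, i.e. mean of layer values 0, 1, 2, 3 -> 1.5 everywhere
hist = [np.full((2, 3), float(i)) for i in range(4)]
mix = attn_residual(hist, np.zeros(4))
```

With non-uniform logits the same routine lets a deep layer weight, say, an early syntactic representation above the immediately preceding one, which is the selectivity the summary describes.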
// ANALYSIS
This is a smart architectural bet: upgrade the least-changed part of Transformers without forcing a full stack rewrite.
- The key value is selectivity across depth, which directly targets PreNorm-style signal dilution in very deep models.
- Block AttnRes keeps the idea deployable by trading full cross-layer attention for compressed block-level routing.
- The reported latency overhead (<2% at inference) is low enough to make real-world adoption plausible if independent replications hold.
- The biggest open question is external validation across non-Kimi model families and larger-scale training regimes.
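The block-level trade-off mentioned above can be sketched in the same spirit: instead of attending over every earlier layer, pool runs of layers into block summaries and route over those. This NumPy sketch is an assumption-laden illustration, not the repo's implementation; `block_size`, mean pooling, and the block logits `scores` are all hypothetical choices.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def block_attn_residual(history, block_size, scores):
    """Block-level routing: compress groups of layers into summaries,
    then softmax-attend over blocks rather than every layer.

    history:    list of L (seq, d) arrays; block_size must divide L
    block_size: layers pooled per block (assumed mean pooling)
    scores:     (L // block_size,) hypothetical learned block logits
    """
    H = np.stack(history)  # (L, seq, d)
    L, seq, d = H.shape
    # pool consecutive layers into block summaries: (L/bs, seq, d)
    B = H.reshape(L // block_size, block_size, seq, d).mean(axis=1)
    w = softmax(scores)                # weights over blocks
    return np.tensordot(w, B, axes=1)  # (seq, d) routed residual

# toy check: 4 layers, 2 blocks, uniform logits
# block means are 0.5 and 2.5, so the uniform mix is 1.5 everywhere
hist = [np.full((2, 2), float(i)) for i in range(4)]
out = block_attn_residual(hist, 2, np.zeros(2))
```

The attention cost drops from O(L) to O(L / block_size) terms per layer, which is the compression that keeps the variant's inference overhead small.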
// TAGS
attention-residuals · kimi-linear · llm · transformer · research · inference · open-source
DISCOVERED
2026-03-17
PUBLISHED
2026-03-16
RELEVANCE
9/10
AUTHOR
nekofneko