Moonshot AI debuts Attention Residuals
Moonshot AI's Kimi team introduces Attention Residuals (AttnRes), a novel architecture that replaces fixed residual connections with a depth-wise attention mechanism. By reframing model depth as a sequence dimension, AttnRes lets each layer selectively retrieve information from a learned weighted combination of all preceding representations, mitigating the signal dilution that accumulates in deep Transformer models.
The approach reports a 1.25x compute advantage and significant reasoning gains on the Kimi 48B model. A "Block AttnRes" variant keeps the extra depth-attention overhead linear in depth, with negligible latency impact.
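The announcement does not include the exact formulation, but the core idea, depth treated as a sequence axis and a learned attention mix over all preceding layer outputs in place of a fixed identity residual, can be sketched. Below is a minimal, hypothetical NumPy illustration; the function name, weight matrices `W_q`/`W_k`, and the single-head scoring scheme are assumptions for clarity, not Moonshot's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_attention_residual(history, W_q, W_k):
    """Toy sketch of an attention residual (hypothetical formulation).

    Instead of the fixed residual x_{l+1} = x_l + f(x_l), each token
    attends over the stack of all preceding layer outputs and receives
    a learned convex combination of them as its residual stream.

    history: list of (seq, d) arrays -- outputs of layers 0..l
    W_q, W_k: (d, d) projection matrices (assumed learned parameters)
    Returns a (seq, d) depth-attention mix for layer l+1.
    """
    H = np.stack(history, axis=0)          # (L, seq, d): depth as a sequence axis
    q = history[-1] @ W_q                  # (seq, d): query from the current layer
    k = H @ W_k                            # (L, seq, d): one key per depth position
    # score each preceding depth per token -> (seq, L)
    scores = np.einsum('sd,lsd->sl', q, k) / np.sqrt(q.shape[-1])
    w = softmax(scores, axis=-1)           # attention weights over depth
    return np.einsum('sl,lsd->sd', w, H)   # weighted mix of layer outputs
```

Because the weights sum to 1 across depth, a layer can recover the standard residual path (all weight on the most recent layer) or reach back to early representations, which is the mechanism claimed to counter signal dilution.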
PUBLISHED: 2026-03-17
AUTHOR: AI Revolution