DiscoLoop prevents representation drift in looping Transformers
DiscoLoop is a looping Transformer architecture that maintains two channels across iterations: a discrete embedding channel and a continuous hidden-state channel. The dual-channel design targets representation drift across loops, and it is reported to bring significant improvements in out-of-distribution generalization and multi-hop reasoning.
Looping Transformers offer a path to parameter-efficient reasoning, but representation drift has historically limited how many loops they can usefully run. DiscoLoop's hybrid discrete-continuous approach addresses this by anchoring intermediate computations with discrete embeddings.
* The dual-channel design mitigates representation drift, allowing the model to perform deeper loops without degradation.
* Significant improvements in out-of-distribution generalization suggest that the architecture learns genuine algorithmic reasoning.
* Combining discrete and continuous pathways could unlock more robust adaptive-depth transformers for complex task planning.
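To make the dual-channel idea concrete, here is a minimal toy sketch in plain Python. It is not DiscoLoop's actual implementation: the summary does not say how the two channels are produced or combined, so the nearest-neighbor codebook lookup, the `mix` blending weight, and the `tanh(W x)` stand-in for the shared Transformer block are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def nearest_code(h, codebook):
    """Index of the codebook vector closest to h (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(h, codebook[k])),
    )


def shared_block(W, x):
    """Hypothetical stand-in for the weight-shared Transformer block: tanh(W x)."""
    return [math.tanh(v) for v in matvec(W, x)]


def dual_channel_loop(h0, W, codebook, n_loops, mix=0.5):
    """Run the same block n_loops times while carrying two channels.

    Discrete channel: the hidden state is snapped to its nearest codebook
    embedding (assumed mechanism). Continuous channel: the raw hidden state.
    The two are blended with weight `mix` (assumed) before each pass.
    """
    h = list(h0)
    tokens = []
    for _ in range(n_loops):
        k = nearest_code(h, codebook)  # discrete anchor for this loop
        e = codebook[k]
        x = [mix * ei + (1.0 - mix) * hi for ei, hi in zip(e, h)]
        h = shared_block(W, x)  # same weights reused on every loop
        tokens.append(k)
    return h, tokens


# Toy setup with random weights; sizes are arbitrary.
rng = random.Random(0)
dim, n_codes = 4, 8
W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
codebook = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_codes)]
h0 = [rng.gauss(0, 1) for _ in range(dim)]

h, tokens = dual_channel_loop(h0, W, codebook, n_loops=6)
print("discrete codes per loop:", tokens)
print("final hidden state:", h)
```

In this sketch, `mix=1.0` feeds only the discrete embedding into each pass, so the input to every loop is always one of a fixed set of vectors, while `mix=0.0` reduces to an ordinary continuous loop with no anchor. Intermediate values carry both signals.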
DISCOVERED
2026-07-03
PUBLISHED
2026-07-03
AUTHOR
Discover AI