Looped transformers generalize composition through iterative depth
This paper studies recurrent-depth transformers, a looped architecture that reuses the same layers multiple times in a single forward pass. The authors argue that standard transformers can retain factual knowledge yet still fail at implicit multi-hop reasoning because they struggle to compose that knowledge systematically. In controlled experiments, recurrent-depth models handled both systematic generalization and depth extrapolation better than vanilla transformers, with deeper reasoning emerging as inference-time recurrence increased. The paper also identifies a failure mode, overthinking, where too much recurrence hurts predictions.
Hot take: the main win here is not “more thinking” in the abstract, but a cleaner way to turn fixed parameters into iterative computation when the task needs composition.
- The interesting result is compositional generalization: looped depth helps the model combine facts it did not see composed during training.
- The depth extrapolation finding matters operationally: extra test-time loops can buy more reasoning depth without retraining the whole model.
- The mechanistic story is stronger than a benchmark-only claim because the paper tracks a grokking-like transition from memorization to systematic generalization.
- The caveat is real: recurrence is not free, and too many loops can degrade outputs through overthinking.
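The weight-tying idea above can be sketched minimally: one block's parameters are reused for a variable number of loops, so inference-time depth scales without adding parameters or retraining. This is a hedged illustration, not the paper's implementation; the matrix `W` and the `tanh` update are hypothetical stand-ins for a full transformer layer stack.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden width (illustrative)

# One shared "block": a single weight matrix reused every iteration,
# standing in for a full set of transformer layers (hypothetical stand-in).
W = rng.normal(scale=0.1, size=(d, d))

def looped_forward(x, n_loops):
    """Apply the same block n_loops times with a residual connection.

    Parameters stay fixed; effective compute depth scales with n_loops,
    which can be raised at inference time.
    """
    h = x
    for _ in range(n_loops):
        h = h + np.tanh(W @ h)  # weight-tied update on the residual stream
    return h

x = rng.normal(size=d)
shallow = looped_forward(x, n_loops=4)
deep = looped_forward(x, n_loops=16)  # more recurrence, same weights
```

Raising `n_loops` at test time is the knob behind the depth-extrapolation result; the overthinking caveat is the observation that this knob can be turned too far.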
DISCOVERED: 2026-05-01
PUBLISHED: 2026-05-01
AUTHOR: AlphaSignalAI