d² Pullback Theorem sparks attention debate
An anonymous Korean forum paper titled The d² Pullback Theorem argues that attention’s real optimization geometry is fundamentally d² rather than n², then uses that claim to motivate a degree-2 polynomial replacement called CSQ Attention with a claimed O(nd³) cost for both training and inference. The Reddit thread took off because researchers found the dimensionality argument interesting but pushed back hard on the leap from a shared optimization space to softmax-equivalent behavior.
This is exactly the kind of outsider theorem drop that goes viral in ML circles: it attacks transformer efficiency at the mathematical level, but right now it reads more like a provocative reframing than a settled breakthrough.
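The d² framing is easiest to see in the pre-softmax score map. For a single head, the scores depend on W_Q and W_K only through their d×d product, so the score function has d² effective parameters per head regardless of sequence length n; presumably this is the observation the paper builds on:

$$
S(X) = (X W_Q)(X W_K)^\top = X \left( W_Q W_K^\top \right) X^\top, \qquad W_Q W_K^\top \in \mathbb{R}^{d \times d}
$$

The n² term, by contrast, is the cost of evaluating softmax over all query–key pairs of S(X), not a count of parameters being optimized, which is roughly the optimization-versus-computation distinction commenters accused the paper of blurring.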
- The paper’s core claim is that attention parameters only explore a d²-dimensional landscape, and that softmax creates useful matching while also inflating the apparent n² bottleneck
- Reddit reaction is mixed: several commenters said the d² observation looks plausible or adjacent to existing low-rank attention intuition, but rejected the stronger claim that it validates degree-2 polynomial attention (see the sketch after this list) as an equivalent replacement
- The sharpest criticism is that the paper appears to conflate optimization dimensionality with actual computational cost and functional expressivity, which are not the same thing
- One commenter linked prior work, “Rethinking Attention: Polynomial Alternatives to Softmax in Transformers,” underscoring that polynomial substitutes are already an active research direction rather than a wholly new frontier
- There is still no peer review, institutional backing, or serious large-scale benchmark evidence attached, so the news is the debate itself, not a verified path beyond Transformers
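To make the O(nd³) figure concrete, below is a minimal sketch of degree-2 polynomial attention linearized with an explicit feature map φ(x) = vec(xxᵀ), one standard construction that yields that cost. The paper’s actual CSQ Attention is not public, so `poly2_attention` here is an illustrative stand-in under those assumptions, not the author’s method.

```python
import numpy as np

def poly2_attention(Q, K, V):
    """Degree-2 polynomial attention via an explicit feature map.

    Scores are (q . k)^2 = phi(q) . phi(k) with phi(x) = vec(x x^T),
    so the output can be accumulated without the n x n score matrix;
    the dominant contraction costs O(n * d^3).
    """
    n, d = Q.shape
    # Explicit degree-2 feature map: phi(x) = vec(x x^T), shape (n, d^2)
    phi_q = np.einsum("ni,nj->nij", Q, Q).reshape(n, d * d)
    phi_k = np.einsum("ni,nj->nij", K, K).reshape(n, d * d)
    # Key/value summaries: S is (d^2, d), z is (d^2,), built in one pass over n
    S = phi_k.T @ V            # O(n * d^2 * d) = O(n * d^3)
    z = phi_k.sum(axis=0)      # O(n * d^2)
    # Per-query readout, also O(n * d^3) in total
    num = phi_q @ S            # (n, d)
    den = phi_q @ z + 1e-6     # (n,), strictly positive since scores are squares
    return num / den[:, None]

# Tiny shape check
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(poly2_attention(Q, K, V).shape)  # (8, 4)
```

The sketch only demonstrates the cost structure: the n × n score matrix is never materialized, but the feature dimension grows to d², so the key–value contraction runs in O(nd³), a win only when d² is small relative to n.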
DISCOVERED
2026-03-06
PUBLISHED
2026-03-05
AUTHOR
Ok-Preparation-3042