Multiscreen architecture replaces softmax with absolute relevance screening
Multiscreen introduces "screening," a mechanism that evaluates query-key relevance against an absolute threshold instead of relative mass redistribution. This enables 3.2x faster inference and 40% parameter reduction while maintaining strong performance in long-context retrieval and perplexity.
Softmax's relative weight distribution has been the fundamental bottleneck for sparsity and long-context scaling; Multiscreen's thresholding finally solves the "always attending" problem.
- –40% fewer parameters with comparable validation loss to traditional Transformers.
- –3.2x inference speedup at 100k context length via sparse computation skipping.
- –Enables stable training at significantly higher learning rates (up to 0.0625).
- –Spartan attention: models learn to attend only to relevant keys, often screening out 95% of the context.
DISCOVERED
54d ago
2026-04-03
PUBLISHED
54d ago
2026-04-03
RELEVANCE
AUTHOR
Thrumpwart