OPEN_SOURCE ↗
REDDIT // 11d ago · OPEN_SOURCE RELEASE
RBF Attention Trades Dot Products for Distance
A developer rebuilt self-attention around an RBF-style distance metric instead of raw dot products, then published the engineering writeup and code. The project shows how far you have to rewrite the stack once you stop treating dot-product attention as sacred.
// ANALYSIS
This is a useful proof-of-concept, not a near-term FlashAttention replacement. The interesting part is less the math than the systems lesson: attention kernels, positional encoding, and sink behavior are all coupled to dot-product assumptions.
- The RBF formulation is effectively dot-product attention plus a key-norm penalty, which makes the geometry more explicit without changing the basic softmax machinery
- A custom kernel was necessary because fused SDPA paths are optimized around the standard attention contract, not arbitrary score shaping
- The register-token workaround for attention sinks is a practical reminder that some “bugs” in transformers are actually learned stability mechanisms
- Dropping RoPE for additive sinusoidal embeddings is a coherent choice once distance, not angle, is the scoring primitive
- The TinyStories result is interesting but not decisive; this reads more like a researchy systems experiment than a validated production alternative
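The first bullet's equivalence is easy to verify directly. Expanding the negative squared distance gives −‖q−k‖² = 2q·k − ‖q‖² − ‖k‖², and the per-query ‖q‖² term cancels inside the softmax, leaving dot-product attention with a key-norm penalty. A minimal NumPy sketch (assuming a plain −½‖q−k‖² score; the project's exact scaling and temperature may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 queries, dim 8
k = rng.normal(size=(6, 8))   # 6 keys, dim 8

# RBF-style scores: negative squared Euclidean distance.
sq_dist = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)
rbf_weights = softmax(-0.5 * sq_dist)

# Dot-product scores minus a key-norm penalty.
# The per-query ||q||^2 term is constant across keys and
# cancels inside the softmax.
dot_weights = softmax(q @ k.T - 0.5 * (k ** 2).sum(-1))

assert np.allclose(rbf_weights, dot_weights)
```

The cancellation is also why fused SDPA kernels don't help here: the key-norm bias has to be injected into the score matrix before the softmax, which standard fused paths don't expose.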
// TAGS
llm · research · open-source · inference · rbf-attention
DISCOVERED
11d ago
2026-04-01
PUBLISHED
11d ago
2026-04-01
RELEVANCE
8/10
AUTHOR
4rtemi5