RotorQuant hits 10x faster KV cache compression
RotorQuant is a high-performance compression tool that decorrelates KV caches with sparse Clifford rotors instead of dense rotation matrices. It reports up to 31x speedups and 44x fewer rotation parameters than Google's TurboQuant, along with higher attention fidelity.
RotorQuant targets LLM efficiency by cutting the cost of the decorrelation step from O(d²) to O(d) in the vector dimension d, using the algebraic sparsity of geometric algebra.
- Clifford rotors in Cl(3,0) enable fused kernels that rotate vectors with 160x fewer operations than a traditional dense matrix-vector multiplication.
- By using block-diagonal rotations instead of global "scrambling," the tool better preserves the directional integrity of attention heads, which leads to improved perplexity scores.
- The 44x reduction in rotation parameters (from 16k to 372 for d=128) significantly lowers the memory overhead of deploying long-context models on consumer hardware.
- Native support for both CUDA and Apple Silicon Metal makes it a versatile drop-in for local LLM ecosystems like llama.cpp.
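
The block-diagonal rotor idea in the bullets above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not RotorQuant's actual kernel: the vector is split into 3-dimensional chunks, each chunk is rotated by its own unit rotor (stored as the four even-grade coefficients of Cl(3,0), which behave like a unit quaternion), and any leftover coordinates pass through unchanged. The helper names (`random_rotor`, `rotate3`, `block_rotate`) and the handling of the leftover coordinates are hypothetical choices made for this sketch.

```python
import math
import random


def random_rotor(rng):
    """Return a unit rotor (w, x, y, z): scalar part plus three bivector coefficients."""
    comps = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in comps))
    return tuple(c / norm for c in comps)


def reverse(rotor):
    """Reverse of a rotor; for a unit rotor this is its inverse."""
    w, x, y, z = rotor
    return (w, -x, -y, -z)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate3(rotor, v):
    """Rotate a 3-vector with the sandwich product, using the equivalent
    unit-quaternion formula v' = v + 2w(u x v) + 2u x (u x v)."""
    w, u = rotor[0], rotor[1:]
    t = tuple(2.0 * c for c in cross(u, v))  # t = 2 (u x v)
    ut = cross(u, t)
    return tuple(v[i] + w * t[i] + ut[i] for i in range(3))


def block_rotate(rotors, vec):
    """Rotate consecutive 3-dimensional chunks of vec, one rotor per chunk.
    Coordinates beyond 3 * len(rotors) are left unchanged (a simplification
    of this sketch). The work grows linearly with len(vec)."""
    out = list(vec)
    for g, rotor in enumerate(rotors):
        i = 3 * g
        out[i:i + 3] = rotate3(rotor, vec[i:i + 3])
    return out


def vec_norm(x):
    return math.sqrt(sum(c * c for c in x))


d = 128
rng = random.Random(0)
rotors = [random_rotor(rng) for _ in range(d // 3)]  # one rotor per 3-dim chunk
key = [rng.gauss(0.0, 1.0) for _ in range(d)]        # stand-in for one cached key vector

rotated = block_rotate(rotors, key)
restored = block_rotate([reverse(r) for r in rotors], rotated)

# Rotations preserve length and are exactly invertible (up to float error).
print(abs(vec_norm(key) - vec_norm(rotated)) < 1e-9)
print(max(abs(a - b) for a, b in zip(key, restored)) < 1e-9)

# A dense d x d rotation stores d * d entries; dividing by the 372 quoted above
# gives roughly the 44x figure.
print(d * d, round(d * d / 372, 1))
```

The sketch stores 4 coefficients per chunk, so its own parameter count does not match the 372 quoted above; that figure depends on details of RotorQuant's layout that this summary does not give. The point is only that each rotor touches three coordinates, so both storage and work scale with d rather than d².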
DISCOVERED
2026-04-12
PUBLISHED
2026-04-12
AUTHOR
AI Search