TurboQuant rotates vectors, trims quantization bias
OPEN_SOURCE ↗
REDDIT · 14d ago · RESEARCH PAPER

TurboQuant is Google's vector-quantization scheme for compressing KV caches and search embeddings. It randomly rotates vectors before low-bit quantization, then adds a 1-bit residual step to keep attention inner products unbiased and distortion near-optimal.
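The rotation step can be sanity-checked in a few lines: an orthogonal rotation leaves inner products untouched while flattening the spikes that break per-coordinate quantizers. A minimal NumPy sketch, where the dense QR rotation and the dimension `d = 128` are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128

# Random orthogonal rotation via QR of a Gaussian matrix (dense and
# slow; a production system would use fast structured rotations).
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = np.zeros(d)
x[0] = 10.0                 # spiky, quasi-sparse vector
y = rng.normal(size=d)

# Orthogonal R preserves the dot products attention relies on...
print(np.allclose(x @ y, (R @ x) @ (R @ y)))   # True

# ...while spreading the spike across coordinates, which is what
# makes per-coordinate low-bit quantization viable afterwards.
print(np.max(np.abs(x)), np.max(np.abs(R @ x)))
```

After rotation the single 10.0 spike is smeared into many small coordinates, so a shared scalar quantization grid no longer wastes its range on one axis.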

// ANALYSIS

TurboQuant is more than a “polar coordinates” trick; the real insight is a two-stage quantizer that makes low-bit compression behave well on spiky transformer activations while preserving the math attention cares about.

  • Random rotation spreads signal across coordinates, so scalar quantization stops snapping quasi-sparse vectors onto a single dominant axis.
  • The 1-bit QJL residual is the underrated piece, because MSE-optimal quantizers can distort dot products and attention is built on dot products.
  • The paper points at inference infrastructure, not training: KV-cache compression and vector search are where training-free compression pays off.
  • Reported gains like 3.5 bits per channel with no quality loss, plus up to 8x attention-logit speedup on H100s, make it a serious systems paper rather than a curiosity.
  • The Google blog's polar-coordinate framing is a useful intuition, but it can overstate the geometry and understate the bias-correction step.
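The two-stage idea in the bullets above can be sketched end to end: rotate, quantize coarsely, then spend roughly one extra bit on the sign of the residual. Everything below is a hand-rolled illustration, assuming a plain uniform quantizer and a sign-times-mean-magnitude residual as a stand-in for the paper's QJL step:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix
    # (dense for clarity; real systems use fast structured rotations).
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

def quantize_uniform(x, bits):
    # Symmetric per-vector uniform scalar quantizer; returns the
    # dequantized values rather than the integer codes.
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    return np.round(x / scale) * scale

def turboquant_sketch(x, R, bits=4):
    # Stage 1: rotate, then coarse low-bit quantization.
    xr = R @ x
    coarse = quantize_uniform(xr, bits)
    # Stage 2: 1-bit residual -- keep only the sign of the error plus
    # one shared magnitude (hypothetical stand-in for the QJL step).
    resid = xr - coarse
    return coarse + np.sign(resid) * np.mean(np.abs(resid))

d = 128
R = random_rotation(d)
x = rng.normal(size=d)
x[:4] *= 20.0                # spiky, quasi-sparse activation
y = rng.normal(size=d)

# Orthogonal R preserves dot products, so scores against the rotated
# query stay comparable to the exact ones.
print("exact  :", x @ y)
print("approx :", turboquant_sketch(x, R, bits=4) @ (R @ y))
```

Because the residual correction stores one sign per coordinate plus a single shared magnitude, it costs about one extra bit per channel while shrinking the coarse quantizer's reconstruction error on this example.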
// TAGS
turboquant · llm · inference · search · vector-db · research

DISCOVERED

2026-03-28 (14d ago)

PUBLISHED

2026-03-28 (14d ago)

RELEVANCE

9/10

AUTHOR

-p-e-w-