TurboQuant Pro compresses BGE-M3 with PCA
TurboQuant Pro shows that a one-time PCA rotation can make non-Matryoshka embeddings far more truncation-friendly: for BGE-M3, cosine similarity between the original and reconstructed vectors stays near-perfect at 512 dimensions and is still strong at 128. The project also benchmarks PCA-plus-quantization against scalar, binary, and product quantization (PQ) on a multilingual retrieval corpus.
This is a strong reminder that embedding compression is often a basis-selection problem, not just a bit-budget problem. PCA is not a retrieval silver bullet, but it looks like a very solid linear baseline before moving to heavier codecs.
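To make the basis-selection point concrete, here is a minimal sketch comparing naive tail truncation with PCA truncation. It does not use BGE-M3 or the project's code: the data is synthetic (a decaying spectrum hidden behind a random rotation, standing in for an embedding whose signal is spread across all coordinates), and every name and size in it is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 64, 16  # hypothetical sizes, not the benchmark's

# Synthetic "embeddings": variance decays across latent axes, then a random
# rotation smears it over every coordinate (assumption, not real model output).
scales = 1.0 / (1.0 + np.arange(d))
rotation, _ = np.linalg.qr(rng.standard_normal((d, d)))
X = (rng.standard_normal((n, d)) * scales) @ rotation.T
X /= np.linalg.norm(X, axis=1, keepdims=True)


def mean_cosine(A, B):
    """Average row-wise cosine similarity between A and B."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))


# Naive truncation: keep the first k coordinates, zero the tail.
X_naive = X.copy()
X_naive[:, k:] = 0.0

# PCA truncation: fit once, project onto the top-k principal directions,
# then map back to the original space for comparison.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:k]                         # (k, d) principal directions
X_pca = (X - mu) @ W.T @ W + mu    # rank-k reconstruction

naive_cos = mean_cosine(X, X_naive)
pca_cos = mean_cosine(X, X_pca)
print(f"naive truncation cosine: {naive_cos:.3f}")
print(f"PCA truncation cosine:   {pca_cos:.3f}")
```

On data like this, the PCA basis concentrates most of the energy in the retained dimensions, while the first k raw coordinates hold only a share roughly proportional to k/d. Real embeddings will show a different gap, so the numbers here say nothing about BGE-M3 itself.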
- Naive tail truncation is a weak baseline for non-Matryoshka models; PCA makes the retained dimensions carry much more signal.
- Cosine reconstruction alone is not enough for retrieval decisions, because Recall@10 degrades faster than similarity in the reported runs.
- PCA plus low-bit quantization looks like a practical middle ground between scalar quantization and more aggressive binary or PQ compression.
- The next questions are stability and generality: how much the PCA fit varies by corpus, language mix, and downstream ANN index behavior.
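One way to check the cosine-versus-recall point on your own data is to report both metrics side by side for each compression setting. The sketch below is an assumption-laden illustration, not the project's benchmark: it uses synthetic vectors, a simple per-dimension uniform quantizer, brute-force search on decoded vectors, and hypothetical function names throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_base, n_query, top = 64, 2000, 100, 10  # hypothetical sizes

rotation, _ = np.linalg.qr(rng.standard_normal((d, d)))
scales = 1.0 / (1.0 + np.arange(d))


def make_data(n):
    """Synthetic unit vectors with a decaying, rotated spectrum (assumption)."""
    X = (rng.standard_normal((n, d)) * scales) @ rotation.T
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def top_k(queries, base, k):
    """Brute-force cosine search; returns (n_query, k) neighbour indices."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    b = base / np.linalg.norm(base, axis=1, keepdims=True)
    return np.argsort(-(q @ b.T), axis=1)[:, :k]


def recall_at_k(true_ids, test_ids):
    """Fraction of true neighbours recovered, averaged over queries."""
    hits = [len(set(t) & set(p)) for t, p in zip(true_ids, test_ids)]
    return float(np.mean(hits)) / true_ids.shape[1]


def quantize_dequantize(Z, lo, hi, bits):
    """Uniform per-dimension scalar quantization, returned in float form."""
    levels = 2 ** bits - 1
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((Z - lo) / scale), 0, levels)
    return codes * scale + lo


base, queries = make_data(n_base), make_data(n_query)
truth = top_k(queries, base, top)  # exact neighbours in the original space


def evaluate(k, bits):
    """Return (mean reconstruction cosine, Recall@10) for PCA-k plus bits."""
    mu = base.mean(axis=0)
    _, _, Vt = np.linalg.svd(base - mu, full_matrices=False)
    W = Vt[:k]
    Zb, Zq = (base - mu) @ W.T, (queries - mu) @ W.T
    if bits is not None:
        lo, hi = Zb.min(axis=0), Zb.max(axis=0)  # ranges fit on the base set
        Zb = quantize_dequantize(Zb, lo, hi, bits)
        Zq = quantize_dequantize(Zq, lo, hi, bits)
    base_hat, query_hat = Zb @ W + mu, Zq @ W + mu  # decode before scoring
    cos = np.sum(base * base_hat, axis=1) / np.linalg.norm(base_hat, axis=1)
    return float(np.mean(cos)), recall_at_k(truth, top_k(query_hat, base_hat, top))


configs = [(64, None), (16, None), (16, 4)]  # (kept dims, bits per dim)
results = {cfg: evaluate(*cfg) for cfg in configs}
for (k, bits), (cos, rec) in results.items():
    print(f"dims={k:3d} bits={bits}: cosine={cos:.3f} recall@{top}={rec:.3f}")
```

The full-dimension, unquantized row acts as a sanity check: it should reproduce the exact neighbours. For the other rows, comparing the two columns shows whether a setting that looks fine on cosine still loses neighbours; how large that gap is depends heavily on the corpus, which is exactly the open question about stability and generality.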
DISCOVERED
2026-04-09
PUBLISHED
2026-04-09
AUTHOR
ahbond