TurboQuant Python implementation skips calibration
REDDIT · 12d ago · OPEN-SOURCE RELEASE

A clean Python repo implements TurboQuant, a near-optimal 1-4 bit vector quantizer for streaming KV caches and vector search. It combines random rotation, scalar quantization, and a 1-bit residual fix so it works without offline calibration.
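The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration of the three stages (random rotation, per-vector scalar quantization, 1-bit residual correction), not the repo's actual API; all function names and the QR-based rotation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix (a stand-in for
    # the paper's rotation; the repo may use a faster structured transform).
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize(x, R, bits=2):
    # Rotate, then uniformly quantize each coordinate -- no calibration data,
    # the scale is derived from the vector itself.
    z = R @ x
    scale = np.max(np.abs(z)) / (2 ** (bits - 1) - 0.5)
    codes = np.clip(np.round(z / scale - 0.5),
                    -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    zq = (codes + 0.5) * scale
    # 1-bit residual: keep only the sign of the quantization error plus one
    # shared magnitude, which cuts the bias in reconstructed dot products.
    resid = z - zq
    return codes, scale, np.sign(resid), np.mean(np.abs(resid))

def dequantize(codes, scale, sign, mag, R):
    zq = (codes + 0.5) * scale + sign * mag
    return R.T @ zq
```

In a streaming KV-cache setting, `quantize` runs once per incoming vector with no offline pass, which is the property the summary highlights.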

// ANALYSIS

The interesting part here is not just that TurboQuant got ported to Python, but that the port turns a mathematically neat paper into something developers can actually inspect and benchmark. With Google Research now publishing an official explainer, the method looks less like a niche trick and more like an emerging compression primitive.

  • Streaming KV-cache compression is the cleanest fit because the method removes calibration from the workflow entirely.
  • The 1-bit residual correction matters more than it sounds; dot-product bias is what breaks retrieval and attention at low bits.
  • The repo is a strong reference baseline, not a drop-in production primitive, because the dense rotation path is still the bottleneck.
  • Fractional-bit channel splitting is still missing, which leaves the most deployment-friendly part of the paper for later work.
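On the dense-rotation bottleneck noted above: a standard O(d log d) replacement for a dense random rotation is a random-sign diagonal followed by a fast Walsh-Hadamard transform. The sketch below shows that construction; it is a generic technique, not necessarily what TurboQuant or the repo uses, and the function names are mine.

```python
import numpy as np

def fwht(x):
    # Iterative fast Walsh-Hadamard transform, O(d log d) for d a power of 2.
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    # Normalizing by sqrt(d) makes the transform orthogonal (norm-preserving).
    return x / np.sqrt(len(x))

def fast_rotation(x, signs):
    # Random +/-1 diagonal then Hadamard: a cheap stand-in for a dense
    # random orthogonal matrix, avoiding the O(d^2) matrix multiply.
    return fwht(signs * x)
```

Because the composite transform is orthogonal, inner products and norms are preserved exactly, so swapping it in does not change the quantizer's distortion analysis.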
// TAGS
turboquant · llm-inference · vector-db · open-source · research

DISCOVERED

12d ago (2026-03-30)

PUBLISHED

13d ago (2026-03-29)

RELEVANCE

8/10

AUTHOR

chhed_wala_kaccha