TurboQuant Model nears lossless 4-bit weights
REDDIT // 14d ago · OPEN_SOURCE RELEASE

TurboQuant Model adapts the recent TurboQuant algorithm from KV-cache quantization to weight compression, exposing a drop-in `nn.Linear` replacement for PyTorch. Its benchmarks claim 3.2x GPU memory savings vs bf16, and the 4+4 residual mode lands almost exactly on bf16 perplexity on Qwen3.5-0.8B while staying near baseline on Qwen3.5-4B.
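The drop-in replacement pattern described above can be sketched as follows. This is a minimal reference version, assuming simple per-output-channel uniform quantization; the class name `Int4Linear` and the unpacked uint8 code storage are illustrative choices, not the repo's actual API, and a production kernel would pack two 4-bit codes per byte and fuse dequantization into the matmul rather than materializing the full weight.

```python
import torch
import torch.nn as nn


class Int4Linear(nn.Module):
    """Illustrative 4-bit weight-only linear layer (not the TurboQuant Model API).

    Weights are stored as integer codes in [0, 15] plus a per-output-channel
    scale and offset; dequantization happens on the fly in forward().
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()                      # shape (out, in)
        lo = w.amin(dim=1, keepdim=True)
        hi = w.amax(dim=1, keepdim=True)
        scale = (hi - lo).clamp(min=1e-8) / 15          # 16 levels = 4 bits
        codes = torch.round((w - lo) / scale).clamp(0, 15).to(torch.uint8)
        self.register_buffer("codes", codes)
        self.register_buffer("scale", scale)
        self.register_buffer("lo", lo)
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly, then run a standard linear.
        w = self.codes.to(x.dtype) * self.scale + self.lo
        return nn.functional.linear(x, w, self.bias)
```

With a wrapper like this, swapping a module in an existing model is a one-line assignment, which is what makes the "drop-in" framing attractive.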

// ANALYSIS

This is one of the more credible "quantize everything" experiments in a while: the repo is not just shaving bits, it's showing that a residual pass can recover most of the quality loss. The caveat is that the win depends on a fairly sophisticated kernel path, so the real question is how much of the headline survives outside the authors' benchmark setup.

  • On Qwen3.5-0.8B, 4+4 residual gets 14.28 PPL vs 14.29 bf16, which is close enough to feel operationally meaningful.
  • Plain 4-bit is still a useful memory play, but it pays a real accuracy tax, so the residual stage is doing most of the heavy lifting.
  • The 4B result is interesting because 4+2 residual slightly beats bf16 on PPL while 4+4 keeps KL divergence much lower, which is a good reminder that perplexity alone doesn't tell the whole story.
  • The implementation story matters: on-the-fly dequantization plus fused CuTile/Triton kernels is what keeps this from becoming an academic demo that falls apart in production.
  • There is already some community debate about TurboQuant's theoretical lineage, so I'd treat the "near-optimal" claim as promising but still worth validating in your own stack.
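The residual mechanism the bullets above lean on can be illustrated with plain per-row uniform quantization (a deliberate simplification; TurboQuant itself uses a more sophisticated near-optimal quantizer): quantize the weight once at 4 bits, then quantize the leftover error with a second 4-bit pass and add the two reconstructions back together.

```python
import torch


def quant4(w: torch.Tensor) -> torch.Tensor:
    """Uniform 4-bit quantize + dequantize per row (illustrative only;
    not the actual TurboQuant quantizer)."""
    lo = w.amin(dim=-1, keepdim=True)
    hi = w.amax(dim=-1, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / 15      # 16 levels = 4 bits
    codes = torch.round((w - lo) / scale).clamp(0, 15)
    return codes * scale + lo


torch.manual_seed(0)
w = torch.randn(64, 64)

w4 = quant4(w)                 # plain 4-bit reconstruction
w44 = w4 + quant4(w - w4)      # "4+4": second 4-bit pass over the residual

err4 = (w - w4).abs().mean().item()
err44 = (w - w44).abs().mean().item()
```

Because the residual lives in a range roughly one quantization step wide, the second pass shrinks the reconstruction error by an order of magnitude, which is why 4+4 can sit so close to bf16 while plain 4-bit pays a visible accuracy tax.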
// TAGS
turboquant-model · llm · open-source · inference · benchmark · research

DISCOVERED

2026-03-28

PUBLISHED

2026-03-28

RELEVANCE

8 / 10

AUTHOR

cksac