Codebook packing cuts LLM RAM 25%, stays lossless
OPEN_SOURCE
REDDIT · 29d ago · OPEN SOURCE RELEASE


A solo developer built Adaptive Codebook Compression (ACC), a lossless LLM weight compression scheme that exploits the empirical observation that BF16 model weights use far fewer unique values than the theoretical 65,536 the format allows — typically ~7,000–13,000 per layer. By replacing raw weights with packed codebook indices, the tool achieves 10–25% VRAM savings with exact output fidelity, at the cost of roughly 2–3x slower inference.
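The empirical observation behind ACC, that a layer holds far fewer distinct BF16 bit patterns than the 65,536 the format allows, is easy to check yourself. A minimal sketch, assuming numpy and using a random Gaussian matrix as a stand-in for a real weight layer (not the tool's own code):

```python
import numpy as np

def bf16_bits(x: np.ndarray) -> np.ndarray:
    # BF16 is the top 16 bits of an IEEE float32, so truncating
    # float32 bit patterns reproduces BF16 storage exactly.
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

# Hypothetical layer: LLM weights are roughly Gaussian with small std dev.
rng = np.random.default_rng(0)
layer = rng.normal(0.0, 0.02, size=(2048, 2048))

codes = bf16_bits(layer).ravel()
n_unique = np.unique(codes).size
bits = int(np.ceil(np.log2(n_unique)))
print(f"{n_unique} unique BF16 patterns -> {bits}-bit indices (vs 16 raw)")
```

The index-only saving is (16 − bits)/16 of the raw weight bytes, before codebook overhead, which is how ~13-bit layers land in the reported 10–25% range.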

// ANALYSIS

This is the rare quantization project with a genuinely novel angle: lossless by default, with benchmarks to prove it — cosine similarity >0.999 and exact greedy token match on tested models.

  • The core trick is that BF16 model weights are surprisingly non-diverse: layers in Qwen3-1.7B use only ~13 bits worth of unique values, so packing indices with no wasted bits via LCM-group bitpacking yields real savings
  • VRAM reduction is modest (~18% lossless on tested models) compared to 4-bit GGUF, but the target audience is different: users who cannot tolerate any quality degradation
  • The CPU-offload path is compelling — models that don't fit in VRAM can run entirely from system RAM via a C/OpenMP kernel, at ~0.5 tok/s
  • Speed penalty (~2.3x on GPU) is steep and limits production viability today; llama.cpp's quantization-aware kernels are far more optimized
  • Still a proof-of-concept with slow offline compression (~60 min for 1.7B on CPU), but the intellectual foundation is sound and the lossless claim is verifiable
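The compression step described in the first bullet can be sketched end to end: build a per-layer codebook of unique bit patterns, store bit-packed indices, and verify the round trip is bit-exact. This is an illustrative reimplementation under assumptions, not ACC's actual code; pack/unpack here use Python big-integer shifts rather than an LCM-grouped C kernel, but the zero-waste index layout is the same idea:

```python
import math
import numpy as np

def pack(indices, bits: int) -> bytes:
    # Concatenate `bits`-wide indices back to back; no per-index padding.
    big = 0
    for i, idx in enumerate(indices):
        big |= int(idx) << (i * bits)
    return big.to_bytes(math.ceil(len(indices) * bits / 8), "little")

def unpack(blob: bytes, bits: int, count: int) -> np.ndarray:
    big = int.from_bytes(blob, "little")
    mask = (1 << bits) - 1
    return np.array([(big >> (i * bits)) & mask for i in range(count)],
                    dtype=np.uint32)

# Hypothetical layer: 4096 BF16 bit patterns drawn from a small value pool.
rng = np.random.default_rng(1)
pool = rng.choice(2**16, size=7000, replace=False).astype(np.uint16)
weights = rng.choice(pool, size=4096)

codebook, indices = np.unique(weights, return_inverse=True)
bits = max(1, math.ceil(math.log2(codebook.size)))
blob = pack(indices, bits)

restored = codebook[unpack(blob, bits, indices.size)]
assert np.array_equal(restored, weights)  # bit-exact, i.e. lossless
print(f"{codebook.size} codes, {bits}-bit indices, "
      f"{len(blob) / (weights.size * 2):.0%} of raw BF16 size")
```

On a real multi-million-parameter layer the codebook itself (at most 65,536 two-byte entries) amortizes to negligible overhead; in this toy example it does not, which is why ACC operates on whole layers.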
// TAGS
adaptive-codebook-compression · llm · inference · edge-ai · open-source · gpu

DISCOVERED

2026-03-14 (29d ago)

PUBLISHED

2026-03-14 (29d ago)

RELEVANCE

7/10

AUTHOR

bigattichouse