Codebook packing cuts LLM RAM 25%, stays lossless

// 74d ago · OPENSOURCE RELEASE

A solo developer built Adaptive Codebook Compression (ACC), a lossless LLM weight compression scheme. It exploits the empirical observation that BF16 model weights use far fewer unique values than the 65,536 a 16-bit format allows, typically ~7,000–13,000 per layer. By replacing raw weights with packed codebook indices, the tool achieves 10–25% VRAM savings with exact output fidelity, at the cost of roughly 2–3x slower inference.
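
The codebook idea in the summary can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not ACC's actual code: the layer is synthetic (raw 16-bit patterns drawn from a pool of 8,000 values rather than real BF16 weights), the names `build_codebook` and `reconstruct` are hypothetical, and the savings figure ignores the small cost of storing the codebook itself.

```python
import numpy as np

def build_codebook(raw_bits: np.ndarray):
    """Replace each 16-bit weight pattern with an index into a table of unique patterns."""
    codebook, inverse = np.unique(raw_bits.ravel(), return_inverse=True)
    indices = inverse.reshape(raw_bits.shape)
    # Smallest bit width that can address every codebook entry.
    bits_per_index = max(1, (len(codebook) - 1).bit_length())
    return codebook, indices, bits_per_index

def reconstruct(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Look every index up again; the result is bit-identical to the input."""
    return codebook[indices]

# Synthetic stand-in for one layer: 65,536 weights that reuse a pool of 8,000 patterns.
rng = np.random.default_rng(0)
pool = rng.choice(65536, size=8000, replace=False).astype(np.uint16)
layer = rng.choice(pool, size=(256, 256))

codebook, indices, bits_per_index = build_codebook(layer)
restored = reconstruct(codebook, indices)
savings = 1 - bits_per_index / 16  # index width relative to the original 16 bits

print(f"unique values: {len(codebook)}, bits per index: {bits_per_index}")
print(f"lossless: {np.array_equal(restored, layer)}, index-only savings: {savings:.2%}")
```

Under these assumptions, 12-, 13-, and 14-bit indices correspond to savings of 25%, 18.75%, and 12.5% before codebook overhead, which lines up with the 10–25% range quoted above.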

// ANALYSIS

This is the rare weight-compression project with a genuinely novel angle: it is lossless by default, and the benchmarks back that up, with cosine similarity >0.999 and exact greedy token match on tested models.

  • The core trick is that BF16 model weights are surprisingly non-diverse: layers in Qwen3-1.7B use only ~13 bits' worth of unique values (at most 2^13 = 8,192), so packing 13-bit indices with no wasted bits via LCM-group bitpacking yields real savings
  • VRAM reduction is modest (~18% lossless on tested models) compared to 4-bit GGUF, but the target audience is different: users who cannot tolerate any quality degradation
  • The CPU-offload path is compelling — models that don't fit in VRAM can run entirely from system RAM via a C/OpenMP kernel, at ~0.5 tok/s
  • Speed penalty (~2.3x on GPU) is steep and limits production viability today; llama.cpp's quantization-aware kernels are far more optimized
  • Still a proof-of-concept with slow offline compression (~60 min for 1.7B on CPU), but the intellectual foundation is sound and the lossless claim is verifiable
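
The bit-packing step mentioned in the first bullet can be illustrated as follows. The source does not define "LCM-group bitpacking", so this sketch assumes one plausible reading: k-bit indices are packed in groups of lcm(k, 8) bits so that every group ends on a byte boundary (for k = 13, eight indices fill exactly 13 bytes). The functions `group_shape`, `pack`, and `unpack` are hypothetical names, and this NumPy version favors clarity over the speed a C/OpenMP or GPU kernel would need.

```python
import math
import numpy as np

def group_shape(bits: int):
    """Return (indices per group, bytes per group) for a given index width."""
    group_bits = math.lcm(bits, 8)
    return group_bits // bits, group_bits // 8

def pack(indices: np.ndarray, bits: int) -> np.ndarray:
    """Pack `bits`-wide indices back to back, padded to a whole number of groups."""
    per_group, _ = group_shape(bits)
    flat = indices.ravel().astype(np.uint32)
    pad = (-len(flat)) % per_group
    flat = np.concatenate([flat, np.zeros(pad, dtype=np.uint32)])
    # Expand each index into its binary digits, most significant bit first.
    shifts = np.arange(bits - 1, -1, -1, dtype=np.uint32)
    bit_matrix = ((flat[:, None] >> shifts) & 1).astype(np.uint8)
    return np.packbits(bit_matrix.ravel())

def unpack(packed: np.ndarray, bits: int, count: int) -> np.ndarray:
    """Recover the first `count` indices from a packed byte array."""
    bit_stream = np.unpackbits(packed)[: count * bits].reshape(count, bits)
    place_values = (1 << np.arange(bits - 1, -1, -1)).astype(np.uint32)
    return (bit_stream.astype(np.uint32) * place_values).sum(axis=1).astype(np.uint32)

# 1,000 random 13-bit indices: 2,000 bytes as uint16, fewer once packed.
rng = np.random.default_rng(0)
idx = rng.integers(0, 8192, size=1000)
packed = pack(idx, 13)
roundtrip = unpack(packed, 13, len(idx))

print(group_shape(13), packed.nbytes, np.array_equal(roundtrip, idx))
```

Grouping by the least common multiple means a decoder can locate any group with simple integer arithmetic, at the cost of a few padding indices at the end of each tensor.
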
// TAGS
adaptive-codebook-compression · llm · inference · edge-ai · open-source · gpu

DISCOVERED: 2026-03-14 (74d ago)

PUBLISHED: 2026-03-14 (74d ago)

RELEVANCE: 7/10

AUTHOR: bigattichouse