llama.cpp adds TurboQuant lite KV cache

// 70d ago · PRODUCT UPDATE

llama.cpp has integrated "attn-rot," a simplified TurboQuant implementation that enables high-quality 4-bit KV cache quantization. It uses Hadamard transforms to spread outlier values across dimensions before quantizing, so much longer context windows fit on consumer hardware with minimal loss in reasoning quality.

// ANALYSIS

Hadamard rotation targets the main weakness of aggressive KV cache quantization for local LLMs: the breakdown in model logic at 4 bits. This merge roughly doubles context capacity on most consumer GPUs while keeping output quality close to full precision.
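The capacity claim comes down to simple arithmetic on bits per cached value. The sketch below is illustrative only: it assumes the cache uses llama.cpp's `q8_0` and `q4_0` block layouts (34 and 18 bytes per 32 values, scales included), and the model shape (32 layers, 8 KV heads, head dimension 128) and the 4 GiB budget are hypothetical numbers chosen for the example.

```python
# Bits stored per cached value, including per-block scales.
# Assumption: q8_0 = 34 bytes per 32 values, q4_0 = 18 bytes per 32 values.
BITS_PER_VALUE = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits
    "q4_0": 18 * 8 / 32,  # 4.5 bits
}


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, cache_type):
    """Bytes of KV cache needed for one token (keys plus values)."""
    values = 2 * n_layers * n_kv_heads * head_dim
    return values * BITS_PER_VALUE[cache_type] / 8


def max_context(budget_bytes, n_layers, n_kv_heads, head_dim, cache_type):
    """Largest token count whose KV cache fits in budget_bytes."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, cache_type)
    return int(budget_bytes // per_token)


if __name__ == "__main__":
    budget = 4 * 2**30  # hypothetical 4 GiB reserved for the KV cache
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)  # hypothetical model
    for cache_type in BITS_PER_VALUE:
        tokens = max_context(budget, cache_type=cache_type, **shape)
        print(f"{cache_type:>5}: {tokens} tokens")
```

Under these assumptions, moving from an 8-bit cache to a 4-bit one gives a little under 2x the context, and moving from 16-bit gives roughly 3.5x. The rotation does not change these sizes; it only affects how much quality survives at 4 bits.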

–Hadamard transforms spread the "energy" of outlier values across all dimensions of a vector, making it easier to quantize accurately.
–4-bit KV caches previously caused models to "break down" in logic; this update brings them near full-precision performance.
–The update adds a minor 2-12% performance hit, but fitting 2-4x more context into the same memory makes that a worthwhile trade-off for most developers.
–The implementation is backend-agnostic, providing immediate gains for CUDA, Metal, and CPU inference.
–It focuses on the "rotation" aspect of the TurboQuant paper to maintain speed while gaining precision.
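To illustrate why rotation helps, here is a minimal sketch in plain Python. It is not llama.cpp's code: the function names are made up for this example, the input is synthetic, and the quantizer uses a single absmax scale per vector rather than llama.cpp's block formats. It compares the error of quantizing a vector with one large outlier directly against quantizing it after an orthonormal Hadamard transform.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (it is its own inverse)."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def fake_quant_4bit(vec):
    """Round to 15 symmetric levels (-7..7) with one absmax scale, then dequantize.

    Assumption: one scale per vector, simpler than llama.cpp's block formats.
    """
    peak = max(abs(x) for x in vec)
    if peak == 0.0:
        return list(vec)
    scale = peak / 7.0
    return [round(x / scale) * scale for x in vec]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def compare(seed=0, dim=128, outlier=50.0):
    """Return (direct_error, rotated_error) for a synthetic vector with one outlier."""
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    vec[3] = outlier  # hypothetical outlier channel
    direct = fake_quant_4bit(vec)
    # Rotate, quantize, then rotate back using the self-inverse transform.
    rotated = fwht(fake_quant_4bit(fwht(vec)))
    return mse(vec, direct), mse(vec, rotated)


if __name__ == "__main__":
    err_direct, err_rotated = compare()
    print(f"direct 4-bit MSE:  {err_direct:.4f}")
    print(f"rotated 4-bit MSE: {err_rotated:.4f}")
```

Without rotation, the single outlier sets the quantization scale and most of the small values round to zero. After rotation, the outlier's magnitude is shared across all 128 coordinates, so the 15 available levels cover the data far more evenly. Because the transform is orthogonal, applying the same rotation to queries and keys leaves their dot products unchanged, which is what makes this kind of rotation compatible with attention.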

// TAGS

llama-cpp, llm, inference, open-source, benchmark

DISCOVERED

70d ago

2026-04-01

PUBLISHED

70d ago

2026-03-31

RELEVANCE

9/10

AUTHOR

Dany0
