OPEN_SOURCE
REDDIT // 3h ago // RESEARCH PAPER
TurboQuant-H squishes Gemma 4 embeddings to 2-bit
Cactus Compute has introduced TurboQuant-H, a 2-bit quantization technique for embedding layers, optimized specifically for Gemma 4's "AltUp" architecture. By combining Hadamard rotations with Lloyd-Max codebooks, it shrinks a 5B-parameter model from 4.8GB to 2.9GB with negligible perplexity loss, enabling capable on-device AI on mobile hardware with only 4GB of RAM.
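The two ingredients named above compose naturally: rotate each embedding row with an orthonormal Hadamard matrix to spread outliers across dimensions, then fit a 4-level (2-bit) Lloyd-Max scalar codebook to the rotated values. Below is a minimal NumPy sketch of that pipeline under standard textbook formulations; all function names are illustrative, not Cactus Compute's actual API, and details like per-row vs. global codebooks are assumptions.

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction orthonormal Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max_codebook(samples, bits=2, iters=50):
    """Fit a 2**bits-level scalar codebook with Lloyd's algorithm."""
    levels = 2 ** bits
    # Initialize codewords at evenly spaced quantiles of the data.
    cb = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Nearest-codeword assignment, then centroid update per cell.
        idx = np.abs(samples[:, None] - cb[None, :]).argmin(axis=1)
        for k in range(levels):
            sel = idx == k
            if sel.any():  # leave empty cells where they are
                cb[k] = samples[sel].mean()
    return cb

def quantize_embeddings(E, bits=2):
    """Rotate rows, then map each value to a 2-bit codebook index."""
    H = hadamard(E.shape[1])
    R = E @ H                      # Hadamard rotation, applied per row
    cb = lloyd_max_codebook(R.ravel(), bits)
    idx = np.abs(R.ravel()[:, None] - cb[None, :]).argmin(axis=1)
    return idx.reshape(E.shape).astype(np.uint8), cb, H

def dequantize_embeddings(idx, cb, H):
    """Look up codewords, then undo the rotation (orthonormal: inverse = transpose)."""
    return cb[idx] @ H.T
```

After the rotation the per-coordinate distribution is close to Gaussian, which is exactly the regime where a Lloyd-Max scalar quantizer is near-optimal; that is what the rotation is buying.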
// ANALYSIS
2-bit embeddings are a major win for on-device LLMs where bloated embedding tables often bottleneck deployment on consumer hardware.
- Deterministic Hadamard rotations simplify the quantization pipeline compared to random orthogonal methods.
- Achieves a 40% reduction in total model weight for Gemma 4 E2B with only a 0.06 increase in perplexity.
- Enables large, reasoning-capable models to fit within the memory constraints of standard mobile and wearable devices.
- No measured inference speed regression: butterfly factorization keeps the rotation cheap, and the bandwidth saved by 2-bit weights offsets its cost.
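The speed claim in the last bullet rests on the fact that a Hadamard rotation need not be a dense matrix multiply: a butterfly factorization applies it in O(n log n) additions and subtractions, the fast Walsh-Hadamard transform. A minimal sketch of that butterfly structure (an assumed textbook form, not the TurboQuant-H kernel):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform over the last axis via log2(n) butterfly stages."""
    x = np.atleast_2d(np.asarray(x, dtype=np.float64)).copy()
    n = x.shape[1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Each stage pairs blocks of size h and replaces them with sum/difference.
        for i in range(0, n, 2 * h):
            a = x[:, i:i + h].copy()
            b = x[:, i + h:i + 2 * h].copy()
            x[:, i:i + h] = a + b
            x[:, i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling
```

Each of the log2(n) stages touches every element once, so the rotation costs n·log2(n) adds per row instead of the n² multiplies of an explicit matrix product.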
// TAGS
llm · embedding · edge-ai · open-source · turboquant-h
DISCOVERED
3h ago
2026-04-22
PUBLISHED
4h ago
2026-04-22
RELEVANCE
8/10
AUTHOR
Henrie_the_dreamer