TurboQuant nabs 34 tok/s for 30B model on Mac
Google Research's TurboQuant algorithm enables 3-bit weight compression and fast inference on Apple Silicon via custom Metal kernels. It delivers a 42x speedup over fallbacks while maintaining significantly higher accuracy than standard 3-bit quantization.
TurboQuant is a meaningful unlock for running large models on consumer hardware, easing the memory bottleneck in long-context sessions. Achieving 34 tok/s on a 30B model on a 48GB Mac puts flagship-level coding capabilities within reach of local developers. The scalar HIGGS-style algorithm's 3-bit compression eliminates the need for tedious calibration datasets, and the performance gains over MLX's native quantization show that theoretical rigor in kernel design pays real dividends. While it excels at single-user decode, the current implementation's "dequant-per-forward tax" on prefill remains a target for future optimization.
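To make the calibration-free idea concrete, here is a minimal sketch of rotation-based 3-bit scalar quantization. This is a hypothetical illustration, not TurboQuant's or HIGGS's actual implementation: it uses a random orthogonal rotation (a stand-in for the fast Hadamard transforms such methods use) to make weight groups approximately Gaussian, then quantizes each group to 8 levels with a per-group scale, with no calibration data involved.

```python
import numpy as np

def quantize_3bit(w, group_size=32, seed=0):
    """Calibration-free 3-bit scalar quantization (illustrative sketch).

    Rotate weights so they look Gaussian, then round each group to
    8 uniform levels (3 bits) using only a per-group scale.
    """
    rng = np.random.default_rng(seed)
    # Random orthogonal rotation via QR decomposition; real systems use a
    # fast Hadamard transform for O(n log n) cost instead.
    q, _ = np.linalg.qr(rng.standard_normal((group_size, group_size)))
    groups = w.reshape(-1, group_size) @ q.T
    # Per-group scale chosen so the max value maps near the top code.
    scale = np.abs(groups).max(axis=1, keepdims=True) / 3.5
    codes = np.clip(np.round(groups / scale), -4, 3).astype(np.int8)  # 8 levels
    return codes, scale, q

def dequantize_3bit(codes, scale, q):
    # Undoing the scaling and rotation on every forward pass is the
    # "dequant-per-forward tax" the article mentions for prefill.
    return (codes.astype(np.float32) * scale) @ q

w = np.random.default_rng(1).standard_normal(4 * 32).astype(np.float32)
codes, scale, q = quantize_3bit(w)
w_hat = dequantize_3bit(codes, scale, q).reshape(-1)
mean_abs_err = np.abs(w - w_hat).mean()
```

Because the rotation is orthogonal, dequantization error is bounded by the rounding step alone, which is why this family of methods holds up at 3 bits without per-model calibration.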
DISCOVERED
3h ago
2026-04-19
PUBLISHED
6h ago
2026-04-18
AUTHOR
Varjoranta