TurboQuant TQ3_4S hits 2x speedup

// 55d ago · MODEL RELEASE

TurboQuant’s TQ3_4S is a new 3.5-bit quantization format for LLM inference that claims up to 2x speed improvement while maintaining high quality via a Walsh-Hadamard transform approach.
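To make the general idea concrete, here is a minimal sketch of Walsh-Hadamard rotation followed by group-wise low-bit quantization. It is not the TQ3_4S implementation: the actual bit layout, rounding scheme, and scale encoding are not described here, so the function names, the symmetric 3-bit levels, and the use of floating-point scales are all assumptions made for illustration. Only the block size of 32 and the four per-8 scales come from the description of the format.

```python
import math

def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in out]

def quantize_block(block, group=8, bits=3):
    """Rotate a block with the WHT, then quantize each group with its own scale.

    Hypothetical scheme: symmetric signed levels and float scales (assumptions).
    """
    rotated = fwht(block)
    qmax = 2 ** (bits - 1) - 1  # e.g. levels -3..3 for 3 bits
    codes, scales = [], []
    for g in range(0, len(rotated), group):
        chunk = rotated[g:g + group]
        scale = max(abs(v) for v in chunk) / qmax or 1.0  # avoid a zero scale
        scales.append(scale)
        codes.extend(max(-qmax, min(qmax, round(v / scale))) for v in chunk)
    return codes, scales

def dequantize_block(codes, scales, group=8):
    """Undo the scaling, then undo the rotation."""
    rotated = [c * scales[i // group] for i, c in enumerate(codes)]
    return fwht(rotated)  # the orthonormal WHT is its own inverse

if __name__ == "__main__":
    block = [math.sin(i) for i in range(32)]  # stand-in for 32 weights
    codes, scales = quantize_block(block)
    restored = dequantize_block(codes, scales)
    print(len(codes), "codes,", len(scales), "scales")
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
```

The rotation spreads outlier weights across the whole block before rounding, which is the usual motivation for pairing a Hadamard transform with coarse quantization. Because the transform is orthonormal, the squared reconstruction error equals the squared rounding error in the rotated domain.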

// ANALYSIS

TurboQuant’s new weight format is a significant step for local LLM inference, offering a better speed-to-quality ratio than standard GGUF quants like Q3_K_S.

  • Claims up to 2x faster inference on consumer GPUs compared to previous quantization methods.
  • Applies a Walsh-Hadamard transform before quantizing to about 3.5 bits per weight, with four scales per 32-weight block (one per group of 8 weights) for finer granularity.
  • Outperforms the standard Q3_K_S quant in perplexity (6.8224 vs 6.8630, lower is better) on Qwen 3.5-27B.
  • Currently requires a specific llama.cpp fork, highlighting the ongoing fragmentation in local inference optimization.
  • The 12.9 GiB size for a 27B model makes high-parameter models significantly more accessible on mid-range hardware.
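The size and bit-width figures above can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 27e9 parameters (the exact count may differ), and the 4-bit scale width in `block_bits_per_weight` is purely an assumption chosen because it makes a 32-weight block with 3-bit codes and four scales come out at 3.5 bits per weight; the real encoding may be different.

```python
GIB = 2 ** 30  # bytes per GiB

def effective_bits_per_weight(size_gib, n_params):
    """Average bits stored per parameter for a file of the given size."""
    return size_gib * GIB * 8 / n_params

def size_gib_at(bits_per_weight, n_params):
    """File size in GiB if every parameter used exactly this many bits."""
    return n_params * bits_per_weight / 8 / GIB

def block_bits_per_weight(weights=32, code_bits=3, n_scales=4, scale_bits=4):
    """Bits per weight for one block; scale_bits=4 is an assumed value."""
    return (weights * code_bits + n_scales * scale_bits) / weights

if __name__ == "__main__":
    n_params = 27e9  # nominal parameter count (assumption)
    print("effective bits/weight:", round(effective_bits_per_weight(12.9, n_params), 2))
    print("GiB at 3.5 bits/weight:", round(size_gib_at(3.5, n_params), 2))
    print("block bits/weight:", block_bits_per_weight())
```

Under these assumptions the 12.9 GiB file works out to roughly 4.1 bits per weight on average, while a uniform 3.5 bits per weight would give about 11.0 GiB. A gap like that would be consistent with some tensors being stored at higher precision, though no per-tensor breakdown is given here.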
// TAGS
llm, inference, open-weights, turboquant

DISCOVERED

55d ago

2026-04-02

PUBLISHED

55d ago

2026-04-01

RELEVANCE

8/10

AUTHOR

Imaginary-Anywhere23