TurboQuant TQ3_4S hits 2x speedup
REDDIT // MODEL RELEASE


TurboQuant’s TQ3_4S is a new 3.5-bit quantization format for LLM inference that claims up to a 2x speed improvement while preserving quality via a Walsh-Hadamard transform.
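To make the Walsh-Hadamard idea concrete, here is a minimal toy sketch of Hadamard-rotated block quantization. The function names (`fwht`, `quantize_block`, `dequantize_block`), the 32-weight block with four per-8-weight scales, and the symmetric 4-bit stand-in grid are all illustrative assumptions; the post does not document TQ3_4S's actual 3.5-bit packing.

```python
def fwht(v):
    """In-place fast Walsh-Hadamard transform; len(v) must be a power of two."""
    h, n = 1, len(v)
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v

def quantize_block(weights):
    """Quantize 32 weights: rotate, then one scale per 8-weight sub-block.

    Uses a symmetric +/-7-level grid (4-bit) as a stand-in for the
    undocumented 3.5-bit code (assumption).
    """
    t = fwht(list(weights))          # rotate into the Hadamard domain
    codes, scales = [], []
    for s in range(0, 32, 8):
        sub = t[s:s + 8]
        scale = max(abs(x) for x in sub) / 7 or 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(x / scale))) for x in sub)
    return codes, scales

def dequantize_block(codes, scales):
    """Rescale codes, then apply the FWHT again (self-inverse up to 1/n)."""
    t = [codes[i] * scales[i // 8] for i in range(32)]
    return [x / 32 for x in fwht(t)]
```

The rotation spreads outlier weights across the whole block before quantization, which is the usual reason Hadamard-based schemes hold up at very low bit widths.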

// ANALYSIS

TurboQuant’s new weight format is a significant step for local LLM inference, offering a better speed-to-quality ratio than standard GGUF quants like Q3_K_S.

  • Claims up to 2x faster inference on consumer GPUs compared with previous quantization methods.
  • Stores weights at 3.5 bits after a Walsh-Hadamard transform, with four per-8-weight scales in each 32-weight block for finer granularity.
  • Achieves lower (better) perplexity than standard Q3_K_S quants (6.8224 vs. 6.8630) on Qwen 3.5-27B.
  • Currently requires a specific llama.cpp fork, highlighting the ongoing fragmentation in local inference optimization.
  • The 12.9 GiB size for a 27B model makes high-parameter models significantly more accessible on mid-range hardware.
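A back-of-envelope check makes the 12.9 GiB figure plausible; the split between packed weights and overhead below is my assumption, not from the release.

```python
# Rough size check for a 27B-parameter model at 3.5 bits per weight.
params = 27e9
weight_gib = params * 3.5 / 8 / 2**30   # packed weight payload only
print(round(weight_gib, 1))             # → 11.0
# The remaining ~1.9 GiB of the reported 12.9 GiB plausibly covers the
# per-sub-block scales plus tensors kept at higher precision, such as
# embeddings and norms (assumption).
```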
// TAGS
llm-inference · open-weights · turboquant

DISCOVERED

2026-04-02

PUBLISHED

2026-04-01

RELEVANCE

8/10

AUTHOR

Imaginary-Anywhere23