OPEN_SOURCE
REDDIT · 10d ago · MODEL RELEASE
TurboQuant TQ3_4S hits 2x speedup
TurboQuant’s TQ3_4S is a new 3.5-bit quantization format for LLM inference that claims up to 2x speed improvement while maintaining high quality via a Walsh-Hadamard transform approach.
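The Walsh-Hadamard transform referenced here is a standard trick for quantization: rotating each weight block with an orthogonal Hadamard matrix spreads outliers evenly across the block before the weights are snapped to a low-bit grid. A minimal sketch of the transform (the fast in-place butterfly form, not TurboQuant's actual kernel):

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform.

    Input length must be a power of two. With the 1/sqrt(n) scaling the
    transform is its own inverse: fwht(fwht(x)) == x. Illustrative sketch
    only -- the real TQ3_4S kernel and its block size are assumptions here.
    """
    x = np.asarray(x, dtype=np.float64).copy()
    n = len(x)
    h = 1
    while h < n:
        # Butterfly pass: combine pairs (j, j+h) into sums and differences.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)
```

Because the transform is orthogonal, quantization error introduced in the rotated domain maps back to evenly spread error in the original weights rather than concentrating on outlier channels.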
// ANALYSIS
TurboQuant’s new weight format is a significant step for local LLM inference, offering a better speed-to-quality ratio than standard GGUF quants like Q3_K_S.
- Achieves roughly 2x faster inference on consumer GPUs compared to previous quantization methods.
- Uses a 3.5-bit Walsh-Hadamard-transform scheme with four per-8-weight scales in each 32-weight block for finer granularity.
- Outperforms the standard Q3_K_S quant on perplexity (6.8224 vs. 6.8630) on Qwen 3.5-27B.
- Currently requires a specific llama.cpp fork, highlighting the ongoing fragmentation in local inference optimization.
- At 12.9 GiB for a 27B model, it makes high-parameter models significantly more accessible on mid-range hardware.
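The "four per-8-weight scales per 32-weight block" layout can be illustrated with a toy quantizer: split each 32-weight block into four sub-blocks of 8, give each its own scale, and round to a small signed grid. (A sketch under assumptions: the actual TQ3_4S bit packing, scale encoding, and rounding scheme are not public in this summary; signed 3-bit levels in [-3, 3] are used here purely for illustration.)

```python
import numpy as np

def quantize_block(w):
    """Quantize one 32-weight block with four per-8-weight scales.

    Hypothetical layout mirroring the release's description: 4 sub-blocks
    of 8 weights, each with its own scale, values rounded to [-3, 3].
    """
    w = np.asarray(w, dtype=np.float64).reshape(4, 8)
    scales = np.abs(w).max(axis=1, keepdims=True) / 3.0   # one scale per 8 weights
    safe = np.where(scales == 0.0, 1.0, scales)           # avoid divide-by-zero
    q = np.clip(np.round(w / safe), -3, 3).astype(np.int8)
    return q, scales

def dequantize_block(q, scales):
    """Reconstruct the 32 weights from 3-bit codes and per-sub-block scales."""
    return (q.astype(np.float64) * scales).reshape(-1)
```

Per-sub-block scales bound the rounding error by half a quantization step of the *local* scale, which is the granularity win over a single scale for the whole 32-weight block.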
// TAGS
llm-inference · open-weights · turboquant
DISCOVERED
2026-04-02
PUBLISHED
2026-04-01
RELEVANCE
8/10
AUTHOR
Imaginary-Anywhere23