TurboQuant TQ3_4S hits 2x speedup
REDDIT // MODEL RELEASE


TurboQuant’s TQ3_4S is a new 3.5-bit quantization format for LLM inference that claims up to a 2x speed improvement while preserving quality via a Walsh-Hadamard transform.
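To make the Walsh-Hadamard idea concrete, here is a minimal toy sketch of Hadamard-rotated block quantization. The function names (`fwht`, `quantize_block`, `dequantize_block`), the 32-weight block with four per-8-weight scales, and the symmetric 4-bit stand-in grid are all illustrative assumptions; the post does not document TQ3_4S's actual 3.5-bit packing.

```python
def fwht(v):
    """In-place fast Walsh-Hadamard transform; len(v) must be a power of two."""
    h, n = 1, len(v)
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v

def quantize_block(weights):
    """Quantize 32 weights: rotate, then one scale per 8-weight sub-block.

    Uses a symmetric +/-7-level grid (4-bit) as a stand-in for the
    undocumented 3.5-bit code (assumption).
    """
    t = fwht(list(weights))          # rotate into the Hadamard domain
    codes, scales = [], []
    for s in range(0, 32, 8):
        sub = t[s:s + 8]
        scale = max(abs(x) for x in sub) / 7 or 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(x / scale))) for x in sub)
    return codes, scales

def dequantize_block(codes, scales):
    """Rescale codes, then apply the FWHT again (self-inverse up to 1/n)."""
    t = [codes[i] * scales[i // 8] for i in range(32)]
    return [x / 32 for x in fwht(t)]
```

The rotation spreads outlier weights across the whole block before quantization, which is the usual reason Hadamard-based schemes hold up at very low bit widths.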

// ANALYSIS

TurboQuant’s new weight format is a significant step for local LLM inference, offering a better speed-to-quality ratio than standard GGUF quants like Q3_K_S.

  • Claims up to 2x faster inference on consumer GPUs compared with previous quantization methods.
  • Stores weights at 3.5 bits after a Walsh-Hadamard transform, with four per-8-weight scales in each 32-weight block for finer granularity.
  • Achieves lower (better) perplexity than standard Q3_K_S quants (6.8224 vs. 6.8630) on Qwen 3.5-27B.
  • Currently requires a specific llama.cpp fork, highlighting the ongoing fragmentation in local inference optimization.
  • The 12.9 GiB size for a 27B model makes high-parameter models significantly more accessible on mid-range hardware.
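A back-of-envelope check makes the 12.9 GiB figure plausible; the split between packed weights and overhead below is my assumption, not from the release.

```python
# Rough size check for a 27B-parameter model at 3.5 bits per weight.
params = 27e9
weight_gib = params * 3.5 / 8 / 2**30   # packed weight payload only
print(round(weight_gib, 1))             # → 11.0
# The remaining ~1.9 GiB of the reported 12.9 GiB plausibly covers the
# per-sub-block scales plus tensors kept at higher precision, such as
# embeddings and norms (assumption).
```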
// TAGS
llm-inference · open-weights · turboquant

DISCOVERED

2026-04-02

PUBLISHED

2026-04-01

RELEVANCE

8/10

AUTHOR

Imaginary-Anywhere23