Claude Code Local tests TurboQuant on M5 Max
A Reddit thread points to Claude Code Local, an Apple Silicon setup that runs Claude Code locally against a Qwen 3.5 122B model using TurboQuant. The repo says an M5 Max 128GB build reaches 41 tok/s through llama.cpp + TurboQuant and 65 tok/s after switching to a native MLX server.
It is an interesting proof of concept, but the speedup looks more like a native-stack win than a TurboQuant miracle.
- The repo's own numbers show where the gain comes from: 41 tok/s with llama.cpp + TurboQuant versus 65 tok/s on the MLX-native path, roughly a 1.6x difference.
- TurboQuant is about KV cache compression, so its payoff shows up most in long-context sessions and agent loops rather than in smaller model weights.
- The M5 Max 128GB test is encouraging, but it is still premium-hardware territory rather than a generic desktop recipe.
- Apple Silicon's unified memory and MLX/Metal stack make this a more plausible fit on Macs than on Windows, where the surrounding tooling is less native.
- For local coding agents, the real win here is privacy and cost control: you can keep Claude Code-style workflows on-device without cloud APIs.
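
To make the KV cache point concrete, here is a rough back-of-the-envelope sketch of how cache size grows with context length. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real Qwen 3.5 122B configuration, and the 4-bit figure is an assumed compression level used only for illustration, not TurboQuant's documented bit width. The estimate also ignores any per-block scale or metadata overhead a real quantizer would add.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bits_per_value):
    """Approximate KV cache size: one key and one value vector per layer, per token."""
    values_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = keys + values
    return values_per_token * context_tokens * bits_per_value / 8


# Hypothetical model shape for illustration only (NOT the actual Qwen 3.5 122B config).
HYPOTHETICAL = dict(n_layers=64, n_kv_heads=8, head_dim=128)

for ctx in (8_192, 131_072):
    fp16 = kv_cache_bytes(**HYPOTHETICAL, context_tokens=ctx, bits_per_value=16)
    # 4 bits per value is an assumed compression level, not a TurboQuant spec.
    q4 = kv_cache_bytes(**HYPOTHETICAL, context_tokens=ctx, bits_per_value=4)
    print(f"{ctx:>7} tokens: fp16 {fp16 / 2**30:.2f} GiB, 4-bit {q4 / 2**30:.2f} GiB")
```

Under these assumed numbers, the cache is a small fraction of memory at short contexts and only becomes a large consumer at long ones, which is why cache compression matters more for long agent sessions than for short prompts. The model weights are unaffected either way.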
DISCOVERED: 2026-03-28
PUBLISHED: 2026-03-28
AUTHOR: Mami_KLK_Tu_Quiere