Qwen 3.6 27B hits 50 TPS with llama.cpp MTP
A developer shares a real-world debugging success story using Qwen 3.6 27B on dual RX 9070 XTs. The setup uses llama.cpp's newly merged Multi-Token Prediction (MTP) support to reach around 50 tokens per second while running as an autonomous agent. Working entirely on local hardware, so all data stayed private, it pinpointed complex networking issues across distributed services.
The pairing of Qwen 3.6 with llama.cpp's native MTP support marks a significant leap for high-performance local development environments.
- MTP support (merged May 16) provides 1.5x to 2x speedups by using the model's own prediction heads, avoiding the VRAM and latency overhead of separate draft models.
- Qwen 3.6 27B demonstrates exceptional intelligence for its size, rivaling the coding capabilities of massive data-center models on benchmarks like SWE-bench.
- High acceptance rates for MTP draft tokens (often above 80%) enable a consistent 45+ TPS, making local iteration speeds competitive with low-latency hosted APIs.
- Native "thinking preservation" and a 262k context window allow for deep, multi-file analysis that survives complex, multi-step debugging sessions.
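The speedup and acceptance-rate figures above are linked by the standard speculative-decoding estimate: if each drafted token is accepted with probability α and k tokens are drafted per step, the expected number of tokens emitted per full-model forward pass is (1 − α^(k+1)) / (1 − α). The sketch below is only arithmetic for reasoning about such numbers, not llama.cpp's implementation. It assumes acceptances are independent (a simplification, since real acceptance varies with context), that verifying the drafted positions costs about one ordinary decode step, and it uses a hypothetical `draft_cost` ratio for the prediction heads. The post does not say how many tokens llama.cpp's MTP path drafts per step, so k is a free parameter here.

```python
import random


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per full-model forward pass.

    Assumption: each of the k drafted tokens is accepted independently with
    probability alpha, and verification stops at the first rejection. The
    full model always contributes one token itself (the correction after a
    rejection, or a bonus token when all k drafts are accepted).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    # Geometric series 1 + alpha + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Idealized speedup over plain one-token-per-pass decoding.

    draft_cost is a hypothetical ratio: time to draft one token relative to
    one full forward pass. Verification is assumed to cost one decode step.
    """
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost)


def simulate_tokens_per_step(alpha: float, k: int, steps: int, seed: int = 0) -> float:
    """Monte Carlo check of the closed form under the same independence assumption."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        accepted = 0
        # Accept drafts one by one until the first rejection or until all k pass.
        while accepted < k and rng.random() < alpha:
            accepted += 1
        total += accepted + 1  # +1 for the token the full model emits itself
    return total / steps


if __name__ == "__main__":
    for k in (1, 2, 3):
        closed = expected_tokens_per_step(0.8, k)
        sim = simulate_tokens_per_step(0.8, k, steps=100_000)
        speedup = estimated_speedup(0.8, k, draft_cost=0.1)
        print(f"k={k}: closed form {closed:.3f}, simulated {sim:.3f}, "
              f"speedup with draft_cost=0.1: {speedup:.2f}x")
```

At α = 0.8 the closed form gives 1.8 tokens per pass for k = 1 and 2.44 for k = 2, so an 80% acceptance rate alone leaves room for speedups in the range the post reports. Real figures depend on how much drafting and verification actually cost on the hardware.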
Discovered: 2026-05-21
Published: 2026-05-21
Author: ABLPHA