Qwen3.6-27B MTP boosts Apple silicon speed

// BENCHMARK RESULT · 45d ago

A user with an M5 Max MacBook benchmarked Qwen3.6-27B in llama.cpp and OpenWebUI and found that the MTP build gave only a modest gain at first. After tuning speculative decoding with spec-draft-n-max 3 and spec-draft-p-min 0.75, throughput rose to 24.5 tokens per second (tps). On a coding prompt, the MTP variant reached 27.70 tps versus 17.44 tps for the non-MTP model.

// ANALYSIS

Hot take: this is a tuning story, not a universal 2x speedup story. On an M5 Max the gains look solid only when the drafted tokens are accepted often enough. The initial config was likely too conservative for speculative decoding, so the first MTP result under-represented the model’s upside. Raising spec-draft-n-max and setting spec-draft-p-min improved throughput materially, which points to draft quality and acceptance being the bottleneck. The coding prompt produced about 95% acceptance, which is why the MTP variant pulled ahead much more clearly there. The 27B numbers are the most useful data point here: 17.44 tps non-MTP versus 27.70 tps MTP is roughly a 1.59x improvement, which is meaningful for local inference on Apple silicon. The takeaway for other users is to benchmark by workload, not just by model name, because prose and coding prompts can behave very differently with MTP.
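
To make the link between acceptance and throughput concrete, here is a minimal Python sketch. It is not taken from the benchmark or from llama.cpp. The function names are hypothetical, and the model assumes each drafted token is accepted independently with the same probability, which is a simplification. It also treats the reported 95% figure as a per-token acceptance probability, which the source does not confirm.

```python
def speedup(baseline_tps: float, tuned_tps: float) -> float:
    """Relative throughput gain of the tuned run over the baseline."""
    return tuned_tps / baseline_tps


def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumption (not from the source): each drafted token is accepted
    independently with probability `accept_rate`, and drafting stops at the
    first rejection. The main model always contributes one token itself, so
    the result is 1 + a + a^2 + ... + a^draft_len.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    # Figures reported in the article for the coding prompt.
    print(f"observed speedup: {speedup(17.44, 27.70):.2f}x")

    # Idealized ceiling for a draft length of 3 at several acceptance rates.
    for rate in (0.50, 0.75, 0.95):
        tokens = expected_tokens_per_pass(rate, draft_len=3)
        print(f"accept={rate:.2f} -> {tokens:.2f} tokens per pass")
```

Under these assumptions, a draft length of 3 at 95% acceptance yields about 3.71 tokens per pass, while 50% acceptance yields under 2. That figure is an idealized ceiling that ignores the cost of drafting and verification, so the observed gain of about 1.59x sits well below it. The sketch is only meant to show why acceptance rate, and therefore workload, moves the result so much.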

// TAGS

qwen, mtp, llama.cpp, openwebui, quantization, speculative-decoding, apple-silicon, macbook, local-first, benchmark

DISCOVERED

45d ago

2026-05-24

PUBLISHED

45d ago

2026-05-24

RELEVANCE

8/10

AUTHOR

chimph
