exo native MTP boosts Qwen3.6

// 63d ago · BENCHMARK RESULT

The first exo contribution adds native multi-token prediction support for Qwen3.6-style MLX checkpoints, enabled by default on macOS unless EXO_NATIVE_MTP_ENABLED=0 is set. The author reports exactness parity against target-greedy decoding plus benchmark wins on 27B and 35B-A3B settings, along with model-card plumbing and generation-stat reporting.
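The default-on, opt-out behavior described above can be sketched as a small gate. This is a minimal illustration, not exo's actual code: the helper name `native_mtp_enabled` is hypothetical, and treating non-macOS platforms as disabled is an assumption, since the summary only describes the macOS default. Only the variable name EXO_NATIVE_MTP_ENABLED and the value 0 come from the source.

```python
import os
import sys
from typing import Mapping, Optional


def native_mtp_enabled(
    environ: Optional[Mapping[str, str]] = None,
    platform: Optional[str] = None,
) -> bool:
    """Hypothetical gate: on by default on macOS, off when the flag is "0"."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    # Assumption: native MTP is only considered on macOS ("darwin").
    if platform != "darwin":
        return False

    # An unset variable means "enabled"; only an explicit "0" opts out.
    return environ.get("EXO_NATIVE_MTP_ENABLED", "1") != "0"


if __name__ == "__main__":
    print("native MTP enabled:", native_mtp_enabled())
```

Under this sketch, running with EXO_NATIVE_MTP_ENABLED=0 in the environment is the only way to turn the feature off on macOS.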

// ANALYSIS

Hot take: this looks like a real systems win, but only where draft overhead and verifier cost stay under control.

– 27B is the clean success case: K=2 and K=3 both land near 2x throughput versus MTP off, with K=2 slightly ahead in the broad sweep.
– 35B-A3B is more fragile: K=1 is the best setting, and higher K gives back the gain as verifier/cache costs dominate.
– Exactness is the important part here: the recorded greedy probes matched target-greedy for the tested settings, so this is not just a speed hack.
– The practical scope is still narrow: single-node only, explicit model-card metadata required, and stateful logits processors are not yet routed through native MTP.
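To make the exactness point concrete, here is a toy sketch of greedy draft-and-verify decoding. It is not exo's implementation: the function names and the two toy "models" are hypothetical, K is assumed to mean the number of tokens drafted per step, and a real system would verify all drafted positions in one batched forward pass rather than one call per position. It shows why the output can match target-greedy token for token: every emitted token is the target's own greedy choice, and the draft only decides how many of them are confirmed per step.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_greedy(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: plain greedy decoding with the target model only."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def draft_and_verify(target: NextToken, draft: NextToken,
                     prompt: List[int], n: int, k: int) -> List[int]:
    """Toy greedy speculative loop: draft k tokens, keep the agreeing prefix."""
    seq, out = list(prompt), []
    while len(out) < n:
        # Draft phase: propose k tokens autoregressively with the cheap model.
        ctx, proposed = list(seq), []
        for _ in range(k):
            tok = draft(ctx)
            proposed.append(tok)
            ctx.append(tok)

        # Verify phase: always emit the target's token; stop at the first
        # position where the draft disagreed (that token is the correction).
        for tok in proposed:
            expected = target(seq)
            seq.append(expected)
            out.append(expected)
            if expected != tok or len(out) >= n:
                break
        else:
            # Every draft token was accepted: the target yields a bonus token.
            bonus = target(seq)
            seq.append(bonus)
            out.append(bonus)
    return out[:n]


# Hypothetical stand-ins for real models: deterministic functions of context.
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except when the context length is a multiple of 3.
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [3, 1, 4]
    reference = target_greedy(toy_target, prompt, 20)
    for k in (1, 2, 3):
        same = draft_and_verify(toy_target, toy_draft, prompt, 20, k) == reference
        print(f"K={k} matches target-greedy: {same}")
```

In this sketch the speedup depends entirely on how often the draft agrees and how cheap the verify step is, which is consistent with the report that the best K differs between the 27B and 35B-A3B settings.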

// TAGS

exo, qwen3.6, mlx, multi-token-prediction, speculative-decoding, apple-silicon, benchmark, inference

DISCOVERED

63d ago

2026-05-23

PUBLISHED

63d ago

2026-05-23

RELEVANCE

8/10

AUTHOR

meaningego
