Gemma 4 MTP hinges on acceptance
A LocalLLaMA user benchmarked Gemma 4’s MTP drafters on MLX-VLM and found the speedup is workload-sensitive. Code generation sped up materially, but prose barely broke even and JSON output got much slower when draft acceptance collapsed.
The takeaway is blunt: speculative decoding only pays when the draft model’s guesses are good enough, and structured-output workloads can wreck that math fast.
- Code prompts hit a 66% draft accept rate and saw a 1.53x throughput gain
- Long-form prose fell to a 31% accept rate and basically canceled out the overhead
- JSON output dropped to an 8% accept rate and ran about 2x slower with MTP enabled
- The benchmark suggests a rough tipping point around 50% acceptance on this hardware/workload mix
- This matters most for local developers, where every extra pass burns scarce Apple Silicon compute
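The break-even arithmetic behind these numbers can be sketched with the standard speculative-decoding cost model: the expected number of accepted tokens per verification step, divided by the relative cost of drafting plus verifying. The `k` (draft length) and `draft_cost` values below are illustrative assumptions, not the benchmark's actual settings, and the model ignores real-world overheads (memory pressure, kernel launch costs) that push the practical tipping point higher.

```python
def expected_speedup(alpha: float, k: int = 4, draft_cost: float = 0.1) -> float:
    """Idealized speculative-decoding speedup for draft accept rate `alpha`.

    alpha      -- per-token probability the target model accepts a draft token
    k          -- number of tokens drafted per verification step (assumed)
    draft_cost -- cost of one draft forward pass relative to the target (assumed)
    """
    # Expected accepted tokens per verify step: 1 + alpha + ... + alpha^k,
    # a geometric series = (1 - alpha^(k+1)) / (1 - alpha).
    expected_tokens = (1 - alpha ** (k + 1)) / (1 - alpha)
    # Relative cost per step: k draft passes plus one target verification pass.
    cost_per_step = k * draft_cost + 1
    return expected_tokens / cost_per_step


for name, alpha in [("code", 0.66), ("prose", 0.31), ("json", 0.08)]:
    print(f"{name}: ~{expected_speedup(alpha):.2f}x")
```

With these assumed parameters the model reproduces the qualitative pattern from the benchmark: a clear win at 66% acceptance, roughly break-even at 31%, and a net loss at 8%, though the idealized math understates how badly the JSON case degrades in practice.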
DISCOVERED: 2026-05-09 (4h ago)
PUBLISHED: 2026-05-08 (6h ago)
AUTHOR: Hydroskeletal
