Google strips MTP heads from public Gemma 4 weights
Google released Gemma 4 with Multi-Token Prediction (MTP) heads reserved exclusively for LiteRT runtimes, leaving public Hugging Face weights limited to standard autoregressive inference. This discovery has sparked debate over "open-washing" as the highest-performance version remains locked behind Google's proprietary ecosystem.
Google's decision to decouple MTP heads from public weights is a strategic move that prioritizes ecosystem control over true open-source parity. MTP enables 1.5x-2.0x faster inference through built-in speculative decoding, a major advantage for on-device AI. Stripping these heads from Hugging Face weights ensures that developers seeking maximum performance must use Google's LiteRT framework. While Google cites compatibility as the reason, the move creates a two-tier system that hinders third-party optimization in tools like llama.cpp. The community is already working on reverse-engineering the LiteRT models to stitch MTP support back into standard formats.
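The claimed 1.5x-2.0x speedup comes from the MTP heads drafting several future tokens alongside the main next-token prediction, with drafts kept only when the main head agrees. The toy sketch below illustrates that accept/reject loop; everything here (`ToyMTPModel`, the fake logits) is a hypothetical stand-in, not Gemma's actual implementation, and for clarity it re-runs the main head per draft where a real implementation would verify all drafts in one batched forward pass.

```python
def greedy_token(logits):
    """Pick the argmax token id from a list of logits."""
    return max(range(len(logits)), key=lambda i: logits[i])

class ToyMTPModel:
    """Stand-in model: a main next-token head plus k extra MTP heads
    that each draft one additional future token."""
    def __init__(self, vocab_size=8, num_mtp_heads=2):
        self.vocab_size = vocab_size
        self.num_mtp_heads = num_mtp_heads

    def forward(self, context):
        # Deterministic fake logits derived from the context;
        # a real model would run a transformer here.
        def fake_logits(seed):
            return [(seed * 31 + t) % 97 for t in range(self.vocab_size)]
        h = sum(context) + len(context)
        main = fake_logits(h)
        drafts = [fake_logits(h + 1 + k) for k in range(self.num_mtp_heads)]
        return main, drafts

def mtp_speculative_decode(model, prompt, max_new_tokens=6):
    """MTP-style self-speculative decoding: the extra heads draft tokens,
    and each draft is accepted only if the main head agrees with it."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new_tokens:
        main, drafts = model.forward(seq)
        seq.append(greedy_token(main))  # main head always emits one token
        for d in drafts:
            if len(seq) - len(prompt) >= max_new_tokens:
                break
            draft_tok = greedy_token(d)
            verify_main, _ = model.forward(seq)
            if greedy_token(verify_main) == draft_tok:
                seq.append(draft_tok)   # accepted: an (almost) free token
            else:
                break                   # rejected: resume from the main head
    return seq[len(prompt):]
```

Without the MTP heads, every rejected branch above disappears and the model is forced back to one token per forward pass, which is exactly the standard autoregressive mode the public weights are limited to.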
DISCOVERED: 2026-04-10
PUBLISHED: 2026-04-10
AUTHOR: FunSignificance4405