OPEN_SOURCE
REDDIT · MODEL RELEASE

Google strips MTP heads from public Gemma 4 weights

Google released Gemma 4 with Multi-Token Prediction (MTP) heads reserved exclusively for LiteRT runtimes, leaving public Hugging Face weights limited to standard autoregressive inference. This discovery has sparked debate over "open-washing" as the highest-performance version remains locked behind Google's proprietary ecosystem.

// ANALYSIS

Google's decision to decouple MTP heads from public weights is a strategic move that prioritizes ecosystem control over true open-source parity. MTP enables 1.5x-2.0x faster inference through built-in speculative decoding, a major advantage for on-device AI. Stripping these heads from Hugging Face weights ensures that developers seeking maximum performance must use Google's LiteRT framework. While Google cites compatibility as the reason, the move creates a two-tier system that hinders third-party optimization in tools like llama.cpp. The community is already working on reverse-engineering the LiteRT models to stitch MTP support back into standard formats.
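To make the performance claim concrete, here is a minimal sketch of how MTP-style self-speculative decoding works: extra prediction heads draft several future tokens in one pass, and the main autoregressive head verifies them, committing the longest matching prefix plus one guaranteed token. The functions `target_next` and `draft_k` are hypothetical toy stand-ins (a deterministic arithmetic "model"), not Gemma 4's actual heads; the point is the accept/verify loop, which produces output identical to plain greedy decoding in fewer verification passes.

```python
# Toy sketch of MTP-style self-speculative decoding (greedy case).
# Hypothetical stand-ins: `target_next` plays the main autoregressive head,
# `draft_k` plays the extra MTP heads that guess k future tokens at once.

def target_next(seq):
    """Main model: deterministic toy rule, next token = (sum of seq) % 17."""
    return sum(seq) % 17

def draft_k(seq, k=4):
    """MTP heads: cheap guesses for the next k tokens in a single pass.
    They reuse the main rule for step 0 but go stale afterwards, so some
    drafts get rejected, mimicking imperfect draft heads."""
    drafts, s = [], list(seq)
    for i in range(k):
        guess = sum(s) % 17 if i == 0 else (s[-1] + 1) % 17  # stale after step 0
        drafts.append(guess)
        s.append(guess)
    return drafts

def speculative_generate(prompt, n_tokens, k=4):
    """Generate n_tokens greedily, verifying MTP drafts against the main head.
    Returns the full sequence and the number of verification passes used."""
    seq = list(prompt)
    steps = 0
    while len(seq) - len(prompt) < n_tokens:
        drafts = draft_k(seq, k)
        steps += 1
        accepted = []
        # Accept drafts left to right while they match the main head's choice.
        for d in drafts:
            if target_next(seq + accepted) == d:
                accepted.append(d)
            else:
                break
        # Always commit at least one token: the main head's own prediction.
        accepted.append(target_next(seq + accepted))
        remaining = n_tokens - (len(seq) - len(prompt))
        seq.extend(accepted[:remaining])
    return seq, steps
```

Because verification falls back to the main head on any mismatch, the output is bit-identical to ordinary autoregressive decoding; the speedup comes from committing multiple tokens per verification pass when drafts are accepted, which is what makes stripping the heads a real performance loss rather than a cosmetic one.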

// TAGS
gemma-4 · google-deepmind · open-weights · llm · litert · on-device-ai · inference · mtp

DISCOVERED

2026-04-10

PUBLISHED

2026-04-10

RELEVANCE

9/10

AUTHOR

FunSignificance4405