Google releases Gemma 4 QAT checkpoints

// 45d ago MODEL RELEASE

Google DeepMind has released official Quantization-Aware Training (QAT) checkpoints for the Gemma 4 model family on Hugging Face, integrating model compression directly into the training process. The release includes unquantized Q4_0 checkpoints, GGUF formats, a mobile-optimized wNa8o8 schema, and compressed tensors for native vLLM inference.

// ANALYSIS

Post-training quantization is dead for high-stakes edge deployments; native QAT is now the baseline expectation for open-source LLM releases if developers want production-grade on-device performance without sacrificing accuracy.

– **PTQ is a compromise:** Post-training quantization compresses the weights only after training has finished, so the model never adapts to the rounding error and accuracy can degrade, particularly at 4 bits and below. QAT simulates that precision loss during training, which lets the weights adjust and preserves more of the original quality.
– **Mobile-first architecture:** Introducing custom mobile-quantization schemas like wNa8o8 (with 2-bit decoding layers) shows that hardware-software co-design is essential for running larger models on mobile devices (e.g., shrinking Gemma 4 E2B down to a 1 GB footprint).
– **Ecosystem readiness:** Providing multiple ready-to-run formats (GGUF, compressed tensors, and Q4_0) lowers the barrier to adoption across a fragmented local inference ecosystem (vLLM, Ollama, llama.cpp, LiteRT-LM).
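
To make "simulating precision loss" concrete, here is a minimal sketch of the fake-quantization round trip that QAT schemes generally place in the forward pass. It is an illustration only, not Google's training code: the block size of 32, the symmetric grid of integers in [-7, 7], and the function names are all assumptions, and the layout is a simplified stand-in for a Q4_0-style format. In real QAT the rounding step is usually paired with a straight-through estimator so gradients can flow past it, which this sketch does not cover.

```python
import random

BLOCK_SIZE = 32  # assumed block length (illustrative, Q4_0-style)
LEVELS = 7       # assumed symmetric signed 4-bit grid: integers in [-7, 7]


def fake_quantize_block(block):
    """Snap one block to the 4-bit grid, then map it straight back to floats."""
    max_abs = max(abs(w) for w in block)
    if max_abs == 0.0:
        return list(block)
    scale = max_abs / LEVELS  # one shared scale per block
    return [max(-LEVELS, min(LEVELS, round(w / scale))) * scale for w in block]


def fake_quantize(weights):
    """Apply the round trip block by block over a flat list of weights."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        out.extend(fake_quantize_block(weights[start:start + BLOCK_SIZE]))
    return out


def bits_per_weight(value_bits=4, scale_bits=16, block_size=BLOCK_SIZE):
    """Average storage cost when each block carries one scale of scale_bits."""
    return (value_bits * block_size + scale_bits) / block_size


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4 * BLOCK_SIZE)]
quantized = fake_quantize(weights)
max_error = max(abs(w - q) for w, q in zip(weights, quantized))
print(f"max round-trip error: {max_error:.6f}")
print(f"storage cost: {bits_per_weight()} bits per weight")
```

Training against the `quantized` values rather than the originals is what lets the model compensate for the rounding. The `bits_per_weight` helper shows why a 4-bit block format costs slightly more than 4 bits per weight: with these assumed parameters, each block of 32 values also stores one 16-bit scale.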

// TAGS

gemma-4 qat deepmind quantization open-source hugging-face llm edge-ai mobile-ai

DISCOVERED

45d ago

2026-06-05

PUBLISHED

45d ago

2026-06-05

RELEVANCE

8/10

AUTHOR

googlegemma
