Gemma 4 tops benchmarks, hallucination metrics pending
Google's Gemma 4 family, released in April 2026, sets new open-weight records in reasoning and instruction following while third-party hallucination audits remain pending. The models feature a configurable "Thinking Mode" designed to improve reliability and reduce false claims in complex agentic workflows.
Gemma 4 is a significant step forward for open-weight models, specifically targeting the hallucination and reasoning gaps that plagued previous iterations. The 31B Dense model ranks as the third-best open model globally and outperforms many proprietary models on reasoning benchmarks such as MMLU Pro. Thinking Mode lets the model pause and reason through complex problems; in early tests this led it to refuse rather than hallucinate when it could not support an answer. Gemma 4 is not yet listed on the Vectara Hallucination Leaderboard, leaving a data gap that the LocalLLaMA community is actively trying to fill. Native support for system instructions and structured JSON output addresses long-standing developer pain points in agentic workflows.
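To make the structured-output and refusal points concrete, here is a minimal sketch of how an agent loop might treat a model reply. It assumes the model has been instructed to return a JSON object; the `answer`, `confidence`, and `refused` keys and the `parse_model_reply` helper are hypothetical names chosen for illustration and are not part of any Gemma 4 API.

```python
import json

# Hypothetical schema for illustration only; not defined by Gemma 4.
REQUIRED_KEYS = {"answer", "confidence"}


def parse_model_reply(raw: str) -> dict:
    """Classify a raw model reply as ok, refused, or invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {"status": "invalid", "reason": f"not JSON: {exc.msg}"}

    # Structured output is expected to be a JSON object, not a list or scalar.
    if not isinstance(data, dict):
        return {"status": "invalid", "reason": "top-level value is not an object"}

    # An explicit refusal is a valid outcome, not an error to retry blindly.
    if data.get("refused") is True:
        return {"status": "refused"}

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return {"status": "invalid", "reason": f"missing keys: {sorted(missing)}"}

    return {"status": "ok", "data": data}
```

Treating a refusal as its own status, separate from malformed output, lets a workflow escalate or fetch more context instead of retrying until the model produces a confident but unsupported answer.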
DISCOVERED
2026-04-05
PUBLISHED
2026-04-04
AUTHOR
appakaradi