OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Gemma 4 31B tops GPQA Diamond
Google’s Gemma 4 31B dense model is drawing attention for a community benchmark claim of 85.7% on GPQA Diamond, nearly matching Qwen3.5 27B while using fewer output tokens. Google’s launch also positions it as a single-H100, 256K-context, multimodal open model family.
// ANALYSIS
The interesting part here is not just the score, but the implied efficiency curve: if the benchmark holds up, Gemma 4 is squeezing near-frontier reasoning into a much more deployable footprint.
- Google’s official launch says the 31B dense model fits on a single 80GB H100, which makes this feel less like lab bragging and more like something teams can actually run.
- The Reddit post’s token-efficiency claim is the real differentiator: similar benchmark performance with fewer output tokens suggests lower inference cost per useful answer.
- Gemma 4’s 256K context, multimodal input, and native function-calling make it more than a chat model; it’s clearly aimed at agentic workflows and local developer tooling.
- The caution flag is provenance: this specific Qwen comparison is a community benchmark claim, not an official Google benchmark, so it should be treated as promising but not definitive.
- Still, Apache 2.0 plus open weights means adoption friction is low, which is exactly what the open-model ecosystem needs right now.
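The single-H100 claim is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming bf16 weights (2 bytes per parameter) and ignoring KV cache and activation memory, which add real overhead at 256K context:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough VRAM needed for model weights alone, in decimal GB.

    Ignores KV cache, activations, and framework overhead, so this is
    a lower bound, not a full deployment estimate.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9


# 31B parameters in bf16: 62 GB of weights, leaving ~18 GB of an
# 80GB H100 for KV cache and activations.
print(weight_memory_gb(31, 2))  # → 62.0

# int8 quantization halves that to 31 GB.
print(weight_memory_gb(31, 1))  # → 31.0
```

The numbers are consistent with the launch claim: bf16 weights fit on one 80GB card with headroom, though very long contexts would likely push teams toward quantization or KV-cache offloading.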
// TAGS
gemma-4 · llm · reasoning · multimodal · open-weights · benchmark · gpu
DISCOVERED
2026-04-03
PUBLISHED
2026-04-03
RELEVANCE
10/10
AUTHOR
Pascal22_