Gemini 3.1 Pro matches top reasoning score
YT · YOUTUBE // 36d ago // MODEL RELEASE


Google’s Gemini 3.1 Pro is a major reasoning-focused model release aimed at complex, multi-step work, with strong published gains on benchmarks like ARC-AGI-2 and Humanity’s Last Exam. In Discover AI’s comparison, it stands out as a rare model that reaches top-tier reasoning results while staying legible as a pure-language thinker instead of leaning on code-heavy tool use.

// ANALYSIS

This looks like Google’s strongest argument yet that raw reasoning quality still matters more than flashy agent scaffolding. If a model can stay competitive without disappearing behind hidden tools, developers get something more debuggable, more trustworthy, and easier to slot into real workflows.

  • DeepMind reports a big jump on ARC-AGI-2: Gemini 3.1 Pro scores 77.1% versus 31.1% for Gemini 3 Pro, the kind of leap that forces the whole reasoning race to recalibrate
  • On Humanity’s Last Exam, Google shows 44.4% with no tools and 51.4% with search plus code, which supports the video’s framing that the base model is unusually strong even before tool augmentation
  • The model ships with a 1M-token input window, 64k output, multimodal input support, and tool hooks like function calling, structured output, search, and code execution
  • Availability across Gemini App, Vertex AI, Google AI Studio, and Gemini API makes it easier for teams to test the same model across consumer and production contexts
  • The main caveat is maturity: it is still labeled preview, so the next step for developers is less hype and more careful evals against Claude and GPT on their own long-context and coding workloads
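The tool hooks listed above (function calling, structured output, search, code execution) are all configured through the Gemini API's `generateContent` call. A minimal sketch of a structured-output request body follows, assuming the public `generateContent` REST shape; the model ID `gemini-3.1-pro-preview` is a guess at the preview name, and the schema fields are illustrative only.

```python
import json

# Assumed model ID for the preview release -- check the Gemini API
# model list for the actual identifier.
MODEL = "gemini-3.1-pro-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Build a generateContent body that constrains the reply to JSON."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Structured output: ask for JSON conforming to a schema
            # instead of free-form text.
            "responseMimeType": "application/json",
            "responseSchema": {
                "type": "object",
                "properties": {
                    "answer": {"type": "string"},
                    "confidence": {"type": "number"},
                },
            },
        },
    }

body = build_request("Summarize the ARC-AGI-2 result in one sentence.")
print(json.dumps(body, indent=2))
```

Because the same model is exposed through Gemini App, Vertex AI, AI Studio, and the API, a payload like this is a cheap way to run identical eval prompts across environments before committing to one.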
// TAGS
gemini-3.1-pro · llm · reasoning · benchmark · api · multimodal

DISCOVERED

2026-03-06 (36d ago)

PUBLISHED

2026-03-06 (36d ago)

RELEVANCE

9/10

AUTHOR

Discover AI