Gemini 3.1 Pro matches top reasoning score
Google’s Gemini 3.1 Pro is a major reasoning-focused model release aimed at complex, multi-step work, with strong published gains on benchmarks like ARC-AGI-2 and Humanity’s Last Exam. In Discover AI’s comparison, it stands out as a rare model that reaches top-tier reasoning results while staying legible as a pure-language thinker instead of leaning on code-heavy tool use.
This looks like Google’s strongest argument yet that raw reasoning quality still matters more than flashy agent scaffolding. If a model can stay competitive without disappearing behind hidden tools, developers get something more debuggable, more trustworthy, and easier to slot into real workflows.
- DeepMind reports a big jump on ARC-AGI-2: Gemini 3.1 Pro scores 77.1% versus 31.1% for Gemini 3 Pro, the kind of leap that forces the whole reasoning race to recalibrate
- On Humanity’s Last Exam, Google shows 44.4% with no tools and 51.4% with search plus code, which supports the video’s framing that the base model is unusually strong even before tool augmentation
- The model ships with a 1M-token input window, 64k-token output, multimodal input support, and tool hooks like function calling, structured output, search, and code execution
- Availability across the Gemini App, Vertex AI, Google AI Studio, and the Gemini API makes it easier for teams to test the same model in both consumer and production contexts
- The main caveat is maturity: the model is still labeled preview, so the next step for developers is less hype and more careful evals against Claude and GPT on their own long-context and coding workloads
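The tool hooks listed above map onto fields in the Gemini API's `generateContent` request. A minimal sketch of assembling such a payload, assuming the preview ships under a model id like `gemini-3.1-pro-preview` (the exact name may differ):

```python
import json

# Hypothetical model id for the preview; check the model list before use.
MODEL = "gemini-3.1-pro-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Assemble a generateContent payload that enables the tool hooks
    mentioned above: search grounding, code execution, and structured
    (JSON) output via generationConfig."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [
            {"google_search": {}},   # search grounding
            {"code_execution": {}},  # sandboxed code tool
        ],
        "generationConfig": {
            # Ask for structured output as JSON.
            "response_mime_type": "application/json",
        },
    }

payload = build_request("Summarize the latest ARC-AGI-2 results.")
print(json.dumps(payload, indent=2))
```

POSTing this payload to the endpoint with an API key is all that separates a local eval harness from a production call, which is why the same request shape works across AI Studio and Vertex AI.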
DISCOVERED: 2026-03-06
PUBLISHED: 2026-03-06
AUTHOR: Discover AI