REDDIT // BENCHMARK RESULT
GPT-5.4 pro nears Gemini on ARC-AGI-2
OpenAI's GPT-5.4 pro posted 83.3% on ARC-AGI-2 in ARC Prize reporting shared alongside the model's rollout, putting it within 1.3 points of Gemini 3.1 Pro's 84.6%. ARC Prize also listed base GPT-5.4 at 74.0%, suggesting the pro tier's extra reasoning budget is doing real work on one of the hardest abstraction benchmarks around.
// ANALYSIS
More than a victory lap, this is a sign that the frontier reasoning race is compressing into tiny single-digit gaps on benchmarks that still feel meaningfully hard.
- The headline number matters because ARC-AGI-2 is designed to test abstraction and adaptability, not just polished benchmark memorization.
- GPT-5.4 pro's 83.3% puts OpenAI back in striking distance of Gemini 3.1 Pro instead of clearly trailing on fluid-reasoning optics.
- The jump from 74.0% for base GPT-5.4 to 83.3% for pro shows how much performance is now coming from extra reasoning effort, not just the base model.
- ARC Prize attached a $16.41 per-task figure to GPT-5.4 pro, so this score is impressive but not cheap.
- Developers should read this as a strong research signal, not a universal winner badge; real coding, agent, and tool-use workloads still matter more than a single benchmark.
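The gaps quoted above are easy to sanity-check. A minimal sketch using only the scores reported in the post (the relative-lift figure is computed here, not from ARC Prize reporting):

```python
# Scores as quoted in the post (all in %); nothing below is an
# official ARC Prize calculation.
gemini_pro = 84.6   # Gemini 3.1 Pro on ARC-AGI-2
gpt54_pro = 83.3    # GPT-5.4 pro
gpt54_base = 74.0   # base GPT-5.4

# Gap to the leader, in percentage points
gap = round(gemini_pro - gpt54_pro, 1)

# Lift from the pro tier's extra reasoning budget, absolute and relative
pro_lift = round(gpt54_pro - gpt54_base, 1)
relative_lift = round(100 * (gpt54_pro - gpt54_base) / gpt54_base, 1)

print(f"Gap to Gemini 3.1 Pro: {gap} pts")
print(f"Pro-tier lift over base: {pro_lift} pts ({relative_lift}% relative)")
```

The rounding matters only because the raw floating-point differences carry trailing digits; the takeaway is a 1.3-point gap to the leader against a 9.3-point lift from the pro tier itself.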
// TAGS
gpt-5-4-pro · llm · reasoning · benchmark · api
DISCOVERED
2026-03-06
PUBLISHED
2026-03-05
RELEVANCE
10/10
AUTHOR
nsdjoe