OPEN_SOURCE
REDDIT · 4h ago · BENCHMARK RESULT
ARC-AGI-3 sets elite human reasoning baseline
The ARC Prize Foundation has updated its reasoning benchmark to anchor the 100% human baseline to "near-elite" action efficiency. The new RHAE scoring system rigorously penalizes inefficient AI solutions, leaving current frontier models with scores below 1%.
// ANALYSIS
The shift from binary success to action efficiency represents a brutal raising of the bar for reasoning models.
- The 100% baseline is now defined by the second-best first-run human performance, effectively requiring AI to match top-tier human intuition.
- RHAE (Relative Human Action Efficiency) applies a squared penalty for excessive steps, causing scores to drop precipitously for "brute-force" or inefficient solvers.
- Frontier models like Gemini 3.1 and GPT-5 currently struggle to break 1%, highlighting the persistent "fluid intelligence" gap in LLMs.
- While community "harnesses" have reached 36% by providing better context, official "text-in, text-out" evaluation remains the gold standard for pure reasoning.
- This update signals a pivot in the AGI debate: it's no longer just about solving the puzzle, but solving it with the same minimal priors humans use.
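The post does not give the exact RHAE formula, but the described behavior, scoring relative to a human action baseline with a squared penalty for excess steps, can be sketched as follows. The function name `rhae` and the capping and failure-handling choices are assumptions for illustration, not the Foundation's published scoring code.

```python
# Illustrative sketch only: the exact RHAE formula is not public in this post.
# Assumption: RHAE compares an agent's action count to the human baseline and
# squares the efficiency ratio, so excess actions are penalized quadratically.

def rhae(human_actions: int, agent_actions: int, solved: bool = True) -> float:
    """Hypothetical Relative Human Action Efficiency score in [0, 1]."""
    if not solved or agent_actions <= 0:
        return 0.0
    # Efficiency ratio capped at 1: matching or beating the baseline scores 100%.
    ratio = min(1.0, human_actions / agent_actions)
    # Squared penalty: twice the human's actions -> a quarter of the score.
    return ratio ** 2

print(rhae(human_actions=10, agent_actions=20))  # 0.25
```

Under this sketch the quadratic term explains the sub-1% frontier scores in the post: a brute-force solver taking roughly 10x the human baseline's actions would score about (1/10)^2 = 1%.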
// TAGS
arc-agi-3 · arc-agi · reasoning · benchmark · llm · research
DISCOVERED
4h ago
2026-04-15
PUBLISHED
5h ago
2026-04-14
RELEVANCE
9/10
AUTHOR
exordin26