ARC-AGI-3 sets elite human reasoning baseline
REDDIT // BENCHMARK RESULT

The ARC Prize Foundation has updated its reasoning benchmark to anchor the 100% human baseline to "near-elite" action efficiency. The new RHAE scoring system rigorously penalizes inefficient AI solutions, leaving current frontier models with scores below 1%.

// ANALYSIS

The shift from binary pass/fail scoring to action efficiency brutally raises the bar for reasoning models.

  • The 100% baseline is now defined by the second-best first-run human performance, effectively requiring AI to match top-tier human intuition.
  • RHAE (Relative Human Action Efficiency) applies a squared penalty for excessive steps, causing scores to drop precipitously for "brute-force" or inefficient solvers.
  • Frontier models like Gemini 3.1 and GPT-5 currently struggle to break 1%, highlighting the persistent "fluid intelligence" gap in LLMs.
  • While community "harnesses" have reached 36% by providing better context, official "text-in, text-out" evaluation remains the gold standard for pure reasoning.
  • This update signals a pivot in the AGI debate: it's no longer just about solving the puzzle, but solving it with the same minimal priors humans use.
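The squared penalty described above can be sketched in a few lines. The exact RHAE formula is not given in this summary, so the function name, the capping at 1.0, and the per-task scope here are assumptions, not the official scoring code:

```python
def rhae_sketch(human_actions: int, agent_actions: int) -> float:
    """Hypothetical per-task RHAE-style score (an assumed formula).

    Treats the human baseline action count as 100% efficiency and
    penalizes excess actions quadratically, as the bullet describes.
    """
    if agent_actions <= 0:
        return 0.0
    # Cap at 1.0 so beating the human baseline can't exceed 100%.
    efficiency = min(1.0, human_actions / agent_actions)
    return efficiency ** 2

# Under this sketch, an agent taking 10x the human's actions
# scores only ~1% on that task:
print(rhae_sketch(50, 500))   # → ~0.01
print(rhae_sketch(50, 50))    # → 1.0
```

The quadratic term is what makes "brute-force" solvers collapse: a 10x action overhead costs 99% of the score rather than 90%, which is consistent with frontier models landing below 1% overall.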
// TAGS
arc-agi-3 · arc-agi · reasoning · benchmark · llm · research

DISCOVERED

2026-04-15

PUBLISHED

2026-04-14

RELEVANCE

9/10

AUTHOR

exordin26