OPEN_SOURCE
REDDIT · 17d ago · BENCHMARK RESULT
ARC-AGI-3 charts human-AI action gap
ARC Prize's ARC-AGI-3 benchmark uses Relative Human Action Efficiency to compare AI agents with first-time humans, so action count matters as much as task success. The chart makes the lesson obvious: on novel environments, brute force looks a lot less intelligent than efficient adaptation.
// ANALYSIS
This is a more honest AGI yardstick than static accuracy benchmarks, because it prices in the cost of learning, not just the final outcome.
- ARC Prize uses the 2nd-best first-time human as the baseline, trimming outliers while keeping scoring grounded in real play. [Methodology](https://docs.arcprize.org/methodology)
- The squared ratio means inefficiency compounds fast; a model that needs twice the actions earns only a quarter of the level score.
- The benchmark caps per-level credit at human speed, keeping the game focused on generalization instead of quirky shortcuts or level-specific hacks. [ARC-AGI-3](https://arcprize.org/arc-agi/3)
- The preview blog's charts reinforce the intuition: humans tend to converge on efficient paths quickly, while many agents still wander. [Preview learnings](https://arcprize.org/blog/arc-agi-3-preview-30-day-learnings)
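The squared-ratio scoring above can be sketched in a few lines. This is a minimal illustration, assuming a per-level score of the human-to-agent action ratio, capped at 1.0 before squaring; the function name and exact cap placement are hypothetical, and the [methodology doc](https://docs.arcprize.org/methodology) has the authoritative definition.

```python
def level_score(agent_actions: int, human_actions: int) -> float:
    """Illustrative per-level score: ratio of human actions to agent
    actions, capped at 1.0 (no bonus for beating human speed), squared
    so that inefficiency is penalized superlinearly."""
    ratio = min(1.0, human_actions / agent_actions)
    return ratio ** 2

# An agent that needs twice the human's actions earns a quarter credit:
# level_score(200, 100) -> 0.25
# An agent faster than the human baseline is capped at full credit:
# level_score(50, 100) -> 1.0
```

The cap is what keeps the metric about generalization: an agent cannot bank surplus credit on easy levels to mask wandering on hard ones.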
// TAGS
arc-agi-3 · benchmark · agent · reasoning · research
DISCOVERED
2026-03-26
PUBLISHED
2026-03-25
RELEVANCE
9/10
AUTHOR
Stabile_Feldmaus