OPEN_SOURCE
HN · HACKER_NEWS // 17d ago
PRODUCT LAUNCH
ARC-AGI-3 debuts interactive reasoning benchmark
ARC Prize launches ARC-AGI-3, an interactive reasoning benchmark for AI agents that must explore, learn, plan, and adapt without instructions. It ships with 1,000+ levels across 150+ environments and a developer toolkit for local, online, or API-based testing, with scores tied to relative human action efficiency.
// ANALYSIS
ARC-AGI-3 is a better proxy for agentic intelligence than another static puzzle leaderboard, because it measures learning-in-motion instead of final-answer polish. The real challenge now is not just solving the benchmark, but keeping it novel enough that models cannot train around it.
- Relative Human Action Efficiency is the right metric here: it rewards efficient behavior, not just completion (see the sketch after this list).
- The second-best human baseline and per-level score cap make shortcut farming and speedrun exploits less valuable.
- Replayable runs and the toolkit should accelerate RL, tool-use, and planning experiments across the ecosystem.
- If public environments leak into training data, ARC-AGI-3 will need fresh private surfaces fast.
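To make the efficiency framing concrete, here is a minimal sketch of how a relative-action-efficiency score could be computed, assuming each level's score is the ratio of a human baseline's action count to the agent's action count, capped per level. The function names (`level_score`, `benchmark_score`), the cap value, and the aggregation by simple averaging are illustrative assumptions, not the published ARC-AGI-3 scoring rules.

```python
# Illustrative sketch only: the exact ARC-AGI-3 scoring rules are not published
# here, so the formula below is an assumption, not the official metric.

def level_score(agent_actions: int, human_baseline_actions: int, cap: float = 1.0) -> float:
    """Relative action efficiency for one level.

    Assumes the score is the human baseline's action count (e.g. the
    second-best human run) divided by the agent's action count, capped
    per level so hyper-optimized speedruns cannot inflate the total.
    """
    if agent_actions <= 0:
        return 0.0  # treat a missing/failed run as zero credit
    return min(cap, human_baseline_actions / agent_actions)


def benchmark_score(levels: list[tuple[int, int]]) -> float:
    """Average per-level efficiency over (agent_actions, human_baseline_actions) pairs."""
    if not levels:
        return 0.0
    return sum(level_score(a, h) for a, h in levels) / len(levels)


# Example: the agent matches the human baseline on level 1, needs twice as many
# actions on level 2, and fails level 3 -> (1.0 + 0.5 + 0.0) / 3 = 0.5
print(benchmark_score([(40, 40), (120, 60), (0, 55)]))
```

Under this reading, the per-level cap means an agent gains nothing from beating the human baseline by a wide margin on a few levels; broad, consistent efficiency is what raises the aggregate score.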
// TAGS
benchmark · research · agent · reasoning · testing · sdk · arc-agi-3
DISCOVERED
2026-03-25 (17d ago)
PUBLISHED
2026-03-25 (17d ago)
RELEVANCE
9/10
AUTHOR
lairv