ARC-AGI-3 shifts to interactive AGI benchmarking
ARC-AGI-3 evolves the Abstraction and Reasoning Corpus into an interactive benchmark that measures agentic exploration and action efficiency. The update targets the "intelligence gap" in long-horizon planning, where current LLMs lag significantly behind human performance.
ARC-AGI-3 marks a shift away from "static puzzle" evals, requiring AI systems to demonstrate reasoning through interactive hypothesis testing and environment exploration. In the new interactive mode, agents must infer rules from sparse feedback, and a new "action efficiency" metric exposes an 8x performance gap between humans and top AI systems. This focus on program synthesis and long-horizon planning addresses primary failure modes of current autoregressive models, which keeps the benchmark a credible proxy for progress toward general intelligence.
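To make the idea of inferring rules from sparse feedback concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `HiddenTargetEnv`, `random_agent`, `hypothesis_agent`, and the ratio in `action_efficiency` are hypothetical and are not taken from the ARC-AGI-3 environments or its actual scoring formula, which this summary does not specify.

```python
import random
from dataclasses import dataclass


@dataclass
class HiddenTargetEnv:
    """Toy environment (hypothetical, not an ARC-AGI-3 game): locate a hidden cell."""
    size: int
    target: int
    actions_taken: int = 0

    def step(self, guess: int) -> int:
        """Sparse feedback: 0 if found, -1 if the target is lower, +1 if higher."""
        self.actions_taken += 1
        if guess == self.target:
            return 0
        return -1 if self.target < guess else 1


def random_agent(env: HiddenTargetEnv, rng: random.Random) -> int:
    """Ignores feedback and tries cells in random order; returns actions used."""
    cells = list(range(env.size))
    rng.shuffle(cells)
    for guess in cells:
        if env.step(guess) == 0:
            break
    return env.actions_taken


def hypothesis_agent(env: HiddenTargetEnv) -> int:
    """Uses each feedback signal to halve the candidate range; returns actions used."""
    lo, hi = 0, env.size - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        feedback = env.step(mid)
        if feedback == 0:
            break
        if feedback < 0:
            hi = mid - 1
        else:
            lo = mid + 1
    return env.actions_taken


def action_efficiency(baseline_actions: int, agent_actions: int) -> float:
    """Assumed definition: baseline actions divided by agent actions (1.0 = parity)."""
    return baseline_actions / agent_actions


if __name__ == "__main__":
    smart = hypothesis_agent(HiddenTargetEnv(size=64, target=10))
    naive = random_agent(HiddenTargetEnv(size=64, target=10), random.Random(0))
    print("hypothesis-testing agent actions:", smart)
    print("random agent actions:", naive)
    print("efficiency of random vs. hypothesis agent:", action_efficiency(smart, naive))
```

Under this assumed ratio, an agent that needs 40 actions where a baseline needs 5 would score 0.125, which is one way an 8x gap could show up. That mapping is illustrative only and does not describe how the reported figure was computed.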
DISCOVERED
2026-03-29
PUBLISHED
2026-03-29
AUTHOR
AI Search