OPEN_SOURCE
REDDIT · 17d ago · BENCHMARK RESULT
ARC-AGI-3 charts human-AI action gap
ARC Prize's ARC-AGI-3 benchmark uses Relative Human Action Efficiency to compare AI agents with first-time humans, so action count matters as much as task success. The chart makes the lesson obvious: on novel environments, brute force looks a lot less intelligent than efficient adaptation.
// ANALYSIS
This is a more honest AGI yardstick than static accuracy benchmarks, because it prices in the cost of learning, not just the final outcome.
- ARC Prize uses the 2nd-best first-time human as the baseline, trimming outliers while keeping scoring grounded in real play. [Methodology](https://docs.arcprize.org/methodology)
- The squared ratio means inefficiency compounds fast; a model that needs twice the actions earns only a quarter of the level score.
- The benchmark caps per-level credit at human speed, keeping the game focused on generalization instead of quirky shortcuts or level-specific hacks. [ARC-AGI-3](https://arcprize.org/arc-agi/3)
- The preview blog's charts reinforce the intuition: humans tend to converge on efficient paths quickly, while many agents still wander. [Preview learnings](https://arcprize.org/blog/arc-agi-3-preview-30-day-learnings)
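The squared-ratio scoring above can be sketched in a few lines. This is a minimal illustration, assuming a per-level score of the human-to-agent action ratio, capped at 1.0 before squaring; the function name and exact cap placement are hypothetical, and the [methodology doc](https://docs.arcprize.org/methodology) has the authoritative definition.

```python
def level_score(agent_actions: int, human_actions: int) -> float:
    """Illustrative per-level score: ratio of human actions to agent
    actions, capped at 1.0 (no bonus for beating human speed), squared
    so that inefficiency is penalized superlinearly."""
    ratio = min(1.0, human_actions / agent_actions)
    return ratio ** 2

# An agent that needs twice the human's actions earns a quarter credit:
# level_score(200, 100) -> 0.25
# An agent faster than the human baseline is capped at full credit:
# level_score(50, 100) -> 1.0
```

The cap is what keeps the metric about generalization: an agent cannot bank surplus credit on easy levels to mask wandering on hard ones.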
// TAGS
arc-agi-3 · benchmark · agent · reasoning · research
DISCOVERED
2026-03-26
PUBLISHED
2026-03-25
RELEVANCE
9/10
AUTHOR
Stabile_Feldmaus