Claude Opus 4.8 sets ARC-AGI-3 SOTA
Anthropic has announced that its latest model, Claude Opus 4.8, has achieved a new state-of-the-art (SOTA) score of 1.5% on the ARC-AGI-3 benchmark, which measures abstract reasoning in interactive environments. The benchmark run cost roughly $10,000. That figure underscores both the new model's unprecedented reasoning capability and the heavy compute currently required to solve even a tiny fraction of ARC-AGI-3's novel tasks.
A 1.5% score sounds tiny, but any positive score on the notoriously difficult ARC-AGI-3 benchmark is a major milestone for AI reasoning. At the same time, the $10,000 cost exposes the severe efficiency bottlenecks of current brute-force agentic search.
* The 1.5% score triples the previous SOTA, indicating that Claude Opus 4.8 outperforms its predecessors and competitors at genuine abstract reasoning and dynamic problem-solving.
* A $10,000 compute cost for a 1.5% success rate highlights the massive gap between current LLM-based agentic architectures and human-like sample efficiency, raising questions about the commercial viability of brute-force test-time compute.
* The addition of dynamic workflows and a fast mode to Claude Code suggests that Anthropic is strategically positioning its models as autonomous agentic assistants that can carry out long-running tasks independently.
DISCOVERED
2026-06-01
PUBLISHED
2026-06-01
AUTHOR
fchollet