DeepMind proof agent solves 9 Erdős problems

// RESEARCH PAPER

A Google DeepMind paper reports an AI-driven formal proof system that autonomously solved 9 of 353 open Erdős problems and 44 of 492 OEIS conjectures, at an inference cost of a few hundred dollars per problem. The result is less about the raw headline score and more about showing that LLMs plus Lean can produce machine-checkable mathematical research at meaningful scale.
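For readers who have not used Lean, here is a toy Lean 4 example of what "machine-checkable" means in practice. It is not one of the paper's problems, and the theorem name `add_comm_toy` is made up for illustration. The file compiles only if the proof term actually establishes the stated claim.

```lean
-- Toy statement for illustration only, not from the paper.
-- Lean's kernel accepts this only because `Nat.add_comm a b`
-- really has the type `a + b = b + a`.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A wrong "proof" is rejected at check time instead of being trusted:
-- theorem bad (a b : Nat) : a + b = a := Nat.add_comm a b  -- type error
```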

// ANALYSIS

The important shift here is not “AI beats mathematicians,” but “verification turns probabilistic reasoning into something that can actually survive contact with formal math.”

–Solving 9 of 353 open Erdős problems (about 2.5%) is still a low hit rate, but every win is formally checked, which is a much higher bar than most benchmark math claims
–The paper says a basic LLM-plus-Lean loop replicated the Erdős results, which suggests the search/verification scaffold matters as much as model intelligence
–Solving 44 of 492 OEIS conjectures broadens the signal beyond a single math niche and points to a reusable theorem-search pattern
–The cost profile matters: a few hundred dollars per problem is the kind of number that makes this usable as a research assistant, not just a demo
–For developers, the takeaway is broader than math: any domain with a strong validator could benefit from the same agent loop
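The LLM-plus-Lean loop described in the bullets above can be sketched as a generic generate-and-verify loop. This is a minimal sketch under stated assumptions, not the paper's implementation. `propose` stands in for an LLM call and `verify` stands in for a Lean check. `prove_with_validator`, `toy_propose`, `toy_verify`, and the factor-of-91 toy task are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    candidate: str  # what the proposer produced
    ok: bool        # did the validator accept it?
    feedback: str   # validator message, available to the proposer next round


def prove_with_validator(
    propose: Callable[[str, list[Attempt]], str],
    verify: Callable[[str], tuple[bool, str]],
    goal: str,
    budget: int,
) -> tuple[Optional[str], list[Attempt]]:
    """Hypothetical generate-and-verify loop.

    Returns (first validator-accepted candidate or None, full attempt history).
    """
    history: list[Attempt] = []
    for _ in range(budget):
        candidate = propose(goal, history)  # stand-in for an LLM call
        ok, feedback = verify(candidate)    # stand-in for a Lean check
        history.append(Attempt(candidate, ok, feedback))
        if ok:
            return candidate, history
    return None, history  # budget exhausted: nothing is claimed


# Toy domain (illustrative only): find a nontrivial factor of 91.
def toy_propose(goal: str, history: list[Attempt]) -> str:
    # A real agent would condition on the goal and past feedback;
    # this stub just counts upward from 2.
    return str(len(history) + 2)


def toy_verify(candidate: str) -> tuple[bool, str]:
    d = int(candidate)
    if 1 < d < 91 and 91 % d == 0:
        return True, "accepted"
    return False, f"{d} does not divide 91"


if __name__ == "__main__":
    result, attempts = prove_with_validator(
        toy_propose, toy_verify, "factor 91", budget=10
    )
    print(result, len(attempts))
```

Running the script prints `7 6`: the loop tries 2 through 7 and stops at the first candidate the validator accepts. The design choice worth noting is that the function never returns an unverified candidate. When the budget runs out it returns `None`, so the validator, not the proposer's confidence, decides what counts as a result.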

// TAGS

llm, reasoning, evaluation, research, benchmark, agent, google-deepmind

DISCOVERED

2026-05-26

PUBLISHED

2026-05-24

RELEVANCE

9/10

AUTHOR

Independent-Wind4462
