REDDIT · REDDIT // 5d ago · NEWS
Claude Code flunks Elden Ring test
A Reddit post describes a failed attempt to have Claude Code, running on Opus 4.6, play through Elden Ring, and uses it as a reality check on AGI hype. The poster argues that if a model cannot reliably handle a common game task without heavy scaffolding, claims that we have already reached AGI are premature.
// ANALYSIS
The hot take is simple: benchmark wins and demos can still mask a large gap between impressive coding ability and robust general-purpose autonomy.
- Anthropic markets Opus 4.6 as a strong agentic coding model, but this kind of anecdote shows how fragile current systems can be outside their comfort zone.
- Elden Ring is a harsh test of perception, planning, and fast control loops, which exposes the limits of text-first agents more clearly than code benchmarks do.
- The post is not a rigorous eval, but it is a useful signal of public skepticism around “AGI” claims that outpace everyday reliability.
- For developers, the practical read is to treat Claude Code as a powerful assistant for bounded tasks, not as a drop-in general intelligence.
// TAGS
claude-code · llm · reasoning · agent · computer-use · benchmark
DISCOVERED
5d ago
2026-04-06
PUBLISHED
5d ago
2026-04-06
RELEVANCE
8/10
AUTHOR
CrimsonShikabane