Claude Code flunks Elden Ring test
A Reddit post uses a failed attempt to get Claude Code, running on Opus 4.6, through Elden Ring as a reality check on AGI hype. The poster argues that if a model cannot reliably handle a common game task without heavy scaffolding, claims that we are already at AGI are premature.
The hot take is simple: benchmark wins and demos can still mask a large gap between impressive coding ability and robust general-purpose autonomy.
- Anthropic markets Opus 4.6 as a strong agentic coding model, but this kind of anecdote shows how fragile current systems can be outside their comfort zone.
- Elden Ring is a harsh test of perception, planning, and fast control loops, which exposes the limits of text-first agents more clearly than code benchmarks do.
- The post is not a rigorous eval, but it is a useful signal of public skepticism around “AGI” claims that outpace everyday reliability.
- For developers, the practical read is to treat Claude Code as a powerful assistant for bounded tasks, not as a drop-in general intelligence.
DISCOVERED
51d ago
2026-04-06
PUBLISHED
51d ago
2026-04-06
AUTHOR
CrimsonShikabane