Claude Mythos Preview clears METR time-horizon ceiling
Anthropic says an early Claude Mythos Preview snapshot given to METR posts a time horizon more than 2x that of the next-best model. METR also notes that its current task suite becomes unreliable above 16 hours, so the size of the gap matters more than the exact number.
This reads like a genuine capability jump, but the headline is the relative lead, not the absolute hour count.
- METR’s own ceiling means the benchmark is now compressing at the top end, which is usually where frontier-model comparisons get noisy
- The signal that matters is longer autonomous task completion, which tends to correlate with better multi-step coding, research, and tool use
- Because this is an early snapshot, the number may shift as Anthropic iterates or METR expands the task suite
- If the gap holds, Mythos Preview is not just ahead on scorecards, it is ahead in the kind of long-horizon work that defines agentic systems
DISCOVERED
2026-05-10
PUBLISHED
2026-05-10
AUTHOR
noahzweben
