METR pushes Claude Mythos Preview past 16 hours
METR says it evaluated an early version of Claude Mythos Preview during a limited window in March 2026 and estimated a 50% time horizon of at least 16 hours on its task suite, with a 95% confidence interval of 8.5 to 55 hours. METR also cautions that its current suite has too few tasks of 16+ hours to make that range statistically robust, so it is treating the estimate as a floor rather than a precise comparison point. The update appeared on METR’s time-horizons page on May 8, 2026.
Big signal, but not a clean leaderboard win.
- The main takeaway is that Claude Mythos Preview appears to sit at the top end of what METR can currently measure, which is notable even if the estimate is intentionally conservative.
- METR is explicitly warning against over-reading the number: only 5 of 228 tasks are estimated at 16+ hours, so the curve is sparse at the upper tail.
- This reads more like a measurement-limit story than a crisp benchmark breakthrough; the methodology itself is saying, “we need longer tasks before we can rank models above this confidently.”
- For anyone comparing frontier models, the more important detail is that this is an early preview and a lower-bound style result, not a finished product release with a stable public benchmark claim.
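To make the "sparse upper tail" point concrete, here is a minimal Python sketch of how a 50% time horizon can be read off a success curve. It assumes a plain logistic fit of success against log2 task duration, which is a simplification and not METR's actual pipeline. The durations and success counts are hypothetical illustration data; only the 5-of-228 figure comes from the summary above.

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Fit P(success) = sigmoid(a + b*x) by Newton's method (two parameters)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# HYPOTHETICAL data: task length in hours -> successes out of 10 attempts.
attempts = 10
successes_by_hours = {0.25: 10, 0.5: 10, 1: 9, 2: 8, 4: 7, 8: 5, 16: 3, 32: 1}

xs, ys = [], []
for hours, wins in successes_by_hours.items():
    for i in range(attempts):
        xs.append(math.log2(hours))          # regress on log2 of duration
        ys.append(1.0 if i < wins else 0.0)  # 1 = success, 0 = failure

a, b = fit_logistic(xs, ys)

# The 50% horizon is the duration where a + b*log2(hours) = 0.
horizon_hours = 2 ** (-a / b)

# Share of the suite at 16+ hours, using the counts quoted in the summary.
long_task_share = 5 / 228

print(f"slope per doubling of task length: {b:.2f}")
print(f"estimated 50% time horizon: {horizon_hours:.1f} hours")
print(f"share of tasks at 16+ hours: {long_task_share:.1%}")
```

In a fit like this, the crossing point is pinned down by tasks near the horizon. When only about 2% of tasks sit at 16+ hours, few observations constrain the curve there, which is consistent with the wide interval and the "at least" framing.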
Discovered: 2026-05-09
Published: 2026-05-09
Author: RavingMalwaay