METR pushes Claude Mythos Preview past 16 hours

// BENCHMARK RESULT

METR says it evaluated an early version of Claude Mythos Preview during a limited window in March 2026. On its task suite, it estimated a 50% time horizon (the task length at which the model is expected to succeed about half the time) of at least 16 hours, with a 95% confidence interval of 8.5 to 55 hours. METR also cautions that its current suite has too few tasks of 16 hours or longer to make that range statistically robust, so it is treating the estimate as a floor rather than a precise comparison point. The update appeared on METR’s time-horizons page on May 8, 2026.

// ANALYSIS

Big signal, but not a clean leaderboard win.

– The main takeaway is that Claude Mythos Preview appears to sit at the top end of what METR can currently measure, which is notable even though the estimate is intentionally conservative.
– METR explicitly warns against over-reading the number: only 5 of 228 tasks (about 2%) are estimated at 16+ hours, so the curve is sparse in the upper tail.
– This reads more like a measurement-limit story than a crisp benchmark breakthrough: the methodology itself is saying, “we need longer tasks before we can rank models above this confidently.”
– For anyone comparing frontier models, the more important detail is that this is an early preview with a lower-bound-style result, not a finished product release with a stable public benchmark claim.
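
To make the idea of a 50% time horizon concrete, here is a minimal sketch of the general technique: fit a logistic curve of success rate against the log of task length, then solve for the length where predicted success is 50%. This is an illustration only, not METR's actual pipeline. The function names, the Newton's-method fit, and the synthetic task data are all assumptions made up for this example; only the 5-of-228 figure comes from the item above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(tasks, iterations=25):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) with Newton's method.

    tasks is a list of (human_minutes, success_rate) pairs.
    Hypothetical helper for illustration, not METR's code.
    """
    a = b = 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for minutes, rate in tasks:
            x = math.log2(minutes)
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g_a += rate - p
            g_b += (rate - p) * x
            # Entries of the (negated) Hessian.
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton update: solve the 2x2 system in closed form.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
    return a, b


def horizon_minutes(a, b):
    """Task length where predicted success is 50%: a + b * log2(t) = 0."""
    return 2.0 ** (-a / b)


# Synthetic data (assumption): 11 task lengths from 1 to 1024 minutes,
# with success rates generated from a known curve whose 50% point is 64 minutes.
TRUE_A, TRUE_B = 4.8, -0.8
synthetic_tasks = [(2 ** k, sigmoid(TRUE_A + TRUE_B * k)) for k in range(11)]

a, b = fit_logistic(synthetic_tasks)
fitted_horizon = horizon_minutes(a, b)

# From the item above: 5 of 228 tasks are estimated at 16+ hours.
long_task_share = 5 / 228

print(f"fitted 50% horizon: {fitted_horizon:.1f} minutes")
print(f"share of 16+ hour tasks: {long_task_share:.1%}")
```

Because the synthetic success rates are generated from the same curve being fitted, the sketch recovers the 64-minute horizon it was built with. Real evaluations have noisy pass/fail outcomes, and when only about 2% of tasks sit at or beyond the estimated horizon, few observations pin down the curve near its 50% crossing, which is consistent with the wide 8.5 to 55 hour interval.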

// TAGS

metr anthropic claude mythos benchmark evaluation llm time-horizon

DISCOVERED

2026-05-09

PUBLISHED

2026-05-09

RELEVANCE

8/10

AUTHOR

RavingMalwaay
