Claude Mythos Preview clears METR time-horizon ceiling

// 45d ago · BENCHMARK RESULT

Anthropic says an early snapshot of Claude Mythos Preview, provided to METR for evaluation, posts a time horizon more than twice that of the next-best model. METR also notes that its current task suite becomes unreliable for time horizons above 16 hours, so the exact figure matters less than the size of the gap.
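
As METR describes it, a model's time horizon is roughly the task length (measured by how long a skilled human takes) at which the model's predicted success rate falls to 50%. It is estimated by fitting a logistic curve of success against log task length. The sketch below is a simplified illustration of that idea, not METR's code: the toy runs, the plain gradient-descent fit, and the names `fit_logistic`, `time_horizon`, and `reliable_up_to_minutes` are all hypothetical. The 16-hour value passed in only mirrors the figure METR cites.

```python
import math

# Simplified sketch of a 50% time-horizon estimate. Toy data and all names are
# hypothetical; this is not METR's code or task suite.


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(success) = sigmoid(a * (x - mean) + b) by gradient descent on log-loss."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]  # centering keeps the fit well conditioned
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(centered, ys):
            err = _sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b, mean


def time_horizon(task_minutes, successes, reliable_up_to_minutes):
    """Return (50% time horizon in minutes, whether it exceeds the reliable range)."""
    xs = [math.log2(m) for m in task_minutes]  # one unit per doubling of task length
    a, b, mean = fit_logistic(xs, successes)
    if a >= 0:
        raise ValueError("success rate should fall as tasks get longer")
    # P(success) = 0.5 where a * (x - mean) + b = 0, i.e. x = mean - b / a.
    horizon = 2 ** (mean - b / a)
    return horizon, horizon > reliable_up_to_minutes


# Toy runs: human task length in minutes, and whether the model succeeded (1) or not (0).
minutes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
outcomes = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

horizon, beyond_range = time_horizon(minutes, outcomes, reliable_up_to_minutes=16 * 60)
print(f"50% time horizon ~ {horizon:.1f} min, beyond reliable range: {beyond_range}")
```

Fitting in log2 space means a model with twice another's horizon sits exactly one unit further along the x-axis. When an estimate lands past the range the suite can measure reliably, the 50% crossing is an extrapolation of the fitted curve rather than something the tasks pin down directly, which is why a clear relative gap can be more trustworthy than the absolute hour count.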

// ANALYSIS

This reads like a genuine capability jump, but the headline is the relative lead, not the absolute hour count.

–METR’s own 16-hour reliability ceiling means the benchmark is saturating at the top end, which is usually where frontier-model comparisons get noisy
–The signal that matters is longer autonomous task completion, which tends to correlate with better multi-step coding, research, and tool use
–Because this is an early snapshot, the number may shift as Anthropic iterates or METR expands the task suite
–If the gap holds, Mythos Preview is ahead not just on scorecards but in the kind of long-horizon work that defines agentic systems

// TAGS

claude-mythos-preview · llm · reasoning · benchmark · evaluation · agent

DISCOVERED: 2026-05-10 (45d ago)

PUBLISHED: 2026-05-10 (45d ago)

RELEVANCE: 9/10

AUTHOR: noahzweben