Mythos Preview hits 93.9% SWE-bench, remains restricted
YT · YOUTUBE // 1h ago · MODEL RELEASE

Anthropic's restricted "super-frontier" model outperforms Opus 4.7 across all benchmarks, setting a new record of 93.9% on SWE-bench Verified. The model is currently limited to defensive cybersecurity partners in Project Glasswing due to its high capability for autonomous zero-day discovery and exploitation.

// ANALYSIS

Anthropic is building a "Manhattan Project" for cybersecurity, prioritizing infrastructure defense over general accessibility.

  • The 13-point jump on SWE-bench Verified signals a massive leap in reasoning and autonomous software engineering.
  • Autonomous discovery of 27-year-old vulnerabilities makes this model a high-risk asset that could be weaponized for offensive hacking if leaked.
  • Project Glasswing’s $100M in credits and $4M in donations aim to secure global software before offensive models catch up.
  • A 100% score on Cybench saturates the benchmark, signaling that existing security evals need a total overhaul.

// TAGS

llm · ai-coding · agent · reasoning · benchmark · safety · research · claude-mythos-preview

DISCOVERED: 1h ago (2026-04-16)
PUBLISHED: 1h ago (2026-04-16)
RELEVANCE: 9/10
AUTHOR: Bijan Bowen