YT · YOUTUBE // 1h ago // MODEL RELEASE
Mythos Preview hits 93.9% SWE-bench, remains restricted
Anthropic's restricted "super-frontier" model outperforms Opus 4.7 across all benchmarks, setting a new record of 93.9% on SWE-bench Verified. The model is currently limited to defensive cybersecurity partners in Project Glasswing due to its high capability for autonomous zero-day discovery and exploitation.
// ANALYSIS
Anthropic is building a "Manhattan Project" for cybersecurity, prioritizing infrastructure defense over general accessibility.
- The 13-point jump on SWE-bench Verified signals a large leap in reasoning and autonomous software engineering.
- The model's autonomous discovery of 27-year-old vulnerabilities makes it a high-risk asset that could be weaponized for offensive hacking if leaked.
- Project Glasswing’s $100M in credits and $4M in donations aim to secure global software before offensive models catch up.
- A 100% score on Cybench saturates that benchmark and suggests existing security benchmarks can no longer measure progress, requiring an overhaul of AI evals.
// TAGS
llm, ai-coding, agent, reasoning, benchmark, safety, research, claude-mythos-preview
DISCOVERED
1h ago
2026-04-16
PUBLISHED
1h ago
2026-04-16
RELEVANCE
9/10
AUTHOR
Bijan Bowen