Mythos Preview hits 93.9% SWE-bench, remains restricted

// 90d ago · MODEL RELEASE

Anthropic's restricted "super-frontier" model outperforms Opus 4.7 across all benchmarks and sets a new record of 93.9% on SWE-bench Verified. Access is currently limited to defensive cybersecurity partners in Project Glasswing because of the model's strong capability for autonomous zero-day discovery and exploitation.

// ANALYSIS

Anthropic is building a "Manhattan Project" for cybersecurity, prioritizing infrastructure defense over general accessibility.

– The 13-point jump on SWE-bench Verified signals a large leap in reasoning and autonomous software engineering.
– Its autonomous discovery of 27-year-old vulnerabilities makes the model a high-risk asset that could be weaponized for offensive hacking if leaked.
– Project Glasswing's $100M in credits and $4M in donations aim to secure global software before offensive models catch up.
– A 100% score on Cybench suggests existing security benchmarks are saturated and that AI evals need a substantial overhaul.

// TAGS

llm, ai-coding, agent, reasoning, benchmark, safety, research, claude-mythos-preview

DISCOVERED

90d ago

2026-04-16

PUBLISHED

90d ago

2026-04-16

RELEVANCE

9/10

AUTHOR

Bijan Bowen
