Microsoft MDASH tops CyberGym with 100+ agents
Microsoft's Multi-model Agentic Scanning Harness (MDASH) coordinates over 100 specialized AI agents to automate the discovery and verification of software vulnerabilities. The system currently leads the UC Berkeley CyberGym benchmark with an 88.45% success rate, significantly outperforming Anthropic’s Mythos and OpenAI’s GPT-5.5 in generating functional proof-of-concept exploits.
MDASH marks a shift from general-purpose reasoning to orchestrated multi-agent ensembles for high-stakes security engineering. By implementing a formal "debate and prove" architecture, Microsoft has moved autonomous security tools from simple pattern matching toward verifiable vulnerability research.
- Employs a tiered pipeline of auditor, debater, and prover agents to eliminate false positives and confirm exploitability.
- Leverages a hybrid model strategy, combining the reasoning depth of frontier models with the speed of task-specific distilled models.
- Demonstrated real-world efficacy by identifying 16 new Windows vulnerabilities, including 4 critical kernel remote code execution flaws.
- The 88.45% CyberGym score establishes a new state-of-the-art for offensive security agents in open-source and enterprise codebases.
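The auditor, debater, and prover tiers described above can be pictured with a small sketch. This is not Microsoft's implementation: MDASH's internals are not described here, so every name below (`Finding`, `run_pipeline`, the toy agents) is hypothetical, and the agents are plain string-matching functions standing in for model calls. It only illustrates the control flow: auditors propose candidates, debaters vote to filter false positives, and a prover must succeed before a finding is reported.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate vulnerability (hypothetical structure, for illustration)."""
    location: str
    claim: str
    votes_for: int = 0
    votes_against: int = 0
    proven: bool = False


# Each agent is modeled as a plain function; a real system would call models.
Auditor = Callable[[str], List[Finding]]
Debater = Callable[[Finding], bool]
Prover = Callable[[Finding], bool]


def run_pipeline(source: str, auditors: List[Auditor],
                 debaters: List[Debater], prover: Prover) -> List[Finding]:
    """Audit, debate by majority vote, then keep only proven findings."""
    confirmed = []
    for audit in auditors:
        for finding in audit(source):
            for debate in debaters:
                if debate(finding):
                    finding.votes_for += 1
                else:
                    finding.votes_against += 1
            # Drop candidates that do not win a strict majority.
            if finding.votes_for <= finding.votes_against:
                continue
            # A surviving candidate is reported only if the prover succeeds.
            finding.proven = prover(finding)
            if finding.proven:
                confirmed.append(finding)
    return confirmed


# Toy stand-ins (assumptions): simple string checks instead of model calls.
def toy_auditor(source: str) -> List[Finding]:
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "strcpy(" in line or "memcpy(" in line:
            hits.append(Finding(f"line {number}", line.strip()))
    return hits


def skeptic(finding: Finding) -> bool:
    return "strcpy(" in finding.claim


def lenient(finding: Finding) -> bool:
    return True


def strict(finding: Finding) -> bool:
    return "sizeof" not in finding.claim


def toy_prover(finding: Finding) -> bool:
    # Stand-in for building and running a proof-of-concept.
    return "strcpy(" in finding.claim


SAMPLE = """void f(char *in) {
    char buf[8];
    strcpy(buf, in);
    memcpy(buf, in, sizeof(buf));
}"""

if __name__ == "__main__":
    for result in run_pipeline(SAMPLE, [toy_auditor],
                               [skeptic, lenient, strict], toy_prover):
        print(result.location, result.claim)
```

In this toy run the auditor flags both copy calls, the debaters vote down the bounded `memcpy`, and only the `strcpy` candidate reaches the prover. The majority-vote rule and the single prover are design choices of the sketch, not details reported about MDASH.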
DISCOVERED
2026-05-15
PUBLISHED
2026-05-15
AUTHOR
Wes Roth