Microsoft MDASH tops CyberGym with 100+ agents

// 2h ago · BENCHMARK RESULT

Microsoft's Multi-model Agentic Scanning Harness (MDASH) coordinates over 100 specialized AI agents to automate the discovery and verification of software vulnerabilities. The system currently leads the UC Berkeley CyberGym benchmark with an 88.45% success rate, significantly outperforming Anthropic’s Mythos and OpenAI’s GPT-5.5 in generating functional proof-of-concept exploits.

// ANALYSIS

MDASH signals a shift from relying on a single general-purpose reasoning model to orchestrated multi-agent ensembles for high-stakes security engineering. Its "debate and prove" architecture moves autonomous security tooling from simple pattern matching toward vulnerability research in which each finding must be confirmed as exploitable before it is reported.

  • Employs a tiered pipeline of auditor, debater, and prover agents to eliminate false positives and confirm exploitability.
  • Leverages a hybrid model strategy, combining the reasoning depth of frontier models with the speed of task-specific distilled models.
  • Demonstrated real-world efficacy by identifying 16 new Windows vulnerabilities, including 4 critical kernel remote code execution flaws.
  • The 88.45% CyberGym score sets a new state of the art for offensive security agents in open-source and enterprise codebases.
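
The tiered pipeline described in the bullets above can be illustrated with a small sketch. This is not Microsoft's implementation: the `Finding` type, the `run_pipeline` function, the majority-vote rule, and every toy agent below are hypothetical stand-ins chosen only to show how auditor, debater, and prover stages might filter candidates in sequence. Real agents would call models and tooling where these stubs use string checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate vulnerability passed between stages (hypothetical type)."""
    location: str
    claim: str
    votes_for: int = 0
    votes_against: int = 0
    proven: bool = False


# Hypothetical stage signatures, assumed for illustration only.
Auditor = Callable[[str], List[Finding]]  # proposes candidate findings
Debater = Callable[[Finding], bool]       # True = argues the finding is real
Prover = Callable[[Finding], bool]        # True = a proof of concept succeeded


def run_pipeline(code: str, auditors: List[Auditor],
                 debaters: List[Debater], provers: List[Prover]) -> List[Finding]:
    """Keep only findings that win the debate and are then proven."""
    candidates = [f for audit in auditors for f in audit(code)]
    confirmed = []
    for finding in candidates:
        for debate in debaters:
            if debate(finding):
                finding.votes_for += 1
            else:
                finding.votes_against += 1
        # Assumed rule: a strict majority of debaters must support the finding.
        if finding.votes_for <= finding.votes_against:
            continue
        finding.proven = any(prove(finding) for prove in provers)
        if finding.proven:
            confirmed.append(finding)
    return confirmed


# Toy agents: simple string checks standing in for model calls.
def toy_auditor(code: str) -> List[Finding]:
    findings = []
    for number, line in enumerate(code.splitlines(), start=1):
        if "strcpy" in line or "memcpy" in line:
            findings.append(Finding(f"line {number}", line.strip()))
    return findings


def skeptic(finding: Finding) -> bool:
    return "strcpy" in finding.claim


def optimist(finding: Finding) -> bool:
    return True


def pedant(finding: Finding) -> bool:
    return "sizeof" not in finding.claim


def toy_prover(finding: Finding) -> bool:
    return "strcpy" in finding.claim


SAMPLE = (
    "char buf[8];\n"
    "strcpy(buf, user_input);\n"
    "memcpy(dst, src, sizeof(dst));\n"
)

confirmed = run_pipeline(SAMPLE, [toy_auditor],
                         [skeptic, optimist, pedant], [toy_prover])
for finding in confirmed:
    print(finding.location, finding.claim)
```

In this toy run the auditor flags two lines, the debaters vote down the `memcpy` candidate, and only the `strcpy` candidate reaches the prover. The point of the ordering is that the most expensive step, attempting a proof of concept, runs only on candidates that survived the cheaper debate stage.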
// TAGS
microsoft-mdash, agent, security, benchmark, llm, evaluation, tool-use

DISCOVERED

2h ago

2026-05-15

PUBLISHED

2h ago

2026-05-15

RELEVANCE

9/10

AUTHOR

Wes Roth