Microsoft MDASH tops CyberGym with 100+ agents

// 45d ago · BENCHMARK RESULT

Microsoft's Multi-model Agentic Scanning Harness (MDASH) coordinates over 100 specialized AI agents to automate the discovery and verification of software vulnerabilities. The system currently leads the UC Berkeley CyberGym benchmark with an 88.45% success rate, significantly outperforming Anthropic’s Mythos and OpenAI’s GPT-5.5 in generating functional proof-of-concept exploits.

// ANALYSIS

MDASH marks a shift from relying on a single general-purpose reasoning model to orchestrating multi-agent ensembles for high-stakes security engineering. Its "debate and prove" architecture moves autonomous security tools from simple pattern matching toward verifiable vulnerability research.

– Employs a tiered pipeline of auditor, debater, and prover agents to eliminate false positives and confirm exploitability.
– Uses a hybrid model strategy, combining the reasoning depth of frontier models with the speed of task-specific distilled models.
– Demonstrated real-world efficacy by identifying 16 new Windows vulnerabilities, including 4 critical kernel remote code execution flaws.
– The 88.45% CyberGym score sets a new state of the art for offensive security agents in open-source and enterprise codebases.
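
The auditor, debater, and prover stages described above can be pictured as a three-stage filter. The sketch below is a toy illustration only, not Microsoft's implementation: the `Finding` type, the `run_pipeline` function, the strict-majority vote, and all the stand-in agents are hypothetical names and rules chosen for the example, and plain string checks take the place of model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    location: str           # where the suspected bug lives
    claim: str              # what the auditor believes is wrong
    votes_for: int = 0      # debaters who find the claim plausible
    votes_against: int = 0  # debaters who reject the claim
    proven: bool = False    # set only when the prover confirms it


# Hypothetical agent signatures; a real agent would wrap a model call.
Auditor = Callable[[str], List[Finding]]
Debater = Callable[[Finding], bool]
Prover = Callable[[Finding], bool]


def run_pipeline(code: str, auditors: List[Auditor],
                 debaters: List[Debater], prover: Prover) -> List[Finding]:
    # Stage 1: every auditor proposes candidate findings.
    candidates = [f for audit in auditors for f in audit(code)]

    confirmed = []
    for finding in candidates:
        # Stage 2: debaters vote; only a strict majority keeps a finding alive.
        for debate in debaters:
            if debate(finding):
                finding.votes_for += 1
            else:
                finding.votes_against += 1
        if finding.votes_for <= finding.votes_against:
            continue  # treated as a likely false positive

        # Stage 3: the prover must actually demonstrate the issue.
        finding.proven = prover(finding)
        if finding.proven:
            confirmed.append(finding)
    return confirmed


def flag_lines(needle: str, claim: str) -> Auditor:
    # Toy auditor factory: flags every line containing `needle`.
    def audit(code: str) -> List[Finding]:
        return [Finding(f"line {i}", claim)
                for i, line in enumerate(code.splitlines(), 1)
                if needle in line]
    return audit


SAMPLE = "char buf[8];\nstrcpy(buf, user_input);\nmemcpy(dst, src, 4);\n"

auditors = [
    flag_lines("strcpy(", "unbounded strcpy"),
    flag_lines("memcpy(", "possible overflow"),  # deliberately noisy
]
debaters = [
    lambda f: "strcpy" in f.claim,        # only trusts strcpy claims
    lambda f: True,                       # accepts everything
    lambda f: "possible" not in f.claim,  # rejects vague claims
]


def toy_prover(finding: Finding) -> bool:
    # Stand-in for building and running a proof-of-concept exploit.
    return finding.claim == "unbounded strcpy"


confirmed = run_pipeline(SAMPLE, auditors, debaters, toy_prover)
for f in confirmed:
    print(f.location, f.claim, f"{f.votes_for}-{f.votes_against}")
```

In this toy run the noisy `memcpy` finding is voted down in the debate stage and never reaches the prover, which is the false-positive filtering the bullet list attributes to the tiered design. One plausible reading of the hybrid model strategy is that cheap distilled models handle the high-volume early stages while frontier models are reserved for proving, but the source does not say which models serve which stage.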

// TAGS

microsoft-mdash · agent · security · benchmark · llm · evaluation · tool-use

DISCOVERED

45d ago

2026-05-15

PUBLISHED

45d ago

2026-05-15

RELEVANCE

9/10

AUTHOR

Wes Roth
