Microsoft releases OpenRCA 2.0 causal reasoning benchmark
OpenRCA 2.0 is a root cause analysis (RCA) benchmark of 500 instances designed to evaluate the step-wise causal reasoning of LLM agents using the PAVE protocol. An evaluation of 11 frontier LLMs shows that they struggle with process-level reasoning: they recover the exact root-cause set in only 20.7% of cases.
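To make the "exact root-cause set" figure concrete, here is a minimal sketch of how such a score could be computed and why it is stricter than a lenient outcome check. It is an illustration only, not the benchmark's scoring code: the function names, the lenient overlap metric, and the component names in the usage note are all hypothetical.

```python
from typing import Iterable, Sequence


def exact_set_match_rate(
    predictions: Sequence[Iterable[str]],
    ground_truths: Sequence[Iterable[str]],
) -> float:
    """Fraction of instances where the predicted root-cause set equals the true set."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        raise ValueError("at least one instance is required")
    # Order and duplicates are ignored; any missing or extra component is a miss.
    hits = sum(set(p) == set(g) for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)


def any_overlap_rate(
    predictions: Sequence[Iterable[str]],
    ground_truths: Sequence[Iterable[str]],
) -> float:
    """Lenient comparison metric (assumed for contrast): credit for any shared component."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        raise ValueError("at least one instance is required")
    hits = sum(bool(set(p) & set(g)) for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)
```

With the made-up predictions `[["db"], ["cache", "db"], ["api"], ["db", "queue"]]` scored against `[["db"], ["db"], ["auth"], ["queue", "db"]]`, the exact-set rate is 0.5 while the lenient overlap rate is 0.75: the second prediction names the right component but also blames an innocent one, which only the exact-set score penalizes.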
Outcome-only metrics for root cause analysis are deceptive, hiding LLM hallucinations behind lucky pattern matching.
* OpenRCA 2.0 introduces the PAVE protocol to verify structural conformance, statistical deviation, and temporal alignment of fault paths.
* Frontier LLMs fail to recover the exact root-cause set in 79.3% of cases, which suggests that site reliability engineering (SRE) automation requires much deeper causal reasoning than current models possess.
* Grounding causal paths prevents agents from recommending remediation based on incorrect causal assumptions.
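The three checks named above (structural conformance, statistical deviation, temporal alignment) can be pictured with a small sketch. This is not the PAVE implementation; the definitions below are assumptions chosen for illustration: a path is structurally conformant if every hop follows a known propagation edge, statistically deviant if every hop's metric z-score exceeds a threshold, and temporally aligned if anomaly onsets do not go backwards along the path. The `Hop` type, `check_path` function, and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Set, Tuple


@dataclass(frozen=True)
class Hop:
    """One step of a claimed fault path (hypothetical structure)."""
    component: str
    zscore: float  # how far the component's metric deviates from its baseline
    onset: float   # anomaly onset time, e.g. seconds since incident start


def check_path(
    path: Sequence[Hop],
    edges: Set[Tuple[str, str]],
    z_threshold: float = 3.0,  # assumed cutoff, not taken from the benchmark
) -> Dict[str, bool]:
    """Check a fault path ordered from root cause to symptom.

    `edges` holds (source, target) pairs in the direction faults propagate.
    """
    if not path:
        raise ValueError("path must contain at least one hop")
    pairs = list(zip(path, path[1:]))
    return {
        # Every consecutive pair must be a known propagation edge.
        "structural": all((a.component, b.component) in edges for a, b in pairs),
        # Every component on the path must actually look anomalous.
        "statistical": all(abs(h.zscore) >= z_threshold for h in path),
        # A cause cannot start after its effect.
        "temporal": all(a.onset <= b.onset for a, b in pairs),
    }
```

Under these assumptions, a path `db -> api -> frontend` whose onsets are 10, 12, and 15 seconds passes all three checks, while the same path with the `api` onset at 5 seconds fails only the temporal check, since the claimed cause (`db`) would then start after its effect.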
DISCOVERED
2026-06-27
PUBLISHED
2026-06-27
AUTHOR
Discover AI