Claude Fable 5 tops BU Bench
Anthropic's newly released Claude Fable 5 model achieved record-breaking performance on browser-use's BU Bench web automation benchmark, but at a high cost: although the model showed unmatched capability on complex, multi-step online workflows, the full benchmark run cost $580.87.
Frontier models are unlocking high-fidelity web automation, but the cost of running multi-step agentic workflows on live sites remains a major bottleneck for commercial deployment.
* Claude Fable 5's mythos-class capabilities represent a major leap in agentic web navigation, likely driven by its massive context window and advanced multi-stage reasoning.
* The $580.87 run cost for a 100-task benchmark (roughly $5.81 per task) suggests that agentic automation with state-of-the-art models is still too expensive for everyday tasks.
* The reliance on a Gemini-based judge to score success on real-world websites reflects the industry's shift toward LLM-as-a-judge evaluation for dynamic, non-deterministic tasks.
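The cost figures above come down to simple arithmetic. The following is a minimal Python sketch that uses only the two reported numbers, a $580.87 total and 100 tasks, to compute the average cost per attempted task. Because failed attempts still cost money, it also computes cost per successful task from a judge pass count. The `BenchmarkRun` class and the pass counts in the usage loop are hypothetical illustrations. This summary does not report Claude Fable 5's pass rate, so those counts are not BU Bench results.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """Hypothetical cost summary for one benchmark run (figures supplied by the caller)."""
    total_cost_usd: float  # total spend for the full run
    num_tasks: int         # tasks attempted in the run
    num_passed: int        # tasks the judge marked successful (hypothetical input)

    def cost_per_task(self) -> float:
        """Average spend per attempted task."""
        return self.total_cost_usd / self.num_tasks

    def cost_per_success(self) -> float:
        """Spend per successful task; failed attempts are still paid for."""
        if self.num_passed == 0:
            return float("inf")
        return self.total_cost_usd / self.num_passed


if __name__ == "__main__":
    # Reported figures: $580.87 for 100 tasks. The pass counts below are
    # illustrative placeholders only, NOT reported BU Bench results.
    for passed in (100, 80, 60):
        run = BenchmarkRun(total_cost_usd=580.87, num_tasks=100, num_passed=passed)
        print(f"passed={passed}: ${run.cost_per_task():.2f}/task, "
              f"${run.cost_per_success():.2f}/success")
```

With these placeholder pass counts, the loop shows the cost per task holding at about $5.81 while the cost per success rises as the judge's pass count falls. That gap is why a judge's verdicts matter for deployment economics as well as for accuracy.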
Discovered: 2026-06-11
Published: 2026-06-11
Author: browser_use