OPEN_SOURCE
REDDIT // 16d ago · BENCHMARK RESULT
Droidrun tops mobile agent benchmark
Droidrun led a 65-task AndroidWorld benchmark at 43% success, ahead of Mobile-Agent (29%), AutoDroid (14%), and AppAgent (7%). The win came with the highest token burn among the stronger agents, underscoring how expensive reliable mobile automation still is.
// ANALYSIS
The headline win matters, but the bigger story is that even the best agent tested still fails most tasks (57%). This is less a category victory lap than a reminder that mobile automation remains brittle, and that state tracking, error recovery, and grounding are the real moat.
- Droidrun's explicit planning seems to buy reliability, but at a clear token premium.
- Mobile-Agent looks like the most balanced option for teams that want acceptable performance without the top-end spend.
- AutoDroid is the budget pick, but 14% success is too low for broad deployment.
- AppAgent's vision-heavy pipeline appears to spend heavily on tokens and still succeeds on only 7% of tasks.
- For developers, the benchmark suggests mobile agents are promising for narrow workflows, but not yet for fully hands-off autonomy.
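The bullets above frame the comparison as reliability versus token spend. One rough way to put both on a single scale is tokens per successful task. If an agent spends an average of c tokens per attempted task and succeeds at rate r, it spends c / r tokens per success. A higher-success agent A is therefore cheaper per success than agent B whenever its per-task spend is less than r_A / r_B times B's. The summary does not report per-task token counts, so the minimal Python sketch below uses only the reported success rates. The function names are hypothetical, and the metric deliberately ignores other costs, such as human cleanup after a failed run.

```python
def tokens_per_success(tokens_per_task: float, success_rate: float) -> float:
    """Average tokens spent per successfully completed task.

    Assumes tokens_per_task is the mean spend per attempted task
    (failed attempts included) and success_rate is in (0, 1].
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return tokens_per_task / success_rate


def break_even_cost_ratio(rate_a: float, rate_b: float) -> float:
    """How many times more tokens per task agent A can spend than agent B
    before A becomes more expensive per successful task.

    Derived from c_a / rate_a < c_b / rate_b  <=>  c_a / c_b < rate_a / rate_b.
    """
    return rate_a / rate_b


# Success rates reported for the 65-task AndroidWorld run (no token data used).
RATES = {"Droidrun": 0.43, "Mobile-Agent": 0.29, "AutoDroid": 0.14, "AppAgent": 0.07}

ratio = break_even_cost_ratio(RATES["Droidrun"], RATES["Mobile-Agent"])
print(f"{ratio:.2f}")  # prints 1.48
```

On these numbers, Droidrun stays cheaper per completed task than Mobile-Agent as long as it uses less than about 1.48x Mobile-Agent's tokens per task. Whether it actually does depends on token figures this summary does not report.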
// TAGS
droidrun · benchmark · agent · computer-use · automation · research
DISCOVERED: 16d ago (2026-03-26)
PUBLISHED: 17d ago (2026-03-26)
RELEVANCE: 8/10
AUTHOR: No-Speech12