Gemini 3.5 Flash tops Zapier AutomationBench
Google's Gemini 3.5 Flash (Medium) has claimed the #1 spot on Zapier's AutomationBench, outperforming frontier models like GPT-5.5 at a fraction of the cost. The model achieved a 14.5% success rate on complex business workflows, costing just $0.87 per task compared to over $6 for its nearest competitors.
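A per-attempt price hides how often a task fails. A fairer comparison is the expected spend per *successful* task. If failed attempts are simply retried, with independent attempts that each cost the same, then the expected number of attempts is 1 / success rate. The sketch below runs that arithmetic on the article's figures. It assumes a $6.00 competitor price as a lower bound for "over $6" and pairs it with GPT-5.5's 12.9% success rate. The article does not give GPT-5.5's exact per-task cost, so that pairing is an assumption.

```python
# Expected cost per successful task = cost per attempt / success rate.
# ASSUMPTION: failures are retried, attempts are independent, and every attempt
# costs the same, so expected attempts per success = 1 / success_rate (geometric).

def cost_per_success(cost_per_task: float, success_rate: float) -> float:
    """Expected spend needed to obtain one successful task completion."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task / success_rate

# From the article: Gemini 3.5 Flash (Medium), $0.87 per task, 14.5% success.
flash = cost_per_success(0.87, 0.145)

# ASSUMPTION: competitor priced at $6.00 (a floor for "over $6") with
# GPT-5.5's reported 12.9% success rate.
competitor_floor = cost_per_success(6.00, 0.129)

print(f"Gemini 3.5 Flash: ${flash:.2f} per successful task")
print(f"Competitor (lower bound): ${competitor_floor:.2f} per successful task")
print(f"Ratio: {competitor_floor / flash:.1f}x")
```

Under these assumptions, each success costs roughly $6.00 with Flash and at least about $46.51 with the competitor. That is a gap of nearly 8x, somewhat wider than the per-attempt prices alone suggest.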
Google is winning the "commodity intelligence" race by proving that specialized reasoning doesn't require massive compute costs.
- Beating flagship models like GPT-5.5 (12.9%) on end-to-end agentic tasks signals a major shift toward high-efficiency, small-footprint models for production automation.
- The new "thinking levels" setting lets developers scale reasoning depth per request (from Minimal to High), matching effort to each task's difficulty instead of applying one fixed configuration to every task.
- Strong performance on Terminal-Bench (76.2%) and on tool-use metrics suggests that Gemini 3.5 Flash is specifically tuned for the "agentic workhorse" role in enterprise pipelines.
- Deterministic scoring on real-world Zapier data makes the benchmark a meaningful test of how models handle noisy, multi-app environments that traditional benchmarks often miss.
- At $0.87 per task, agentic automation finally moves from "expensive experiment" to "economically viable" for mid-market business operations.
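Per-task reasoning depth pays off when something upstream decides how hard each task is. The sketch below shows a hypothetical client-side router that scores a workflow and picks a thinking level. Everything in it is illustrative. The enum values reuse only the level names mentioned above (Minimal, Medium, High), and the real API may expose more levels or use different names. `WorkflowTask`, its fields, and the score thresholds are invented for this example. The sketch makes no call to any Gemini API.

```python
from dataclasses import dataclass
from enum import Enum


class ThinkingLevel(Enum):
    # Hypothetical labels taken from the level names in the article;
    # the actual API parameter names and values may differ.
    MINIMAL = "minimal"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class WorkflowTask:
    # Hypothetical difficulty signals for a business automation task.
    apps_involved: int
    steps: int
    has_ambiguous_input: bool


def pick_thinking_level(task: WorkflowTask) -> ThinkingLevel:
    """Heuristic router: longer, multi-app, ambiguous workflows get deeper reasoning."""
    # Each step adds 1 point, and each app beyond the first adds 2,
    # because cross-app hand-offs are a common failure point.
    score = task.steps + 2 * max(task.apps_involved - 1, 0)
    if task.has_ambiguous_input:
        score += 3  # noisy or underspecified input needs more deliberation
    if score <= 3:
        return ThinkingLevel.MINIMAL
    if score <= 8:
        return ThinkingLevel.MEDIUM
    return ThinkingLevel.HIGH


examples = [
    WorkflowTask(apps_involved=1, steps=2, has_ambiguous_input=False),  # score 2
    WorkflowTask(apps_involved=2, steps=4, has_ambiguous_input=False),  # score 6
    WorkflowTask(apps_involved=4, steps=6, has_ambiguous_input=True),   # score 15
]
for t in examples:
    print(t, "->", pick_thinking_level(t).value)
# -> minimal, medium, high respectively
```

The design choice is to keep routing cheap and deterministic, so it adds no model calls of its own. With a router like this, simple single-app tasks stay at the lowest-cost level, and only complex workflows pay for deeper reasoning.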
Published: 2026-05-21 · Discovered: 2026-05-21 · Author: Independent-Wind4462