Qwen 3.7 Max hits 60.6% on SWE-Bench Pro
Alibaba Cloud's new flagship Qwen 3.7 Max claims the top spot on the SWE-Bench Pro leaderboard with a record 60.6% score. Designed specifically for the "agent era," the model features a mandatory thinking mode for planning and verifying complex, multi-step engineering tasks.
Qwen 3.7 Max signals a decisive move toward "agent foundation" models that prioritize long-horizon reasoning over simple chat.
- –The 60.6% SWE-Bench Pro score validates its superior ability to handle multi-file repository maintenance and real-world software issues autonomously.
- –Native MCP support and "Thinking Mode" enable it to sustain reasoning across thousands of tool calls, as proven by a 35-hour autonomous kernel optimization run.
- –Drop-in compatibility with OpenAI and Anthropic SDKs lowers the barrier for developers to swap it into existing agentic workflows.
- –The focus on closed-weights for the "Max" series marks a strategic shift for Alibaba as it competes directly with GPT-5.5 and Claude 4.6 for enterprise dominance.
DISCOVERED
3h ago
2026-05-21
PUBLISHED
6h ago
2026-05-21
RELEVANCE
AUTHOR
Able-Necessary-6048