Qwen 3.7 Max hits 60.6% on SWE-Bench Pro

// 45d agoMODEL RELEASE

Qwen 3.7 Max hits 60.6% on SWE-Bench Pro

Alibaba Cloud's new flagship Qwen 3.7 Max claims the top spot on the SWE-Bench Pro leaderboard with a record 60.6% score. Designed specifically for the "agent era," the model features a mandatory thinking mode for planning and verifying complex, multi-step engineering tasks.

// ANALYSIS

Qwen 3.7 Max signals a decisive move toward "agent foundation" models that prioritize long-horizon reasoning over simple chat.

–The 60.6% SWE-Bench Pro score validates its superior ability to handle multi-file repository maintenance and real-world software issues autonomously.
–Native MCP support and "Thinking Mode" enable it to sustain reasoning across thousands of tool calls, as proven by a 35-hour autonomous kernel optimization run.
–Drop-in compatibility with OpenAI and Anthropic SDKs lowers the barrier for developers to swap it into existing agentic workflows.
–The focus on closed-weights for the "Max" series marks a strategic shift for Alibaba as it competes directly with GPT-5.5 and Claude 4.6 for enterprise dominance.

// TAGS

qwen-3-7-maxalibaballmreasoningai-codingagentbenchmarkmcp

DISCOVERED

45d ago

2026-05-21

PUBLISHED

45d ago

2026-05-21

RELEVANCE

10/ 10

AUTHOR

Able-Necessary-6048

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

UPDATE43m ago

Anthropic introduces Claude Design 2.0 visual prototyping workspace

Claude Design 2.0 is Anthropic's visual canvas environment for design exploration, prototyping, and asset synchronization. The tool allows users to transform text prompts, images, and documents into interactive designs and features seamless integration with Claude Code to streamline the transition from design to development.

VIDEO43m ago

Matt Maher Launches CARE AI Agent Benchmark

Matt Maher evaluates leading AI models like GPT-5.5 and Claude Opus 4.8 using the CARE benchmark to measure how successfully AI coding agents maintain user intent during planning and execution. While top-tier models create excellent initial plans, they frequently lose track of specific user instructions during execution, with specialized long-horizon modes preserving intent best.

OPEN SOURCE1h ago

planning-with-files provides persistent, file-based markdown planning and completion gating to help AI coding agents survive context loss and handle long-running tasks.

planning-with-files is an open-source persistent file-based planning system designed for AI coding agents and long-running tasks. It works across over 60 agents (including Claude Code, Codex, and Cursor) by storing durable Markdown files—specifically task_plan.md, findings.md, and progress.md—directly on disk, making the agent's memory and plan crash-proof against context loss or command-line clears. Its recent update introduces opt-in autonomous and gated modes featuring a deterministic completion gate that prevents the agent from finishing until all planned tasks are fully resolved, mimicking Manus-style workflow persistence.

Qwen 3.7 Max hits 60.6% on SWE-Bench Pro