GLM-5.1 tops SWE-Bench Pro
Z.ai says GLM-5.1 hit 58.4 on SWE-Bench Pro, edging out GPT-5.4 (57.7), Opus 4.6 (57.3), and Gemini 3.1 Pro (54.2). It's a notable agentic-coding signal for a model family that has been rapidly closing the gap with the frontier.
This is a real win, but a narrow one. SWE-Bench Pro is more meaningful than toy coding tests because it stresses end-to-end, repo-level issue fixing, exactly the workload that matters for coding agents, and beating Opus 4.6, GPT-5.4, and Gemini 3.1 Pro there puts GLM-5.1 in the same conversation as the frontier. Still, the margin over the top proprietary models is slim enough to treat this as a checkpoint, not a coronation; the next question is whether the lead holds up across other agentic evals. If Z.ai can pair this with low cost, it becomes a serious pressure point on closed-model pricing for coding workflows, and a marker of category convergence: open-weight models can now flip leadership on specific tasks.
DISCOVERED: 2026-04-07
PUBLISHED: 2026-04-07
AUTHOR: Able-Necessary-6048