Megaplan harness edges Opus on SWE-bench
OPEN_SOURCE ↗
REDDIT · 8d ago · BENCHMARK RESULT

Megaplan is a general-purpose planning and execution harness for LLMs, and its live SWE-bench dashboard currently shows open-weight models running through the harness ahead of Claude Opus 4.5 on the benchmark. At the time I checked, the experiment had scored 26 of 500 tasks, with 21 passes for an 80.8% pass rate.
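With only 26 of 500 tasks scored, the headline number carries a wide error bar. A quick back-of-the-envelope check (a Wilson score interval, not anything from the Megaplan repo) shows how much the 80.8% figure could still move:

```python
import math

def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a pass rate."""
    p = passes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# 21 passes out of 26 scored tasks, as reported on the dashboard
lo, hi = wilson_interval(21, 26)
print(f"pass rate: {21/26:.1%}, 95% CI: [{lo:.1%}, {hi:.1%}]")
```

The interval spans roughly twenty percentage points either way, which is why the lead over Opus should be read as provisional until more of the remaining 474 tasks come in.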

// ANALYSIS

This is a harness story more than a model story: the claim is that structured planning, critique, gating, and review can unlock much better coding performance from open models than one-shot execution.

  • The live setup uses GLM-5.1 for prep, plan, execute, and review, with MiniMax-M2.7-highspeed handling critique and review, which is a concrete example of phase-specialized orchestration
  • The repo frames Megaplan as a reusable workflow layer, not a one-off benchmark script, which makes the result more interesting for agent builders than for raw model rankings
  • The result is still early and noisy: 26 scored tasks is a small slice of SWE-bench Verified, so the lead could move as the remaining 474 tasks come in
  • The fact that all code and data are public makes this unusually replicable for a leaderboard claim, which should help separate signal from hype
  • If the curve holds, this strengthens the case that better agent scaffolding can matter as much as marginal model gains on software engineering tasks
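The phase-specialized orchestration described above can be sketched as a simple routing table. This is an illustrative sketch only: Megaplan's actual API is not shown in the post, and the prompts and the `call_model` backend here are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Phase:
    name: str
    model: str                      # which model handles this phase
    prompt: Callable[[str], str]    # hypothetical prompt template

# Route planning/execution to one model and critique/review to another,
# mirroring the GLM-5.1 / MiniMax-M2.7-highspeed split described above.
PHASES = [
    Phase("prep",     "glm-5.1",                lambda t: f"Summarize repo context for: {t}"),
    Phase("plan",     "glm-5.1",                lambda t: f"Write a step-by-step fix plan for: {t}"),
    Phase("critique", "minimax-m2.7-highspeed", lambda t: f"Critique the plan for: {t}"),
    Phase("execute",  "glm-5.1",                lambda t: f"Apply the plan as a patch for: {t}"),
    Phase("review",   "minimax-m2.7-highspeed", lambda t: f"Review the patch for: {t}"),
]

def run_pipeline(task: str, call_model: Callable[[str, str], str]) -> dict[str, str]:
    """Run each phase in order; call_model(model, prompt) -> str is the LLM backend."""
    transcript: dict[str, str] = {}
    for phase in PHASES:
        transcript[phase.name] = call_model(phase.model, phase.prompt(task))
    return transcript

# Stub backend for demonstration; a real harness would call a model API here.
result = run_pipeline("fix failing test in utils.py",
                      lambda model, prompt: f"[{model}] {prompt}")
```

The point of the pattern is that critique and review run against a different model than the one that wrote the plan, which is the structural difference from one-shot execution that the post credits for the result.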
// TAGS
megaplan · hermes-megaplan · ai-coding · agent · benchmark · open-source

DISCOVERED

8d ago

2026-04-04

PUBLISHED

8d ago

2026-04-04

RELEVANCE

9 / 10

AUTHOR

PetersOdyssey