OPEN_SOURCE ↗
REDDIT // 3h ago · BENCHMARK RESULT
DeepSeek-V4 Claims Spark SoTA Debate
DeepSeek’s April 24, 2026 V4 preview pairs open weights with a 1M-token context window and strong agentic coding scores, but its own tables show mixed results against Opus 4.6 Max and GPT-5.4 xHigh. That makes it a real frontier contender, not an unambiguous across-the-board state-of-the-art winner.
// ANALYSIS
The “SoTA” argument is only true if you narrow the frame to specific agentic and coding tasks; across broader reasoning benchmarks, the picture is much less clean.
- Official tables put V4-Pro-Max near the top on agent work: 80.6 on SWE Verified vs 80.8 for Opus 4.6 Max, and 73.6 on MCPAtlas Public vs 73.8.
- On general knowledge and hard reasoning it trails in several spots: MMLU-Pro, GPQA, and HLE do not show a blanket win over the best closed models.
- DeepSeek’s Chinese release notes explicitly say V4 is close to Opus 4.6 in non-thinking mode but still behind Opus 4.6 in thinking mode.
- The bigger story is economics and usability: open weights, a 1M-token context window, and lower compute/memory cost are the real differentiators.
- So the right read is “best open model in some important regimes,” not “new universal SoTA.”
// TAGS
deepseek-v4 · llm · benchmark · reasoning · agent · open-source
DISCOVERED
3h ago
2026-04-28
PUBLISHED
5h ago
2026-04-28
RELEVANCE
9 / 10
AUTHOR
Perfect-Flounder7856