OPEN_SOURCE ↗
REDDIT // 3h ago · BENCHMARK RESULT
DeepSeek-V4 Claims Spark SoTA Debate
DeepSeek’s April 24, 2026 V4 preview pairs open weights with a 1M-token context window and strong agentic coding scores, but its own tables show mixed results against Opus 4.6 Max and GPT-5.4 xHigh. That makes it a real frontier contender, not an unambiguous across-the-board state-of-the-art winner.
// ANALYSIS
The “SoTA” argument is only true if you narrow the frame to specific agentic and coding tasks; across broader reasoning benchmarks, the picture is much less clean.
- Official tables put V4-Pro-Max near the top on agent work: 80.6 on SWE Verified vs 80.8 for Opus 4.6 Max, and 73.6 on MCPAtlas Public vs 73.8.
- On general knowledge and hard reasoning it trails in several spots: MMLU-Pro, GPQA, and HLE do not show a blanket win over the best closed models.
- DeepSeek’s Chinese release notes explicitly say V4 is close to Opus 4.6 in non-thinking mode but still behind Opus 4.6 in thinking mode.
- The bigger story is economics and usability: open weights, a 1M-token context window, and lower compute/memory cost are the real differentiators.
- So the right read is “best open model in some important regimes,” not “new universal SoTA.”
// TAGS
deepseek-v4 · llm · benchmark · reasoning · agent · open-source
DISCOVERED
3h ago
2026-04-28
PUBLISHED
5h ago
2026-04-28
RELEVANCE
9 / 10
AUTHOR
Perfect-Flounder7856