DeepSeek-V4 resets the open-model training stack
DeepSeek-V4, released in preview on April 24, 2026, is a substantial jump rather than a cosmetic refresh. The official transparency page lists V4 as a new release, and the model card describes a preview series with DeepSeek-V4-Pro at 1.6T total parameters, 49B activated, and a 1M-token context window. The report attributes the gains to hybrid attention, mHC, and the Muon optimizer, plus a two-stage post-training pipeline that separates domain-specific expert cultivation from unified consolidation. Sources: https://www.deepseek.com/en/transparency/ and https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
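For readers who have not looked at Muon: its core move, per the optimizer's public write-ups, is to take the usual momentum matrix for each 2D weight and orthogonalize it before applying the step. The sketch below is a minimal NumPy illustration of that idea, not DeepSeek's implementation; it uses an exact SVD where real implementations use cheaper Newton-Schulz iterations, and the function names and hyperparameters are placeholders.

```python
import numpy as np

def orthogonalize(m):
    """Nearest semi-orthogonal matrix to m (U @ Vt from its SVD).

    Production Muon implementations approximate this with a few
    Newton-Schulz iterations; the exact SVD is used here only to
    keep the sketch short and unambiguous.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def muon_style_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One illustrative Muon-style update for a single 2D weight matrix.

    Idea: accumulate momentum as usual, then replace the momentum matrix
    with its orthogonalized version before stepping, so every direction
    in the update has comparable magnitude. Hyperparameters are placeholders.
    """
    momentum = beta * momentum + grad
    update = orthogonalize(momentum)
    return weight - lr * update, momentum

# Toy usage: a random "layer" and a random gradient.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)) * 0.02
g = rng.standard_normal((64, 32))
m = np.zeros_like(w)
w, m = muon_style_step(w, g, m)
print(w.shape)  # (64, 32)
```

Because the orthogonalized update has all singular values equal to one, no single direction dominates the step, which is the usual stability-and-convergence argument made for Muon on large matrix parameters.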
Hot take: this is the kind of release that actually shifts the open-source frontier, because the interesting part is the systems work, not just the benchmark numbers.
- The headline technical move is the 1M-token context window served with sparse, heavily compressed attention, which makes the model materially cheaper to run at long context (a generic sparse-attention sketch follows after this list).
- The architecture is still MoE, but the training stack is where DeepSeek seems to be pushing hardest: mHC for stability, Muon for convergence, and a two-stage post-training recipe that should matter for downstream capability.
- The official materials position V4 as better than V3.2 across reasoning, coding, and agentic tasks, which matches the hype around it as an open-source SOTA push.
- The national-security tone in the tweet is not crazy: if these numbers hold up in practice, this is the kind of model that compresses the gap between closed frontier systems and open weights.
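To make the long-context point concrete without guessing at DeepSeek's actual hybrid/compressed design, here is a generic sliding-window attention sketch: each query attends to at most `window` nearby keys, so query-key work scales as O(L·window) instead of O(L²). Everything here (shapes, window size, names) is illustrative, not from the V4 materials.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=128):
    """Single-head causal attention where each query sees only the previous `window` keys.

    Generic stand-in for sparse/compressed long-context attention: dense
    attention does O(L^2) query-key work, this does O(L * window). Not
    DeepSeek's actual mechanism; purely an illustrative baseline.
    q, k, v: (L, d) arrays.
    """
    L, d = q.shape
    out = np.zeros_like(v)
    for t in range(L):
        lo = max(0, t - window + 1)                    # causal window [lo, t]
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())        # stable softmax
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out

# Toy check: 1,024 tokens, 64-dim head, 128-token window -> each query
# touches at most 128 keys instead of up to 1,024, and the gap grows
# linearly with sequence length.
rng = np.random.default_rng(0)
L, d = 1024, 64
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
y = sliding_window_attention(q, k, v, window=128)
print(y.shape)  # (1024, 64)
```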
DISCOVERED: 2026-04-29 (5h ago)
PUBLISHED: 2026-04-24 (5d ago)
AUTHOR: lu_sichu