OPEN_SOURCE
REDDIT // 4h ago · MODEL RELEASE

DeepSeek V4 drops 1.6T parameter MoE

DeepSeek releases V4-Pro and V4-Flash models featuring up to 1.6 trillion parameters and a 1M context window. The open-weights release introduces Sparse Attention for 90% memory reduction and undercuts frontier pricing by 7x.
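
To put the claimed 90% memory reduction in perspective at a 1M-token context, here is a rough KV-cache sizing sketch. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not V4's published architecture; only the 1M context length and the 90% figure come from the release.

```python
# Back-of-envelope KV-cache sizing for a long-context decoder.
# NOTE: num_layers / num_kv_heads / head_dim are hypothetical placeholders,
# not DeepSeek V4's actual configuration.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence (fp16/bf16)."""
    # 2x for keys + values, stored per layer, per KV head, per token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

baseline = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128,
                          seq_len=1_000_000)
compressed = baseline * 0.10  # the claimed ~90% memory reduction

print(f"dense KV cache @ 1M tokens: {baseline / 2**30:.1f} GiB")    # ~244 GiB
print(f"after ~90% reduction:       {compressed / 2**30:.1f} GiB")  # ~24 GiB
```

Even with made-up dimensions, the point stands: at 1M tokens the cache alone runs into hundreds of GiB per sequence, so an order-of-magnitude reduction is what makes such contexts servable.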

// ANALYSIS

DeepSeek V4 is a major escalation in the open-weights arms race, signaling that 1T+ parameter MoE models are becoming the baseline for frontier performance.

  • V4-Pro (1.6T) and V4-Flash (284B) directly challenge GPT-5 and Claude 4 with state-of-the-art agentic capabilities.
  • DeepSeek Sparse Attention (DSA) and KV-cache compression enable massive memory savings, making 1T+ models viable for broader inference (see the sketch after this list).
  • Pricing at $3.48/M output tokens is roughly 1/7th of competitor costs, likely triggering a price war among major providers.
  • Training on Huawei Ascend 950 hardware marks a significant shift toward domestic Chinese hardware for frontier AI.
  • SOTA agentic performance on SWE-bench suggests a primary focus on deep coding and complex reasoning tasks.
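
The release does not spell out how DSA decides which keys each query attends to, but the general family of techniques can be illustrated with a simple top-k sparsification over attention scores. The sketch below is a generic NumPy illustration, not DeepSeek's published algorithm; the `keep` budget of 64 is arbitrary.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=64):
    """Single-head attention where each query attends only to its `keep`
    highest-scoring keys; every other position is masked out before softmax."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_k) raw scores
    if keep < scores.shape[-1]:
        # Per-row threshold = the keep-th largest score; mask everything below it.
        thresh = np.partition(scores, -keep, axis=-1)[:, -keep][:, None]
        scores = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                 # (n_q, head_dim)

rng = np.random.default_rng(0)
n, d = 1024, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(topk_sparse_attention(q, k, v, keep=64).shape)   # (1024, 64)
```

A production implementation would avoid materializing the full n×n score matrix in the first place, which is where the memory win actually comes from; this sketch only shows the selection step.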
// TAGS
deepseek · llm · open-source · reasoning · ai-coding · inference

DISCOVERED: 4h ago (2026-04-25)
PUBLISHED: 6h ago (2026-04-25)
RELEVANCE: 10/10
AUTHOR: crowtain