DeepSeek-V4 hits Hugging Face with 1.6T MoE, 1M context
DeepSeek-AI has launched its V4 model family, featuring a 1.6 trillion parameter Pro model and a 284 billion parameter Flash model. Both are open-weight models that introduce "Hybrid Attention" and ship with a 1-million-token context window as standard.
DeepSeek-V4 is a direct challenge to the top-tier closed-source models, doubling down on the "efficient MoE" architecture that made V3 a developer favorite.
- A 1M context window becomes the new baseline for foundation models, supported by novel compressed attention architectures that reduce memory overhead.
- V4-Pro (1.6T) targets elite-level coding and reasoning performance, reportedly rivaling Claude 4 and GPT-5 class models in technical benchmarks.
- V4-Flash (284B total, 13B active) is a massive efficiency play, likely to dominate the high-throughput, long-context agentic market.
- Engram Conditional Memory and Manifold-Constrained Hyper-Connections (mHC) signal a shift from simple scaling to deep architectural refinement for signal stability.
- MIT licensing and aggressive pricing continue to erode the competitive moat of closed-source API ecosystems.
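To put the efficiency and memory claims above in rough numbers, here is a minimal back-of-the-envelope sketch. The 284B/13B figures come from the summary. Everything in the KV-cache estimate (layer count, KV heads, head dimension, 16-bit values) is a hypothetical configuration chosen for illustration, not a published V4 spec, and the formula is the standard uncompressed attention cache, not DeepSeek's compressed design.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's parameters that are used for each token."""
    return active_params_b / total_params_b


def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_value: int = 2) -> float:
    """Size of an uncompressed KV cache in GiB for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    total_bytes = 2 * tokens * layers * kv_heads * head_dim * bytes_per_value
    return total_bytes / 2**30


# V4-Flash figures quoted in the summary: 284B total, 13B active.
flash_share = active_fraction(284, 13)
print(f"V4-Flash active share per token: {flash_share:.1%}")  # 4.6%

# HYPOTHETICAL dimensions, for illustration only (not V4's real config).
naive_cache = kv_cache_gib(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
print(f"Uncompressed KV cache at 1M tokens: {naive_cache:.0f} GiB")
```

Under these assumed dimensions a single 1M-token sequence would need a cache in the hundreds of GiB, which illustrates why a 1M-token window is reported alongside compressed attention: without some form of cache compression, the memory cost per sequence dominates serving.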
DISCOVERED: 2026-04-24
PUBLISHED: 2026-04-24
AUTHOR: MichaelXie4645