DeepSeek-V4 hits million-token context with MoE efficiency
DeepSeek-AI’s latest MoE release features V4-Pro (1.6T) and V4-Flash (284B) models supporting a 1M-token context length. The architecture uses Hybrid Attention to cut KV cache by 90% and inference FLOPs by 73% compared to V3.2, while setting new open-source records on coding and reasoning benchmarks.
DeepSeek-V4 is a masterclass in efficiency, proving that million-token context can be economically viable through architectural innovation rather than just brute-force compute. Its Hybrid Attention makes long-context inference 10x more memory-efficient than previous generations, while coding performance on LiveCodeBench rivals closed-source giants like Gemini-3.1-Pro. New reasoning modes allow developers to optimize for speed or depth, and the use of the Muon optimizer enables stable training at the 1.6T parameter scale.
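The claimed 90% KV-cache reduction is plausible with back-of-envelope arithmetic if, as in other hybrid designs, most layers use a small sliding window and only a few attend over the full sequence. The sketch below is illustrative only; the layer counts, head dimensions, and window size are assumptions, not published DeepSeek-V4 specs.

```python
# Back-of-envelope sketch of how hybrid attention shrinks the KV cache.
# All architectural numbers are illustrative assumptions, not V4 specs.

def kv_cache_gib(seq_len, n_layers, full_layers, window,
                 n_kv_heads=8, head_dim=128, bytes_per=2):
    """KV-cache size (GiB) when `full_layers` layers cache the whole
    sequence and the rest cache only a sliding window of `window` tokens."""
    per_token = 2 * n_kv_heads * head_dim * bytes_per  # K and V, fp16
    full = full_layers * seq_len * per_token
    local = (n_layers - full_layers) * min(seq_len, window) * per_token
    return (full + local) / 2**30

seq = 1_000_000
dense = kv_cache_gib(seq, n_layers=60, full_layers=60, window=0)
hybrid = kv_cache_gib(seq, n_layers=60, full_layers=4, window=8192)
print(f"dense:  {dense:.1f} GiB")
print(f"hybrid: {hybrid:.1f} GiB  ({1 - hybrid / dense:.0%} smaller)")
```

Under these assumed settings the hybrid layout lands a bit above a 90% reduction at 1M tokens, since the sliding-window layers' cache cost stops growing with sequence length.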
DISCOVERED: 2026-04-24
PUBLISHED: 2026-04-24
AUTHOR: cmrdporcupine