OPEN_SOURCE ↗
REDDIT // 4h ago · OPEN-SOURCE RELEASE
DeepSeek releases DeepEP V2, TileKernels
DeepSeek released DeepEP V2, a refactored expert-parallel (EP) communication stack for MoE training, alongside TileKernels, an MIT-licensed TileLang GPU kernel library for LLM operations. The update targets high-end NVIDIA SM90/SM100 systems: DeepEP V2 brings faster EP, lower SM usage, and an NCCL GIN backend, while TileKernels ships MoE routing, quantization, Engram, and mHC kernels.
// ANALYSIS
This is infrastructure news masquerading as a repo drop: DeepSeek is exposing more of the low-level machinery behind efficient frontier-scale MoE systems.
- DeepEP V2 claims up to 1.3x peak performance versus V1 while cutting SM usage by up to 4x, which matters when communication and compute need to overlap cleanly.
- Switching from NVSHMEM to a lighter NCCL GIN backend should make integration less exotic for teams already built around NCCL communicators.
- TileKernels is early and lightly documented, but its MoE, FP8/FP4 quantization, Engram, and mHC kernels offer a useful glimpse into DeepSeek’s internal training and inference priorities.
- The SM100/Blackwell target is notable because it signals where serious MoE optimization work is moving, even if most developers cannot run this stack locally yet.
- For AI infra teams, this is less a plug-and-play release than a benchmarkable reference point for how DeepSeek thinks about model-system co-design.
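For readers unfamiliar with the routing step that expert-parallel communication exists to serve: each token picks its top-k experts, and those assignments determine which GPUs traffic flows between. A minimal NumPy sketch of top-k gating follows; this is an illustrative reimplementation of the general technique, not DeepEP's or TileKernels' actual API (the function name `topk_route` is invented here).

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int = 2):
    """Pick the top-k experts per token and normalize their gate weights.

    logits: (num_tokens, num_experts) gating scores.
    Returns (expert_ids, gate_weights), both (num_tokens, k).
    """
    # Indices of the k largest gating logits per token (order within k is arbitrary).
    idx = np.argpartition(logits, -k, axis=-1)[:, -k:]
    gates = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected experts, so each token's weights sum to 1.
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return idx, gates

# 4 tokens routed across 8 experts, top-2.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
experts, weights = topk_route(logits, k=2)
assert np.allclose(weights.sum(axis=-1), 1.0)
```

In a real EP setup, the `experts` tensor drives an all-to-all dispatch across ranks and a mirrored combine on the way back; that dispatch/combine pair is the hot path DeepEP's kernels optimize.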
// TAGS
deepseek · deepep · tilekernels · open-source · gpu · inference · llm · mlops
DISCOVERED
4h ago
2026-04-23
PUBLISHED
6h ago
2026-04-23
RELEVANCE
9/10
AUTHOR
External_Mood4719