DeepSeek releases DeepEP V2, TileKernels
OPEN_SOURCE · REDDIT · 4h ago · RELEASE


DeepSeek released DeepEP V2, a refactored expert-parallel communication stack for MoE training, alongside TileKernels, an MIT-licensed TileLang GPU kernel library for LLM operations. The update targets high-end NVIDIA SM90/SM100 systems: DeepEP V2 brings faster EP communication, lower SM usage, and an NCCL Gin backend, while TileKernels ships MoE routing, quantization, Engram, and mHC kernels.

// ANALYSIS

This is infrastructure news masquerading as a repo drop: DeepSeek is exposing more of the low-level machinery behind efficient frontier-scale MoE systems.

  • DeepEP V2 claims up to 1.3x peak performance versus V1 while cutting SM usage by up to 4x, which matters when communication and compute need to overlap cleanly.
  • Switching from NVSHMEM to a lighter NCCL Gin backend should make integration less exotic for teams already built around NCCL communicators.
  • TileKernels is early and lightly documented, but its MoE, FP8/FP4 quantization, Engram, and mHC kernels offer a useful glimpse into DeepSeek’s internal training and inference priorities.
  • The SM100/Blackwell target is notable because it signals where serious MoE optimization work is moving, even if most developers cannot run this stack locally yet.
  • For AI infra teams, this is less a plug-and-play release than a benchmarkable reference point for how DeepSeek thinks about model-system co-design.
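
To make the FP8 kernel angle concrete, here is a minimal sketch of block-wise scaled quantization, the general technique behind low-precision MoE kernels. This is an illustrative assumption, not TileKernels' actual API: the function names, 128-element block size, and the use of rounding as a crude stand-in for FP8 casting are all hypothetical.

```python
import numpy as np

# Largest finite value representable in FP8 e4m3; blocks are scaled so
# their max magnitude lands at this bound before casting to FP8.
FP8_E4M3_MAX = 448.0

def quantize_blockwise(x: np.ndarray, block: int = 128):
    """Split a 1-D tensor into fixed-size blocks and scale each into FP8 range.

    Hypothetical sketch: real kernels cast to an FP8 dtype on-device;
    here round-to-nearest approximates the precision loss.
    """
    pad = (-len(x)) % block                      # pad so length divides evenly
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(xp).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on empty blocks
    q = np.clip(np.round(xp / scales), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scales, pad

def dequantize_blockwise(q: np.ndarray, scales: np.ndarray, pad: int):
    """Invert the per-block scaling and strip the padding."""
    out = (q * scales).reshape(-1)
    return out[: len(out) - pad] if pad else out

x = np.random.randn(1000).astype(np.float32)
q, s, pad = quantize_blockwise(x)
xr = dequantize_blockwise(q, s, pad)
print(float(np.abs(x - xr).max()))  # small per-block rounding error
```

Per-block scales are what keep outliers in one block from crushing the precision of every other block, which is why block-wise (rather than per-tensor) scaling dominates in FP8/FP4 training kernels.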
// TAGS
deepseek · deepep · tilekernels · open-source · gpu · inference · llm · mlops

DISCOVERED

4h ago

2026-04-23

PUBLISHED

6h ago

2026-04-23

RELEVANCE

9/10

AUTHOR

External_Mood4719