OPEN_SOURCE ↗
REDDIT // 4h ago · OPEN-SOURCE RELEASE
DeepSeek releases DeepEP V2, TileKernels
DeepSeek released DeepEP V2, a refactored expert-parallel (EP) communication stack for MoE training, alongside TileKernels, an MIT-licensed TileLang GPU kernel library for LLM operations. The update targets high-end NVIDIA SM90/SM100 systems: DeepEP V2 brings faster EP, lower SM usage, and an NCCL GIN backend, while TileKernels ships MoE routing, quantization, Engram, and mHC kernels.
// ANALYSIS
This is infrastructure news masquerading as a repo drop: DeepSeek is exposing more of the low-level machinery behind efficient frontier-scale MoE systems.
- DeepEP V2 claims up to 1.3x peak performance versus V1 while cutting SM usage by up to 4x, which matters when communication and compute need to overlap cleanly.
- Switching from NVSHMEM to a lighter NCCL GIN backend should make integration less exotic for teams already built around NCCL communicators.
- TileKernels is early and lightly documented, but its MoE, FP8/FP4 quantization, Engram, and mHC kernels offer a useful glimpse into DeepSeek’s internal training and inference priorities.
- The SM100/Blackwell target is notable because it signals where serious MoE optimization work is moving, even if most developers cannot run this stack locally yet.
- For AI infra teams, this is less a plug-and-play release than a benchmarkable reference point for how DeepSeek thinks about model-system co-design.
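For readers unfamiliar with the routing step that expert-parallel communication exists to serve: each token picks its top-k experts, and those assignments determine which GPUs traffic flows between. A minimal NumPy sketch of top-k gating follows; this is an illustrative reimplementation of the general technique, not DeepEP's or TileKernels' actual API (the function name `topk_route` is invented here).

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int = 2):
    """Pick the top-k experts per token and normalize their gate weights.

    logits: (num_tokens, num_experts) gating scores.
    Returns (expert_ids, gate_weights), both (num_tokens, k).
    """
    # Indices of the k largest gating logits per token (order within k is arbitrary).
    idx = np.argpartition(logits, -k, axis=-1)[:, -k:]
    gates = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected experts, so each token's weights sum to 1.
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return idx, gates

# 4 tokens routed across 8 experts, top-2.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
experts, weights = topk_route(logits, k=2)
assert np.allclose(weights.sum(axis=-1), 1.0)
```

In a real EP setup, the `experts` tensor drives an all-to-all dispatch across ranks and a mirrored combine on the way back; that dispatch/combine pair is the hot path DeepEP's kernels optimize.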
// TAGS
deepseek · deepep · tilekernels · open-source · gpu · inference · llm · mlops
DISCOVERED
4h ago
2026-04-23
PUBLISHED
6h ago
2026-04-23
RELEVANCE
9/10
AUTHOR
External_Mood4719