OPEN_SOURCE
REDDIT · 4h ago · OPEN-SOURCE RELEASE
DeepGEMM adds Blackwell support, Mega MoE kernels
DeepSeek's high-performance library now supports NVIDIA Blackwell and introduces "Mega MoE" kernels to minimize communication overhead. The update includes FP4 quantization support, signaling a push toward ultra-low precision for next-generation models.
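FP4 here means 4-bit floating point (the E2M1 format: one sign bit, two exponent bits, one mantissa bit), which only works in practice when paired with a per-block scale factor. The snippet below is a toy, plain-Python sketch of block-scaled FP4 quantization to make the idea concrete; it is not DeepGEMM's implementation, which performs this packing on-GPU with hardware tensor-core support.

```python
# Toy sketch of block-scaled FP4 (E2M1) quantization -- conceptual only,
# not DeepGEMM's actual kernels. E2M1 can represent these magnitudes.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize a block of floats to 4-bit codes plus one shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0            # map the largest magnitude onto FP4's max (6.0)
    codes = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        idx = min(range(len(FP4_GRID)), key=lambda i: abs(FP4_GRID[i] - mag))
        sign = 8 if x < 0 else 0  # sign bit in the high bit of the 4-bit code
        codes.append(sign | idx)
    return codes, scale

def dequantize_block(codes, scale):
    """Invert the mapping: sign bit, 3-bit magnitude index, shared scale."""
    return [(-1.0 if c & 8 else 1.0) * FP4_GRID[c & 7] * scale for c in codes]

codes, scale = quantize_block([0.1, -0.4, 1.2, -3.0])
restored = dequantize_block(codes, scale)   # lossy round-trip
```

The shared scale is why FP4 halves memory relative to FP8 at modest accuracy cost: each value costs 4 bits, and the scale overhead is amortized across the block.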
// ANALYSIS
DeepSeek is aggressively front-running the industry by building Blackwell-native infrastructure before the hardware is even widely available.
- Mega MoE fuses dispatch and combine operations into a single kernel, ruthlessly optimizing NVLink-heavy distributed training.
- FP4 support for weights and MQA logits suggests DeepSeek's next model will be massive enough to require 4-bit precision to fit on standard clusters.
- Early Blackwell (SM100) support includes fifth-gen Tensor Core optimizations, placing DeepSeek at the bleeding edge of NVIDIA's new architecture.
- The library requires PyTorch 2.9+, indicating a reliance on the very latest compiler features for JIT kernel execution.
- This release confirms that "Expert Parallelism" efficiency is the primary architectural bottleneck DeepSeek is currently solving.
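The dispatch/combine pattern named in the first bullet can be sketched in plain Python. This is a conceptual single-process model, not DeepGEMM's API: in a real expert-parallel setup, dispatch and combine are all-to-all exchanges across GPUs, and fusing them with expert compute into one kernel avoids materializing the intermediate buffers between stages.

```python
# Conceptual MoE forward pass: dispatch -> expert compute -> combine.
# A fused "Mega MoE" kernel would execute these three stages without
# round-tripping intermediates through memory. Illustration only.

def moe_forward(tokens, router, experts, top_k=2):
    """tokens: list of vectors; router(tok) -> [(expert_id, weight), ...]."""
    # Dispatch: route each token to its top-k experts' input buckets.
    buckets = {eid: [] for eid in range(len(experts))}
    routes = []
    for tid, tok in enumerate(tokens):
        choices = router(tok)[:top_k]
        routes.append(choices)
        for eid, _w in choices:
            buckets[eid].append((tid, tok))

    # Expert compute: each expert processes only its own bucket.
    outputs = {}
    for eid, batch in buckets.items():
        for tid, tok in batch:
            outputs[(tid, eid)] = experts[eid](tok)

    # Combine: weighted sum of expert outputs, restored to token order.
    combined = []
    for tid, choices in enumerate(routes):
        acc = [0.0] * len(tokens[tid])
        for eid, w in choices:
            acc = [a + w * o for a, o in zip(acc, outputs[(tid, eid)])]
        combined.append(acc)
    return combined
```

Usage with two hypothetical one-layer "experts" and a fixed 50/50 router: `moe_forward([[1.0, 2.0]], lambda t: [(0, 0.5), (1, 0.5)], [lambda t: [2 * x for x in t], lambda t: [x + 1 for x in t]])` averages the two expert outputs per token.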
// TAGS
deepgemm · deepseek · blackwell · open-source · gpu · inference · llm
DISCOVERED
4h ago
2026-04-16
PUBLISHED
16h ago
2026-04-16
RELEVANCE
9/10
AUTHOR
External_Mood4719