DeepInfra hosts NVIDIA Nemotron-3 Ultra

// 45d ago MODEL RELEASE

DeepInfra has announced serverless hosting support for NVIDIA's 550-billion-parameter Nemotron-3 Ultra Mixture-of-Experts model. The integration delivers inference speeds exceeding 300 tokens per second for complex reasoning and enterprise-grade agentic workflows.

// ANALYSIS

DeepInfra's rapid adoption of Nemotron-3 Ultra underscores the intense competition among AI inference hosts to serve frontier-level open-weights models at scale. By hosting this 550B-parameter model, DeepInfra gives individual developers access to enterprise-grade reasoning that would otherwise be too costly and complex to self-host.

* Serving a 550B-parameter MoE model at more than 300 tokens per second showcases DeepInfra's inference engineering.

* The model's focus on long-running, autonomous agents signals a major industry shift from simple chat interfaces to complex, multi-step problem-solving systems.

* Hybrid architectures such as Mamba-Transformer are proving highly competitive with pure Transformer implementations at this scale while remaining resource-efficient.
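To put the 300 tokens-per-second figure in context, here is a rough back-of-the-envelope sketch of decode time for a long-running agent job. The step count and tokens per step are hypothetical values chosen for illustration, not figures from DeepInfra or NVIDIA, and the estimate ignores prompt processing, network latency, and time spent in tool calls.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 300.0) -> float:
    """Estimate pure decode time for a given number of output tokens.

    The 300 tok/s default is the throughput quoted in the announcement;
    real end-to-end latency will also include prefill, queuing, and network time.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical agent workload (assumed numbers, for illustration only).
steps = 40                # number of model calls in one agent run
tokens_per_step = 1500    # output tokens generated per call
total_tokens = steps * tokens_per_step

seconds = generation_seconds(total_tokens)
print(f"{total_tokens} output tokens at 300 tok/s: about {seconds:.0f} s of decode time")
```

Because the announcement describes speeds exceeding 300 tokens per second, this figure is closer to an upper bound on decode time than a measured result.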

// TAGS

deepinfra, nvidia, nemotron-3-ultra, model-release, llm, moe, inference, cloud, infrastructure, agent, open-weights

DISCOVERED

45d ago

2026-06-01

PUBLISHED

45d ago

2026-06-01

RELEVANCE

8/10

AUTHOR

DeepInfra
