DeepInfra cuts NVIDIA Nemotron 3 Ultra prices

// PRODUCT UPDATE (4d ago)

DeepInfra has reduced output and cached-token prices for the NVIDIA Nemotron 3 Ultra Mixture-of-Experts model. Output tokens now cost $2.20 per million, and cached reads cost $0.10 per million.

// ANALYSIS

DeepInfra's price cut intensifies the API pricing war, making large-scale agentic reasoning on open-weights Mixture-of-Experts models significantly more cost-effective.

* By offering the 550B/55B MoE model at these rates, DeepInfra challenges proprietary frontier APIs on raw cost-to-performance metrics.

* The 33% discount on cached tokens is a direct play for developer workflows requiring long-context agentic reasoning and deep research.

* A high context limit (256K) combined with low caching costs makes multi-turn agent interactions viable at scale.
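
To make the caching point concrete, here is a rough cost sketch in Python. It uses the two prices quoted above ($2.20 per million output tokens, $0.10 per million cached reads). The uncached input price is not stated in this post, so `input_price_per_m` is a placeholder the caller must supply. The function name and the billing model (each turn re-reads the whole prior context, billed at the cached rate when caching applies) are assumptions for illustration, not DeepInfra's actual billing logic.

```python
OUTPUT_PRICE_PER_M = 2.20  # USD per million output tokens (quoted in the post)
CACHED_PRICE_PER_M = 0.10  # USD per million cached-read tokens (quoted in the post)


def estimate_session_cost(turns, new_input_tokens, output_tokens,
                          input_price_per_m, use_cache=True):
    """Estimate the USD cost of a multi-turn agent session.

    Simplified model (an assumption): every turn resends the full prior
    context, adds `new_input_tokens` fresh tokens, and generates
    `output_tokens`. Prior context is billed at the cached rate when
    `use_cache` is True, otherwise at the normal input rate.
    """
    total = 0.0
    context = 0  # tokens accumulated from earlier turns
    for _ in range(turns):
        prior_rate = CACHED_PRICE_PER_M if use_cache else input_price_per_m
        total += context * prior_rate / 1e6               # re-read prior context
        total += new_input_tokens * input_price_per_m / 1e6  # fresh input
        total += output_tokens * OUTPUT_PRICE_PER_M / 1e6    # generated output
        context += new_input_tokens + output_tokens
    return total
```

Calling the function twice with the same arguments, once with `use_cache=True` and once with `use_cache=False`, gives a rough sense of how much of a long session's cost comes from re-reading context. The gap widens as the number of turns grows, because the re-read context grows with every turn.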

// TAGS

deepinfra, nvidia, nvidia-nemotron-3-ultra, moe, price-cut, ai-inference, llm, agentic-reasoning

DISCOVERED: 2026-06-16 (4d ago)

PUBLISHED: 2026-06-16 (4d ago)

RELEVANCE: 6/10

AUTHOR: DeepInfra
