DeepInfra cuts NVIDIA Nemotron 3 Ultra prices
DeepInfra has reduced output and cached-token prices for the NVIDIA Nemotron 3 Ultra Mixture-of-Experts model. Output tokens now cost $2.20 per million, and cached reads cost $0.10 per million.
DeepInfra's price cut intensifies the API pricing war, making large-scale agentic reasoning on open-weights Mixture-of-Experts models significantly more cost-effective.
* By offering the 550B/55B MoE model at these rates, DeepInfra challenges proprietary frontier APIs on raw cost-to-performance metrics.
* The 33% discount on cached tokens is a direct play for developer workflows requiring long-context agentic reasoning and deep research.
* A 256K-token context limit combined with low caching costs makes multi-turn agent interactions, which re-read the same long prefix on every turn, practical at scale.
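To make the pricing concrete, here is a minimal sketch of a cost estimate for a multi-turn agent session. Only the two prices come from the announcement. The function name `estimate_cost`, the example token counts, and the uncached-input parameters are hypothetical; the announcement does not state an uncached input price, so that term defaults to zero and must be supplied by the reader.

```python
# Prices quoted in the announcement, in USD per million tokens.
OUTPUT_PRICE_PER_M = 2.20
CACHED_READ_PRICE_PER_M = 0.10


def estimate_cost(
    output_tokens: int,
    cached_tokens: int,
    uncached_tokens: int = 0,
    uncached_price_per_m: float = 0.0,  # assumption: not given in the announcement
) -> float:
    """Return an estimated cost in USD for the given token counts."""
    total = (
        output_tokens * OUTPUT_PRICE_PER_M
        + cached_tokens * CACHED_READ_PRICE_PER_M
        + uncached_tokens * uncached_price_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 20 turns, each re-reading a 200,000-token
    # cached prefix and generating 2,000 output tokens.
    turns = 20
    cost = estimate_cost(
        output_tokens=turns * 2_000,
        cached_tokens=turns * 200_000,
    )
    print(f"Estimated session cost: ${cost:.3f}")
```

Under these assumed token counts the session comes to about $0.49, of which roughly $0.40 is cached reads. That is why the cached-read price matters most for long-context, multi-turn workloads.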
Discovered: 2026-06-16
Published: 2026-06-16
Author: DeepInfra