NVIDIA drops Nemotron-3 Ultra with 500B MoE
NVIDIA has announced the Nemotron-3 model family, headlined by the 500B-parameter "Ultra" variant. Built on a hybrid Mamba-2/Transformer MoE architecture, it claims 5x throughput gains for agentic workflows and a 1-million-token context window, positioning it as a top-tier open-weight reasoning engine for the agentic AI era.
NVIDIA is no longer just selling the shovels; with Nemotron-3, it is delivering the gold standard for high-throughput, long-context reasoning. The hybrid Mamba-2/Transformer architecture sidesteps the quadratic attention cost of pure Transformers while preserving high-precision reasoning. With only 50B of its 500B parameters active per token, the model offers a cost-to-performance ratio that challenges proprietary models like GPT-4o. Native NVFP4 training support, optimized for Blackwell hardware, signals tight vertical integration between NVIDIA's chips and its software stack. Multi-Token Prediction (MTP) and the 1M-token context window make the model purpose-built for autonomous agents, and benchmark leads over Kimi K2 and GLM-4-Plus on reasoning tasks such as AIME 2025 suggest NVIDIA is now a primary contender in the foundation-model race.
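The 50B-active / 500B-total split comes from sparse Mixture-of-Experts routing, where each token activates only a few experts. A minimal sketch of top-k gating illustrates the idea; the expert count, k, and gating logits here are illustrative assumptions, not details of NVIDIA's Nemotron-3 implementation:

```python
import math

def topk_route(logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize
    their gate weights over just the selected set."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exp = [math.exp(logits[i]) for i in top]
    total = sum(exp)
    return [(i, e / total) for i, e in zip(top, exp)]

# 8 hypothetical experts, 2 active per token: only a fraction of
# expert parameters run per forward pass -- the same principle behind
# a 50B-active slice of a 500B-parameter model.
routing = topk_route([0.1, 2.0, -1.0, 1.5, 0.3, 0.0, 0.7, -0.5], k=2)
```

The gate weights always sum to 1 over the selected experts, so the layer's output is a convex combination of k expert outputs regardless of how many experts exist in total.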
DISCOVERED: 2026-03-16
PUBLISHED: 2026-03-16
AUTHOR: reversedu