OPEN_SOURCE
REDDIT · 29d ago · MODEL RELEASE
NVIDIA Nemotron 3 Super: open-weight 120B MoE, 1M context
NVIDIA has released Nemotron 3 Super, a 120B-parameter open-weight hybrid Mamba-Transformer MoE model that activates only 12B parameters at inference, with a 1-million-token context window built for agentic workflows. It ships with full open weights, its 25T-token pretraining dataset, and training recipes, alongside same-day integrations across AWS, Azure, Google Cloud, and major inference providers.
// ANALYSIS
NVIDIA is playing the long game in open-weights AI: not just releasing a model, but the full stack — data, recipes, RL environments — making Nemotron Super a platform, not just a checkpoint.
- The Mamba-Transformer hybrid architecture is genuinely novel at this scale: linear-time Mamba layers handle long context cheaply while Transformer attention handles precise recall, sidestepping the memory wall that kills dense-attention models at 1M tokens
- 12B active parameters from a 120B pool means inference cost is closer to that of a 12B model: competitive with Llama-class efficiency while vastly outperforming it on context length
- Multi-Token Prediction delivering 3x wall-clock speedups for structured generation is huge for agentic use cases where output volume (tool calls, code) dominates latency
- Same-day enterprise adoption from Perplexity, CodeRabbit, Palantir, and Cloudflare Workers AI signals this isn't a research drop; it's production-ready
- NVFP4 native pretraining is a subtle but strategic move: it locks in Blackwell GPU advantages and widens the perf gap for anyone running on NVIDIA hardware
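The MoE efficiency claim above can be sanity-checked with back-of-envelope arithmetic. This sketch uses the common rule of thumb that a forward pass costs roughly 2 FLOPs per *active* parameter per token; the function name and the dense-model comparison are illustrative assumptions, not anything from NVIDIA's release.

```python
# Back-of-envelope sketch (assumption: ~2 FLOPs per active parameter
# per token for a forward pass) showing why a 120B-total / 12B-active
# MoE prices inference like a ~12B dense model.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2.0 * active_params

moe_active = 12e9      # Nemotron 3 Super: 12B of 120B parameters active
dense_total = 120e9    # hypothetical dense model of the same total size

ratio = flops_per_token(dense_total) / flops_per_token(moe_active)
print(f"MoE per-token compute: {flops_per_token(moe_active):.1e} FLOPs")
print(f"An equal-size dense model would cost ~{ratio:.0f}x more per token")
```

The memory picture is less favorable than the compute picture: all 120B weights must still be resident (or paged) for routing, so the MoE saves FLOPs and latency, not VRAM.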
// TAGS
nemotron-3-super · nvidia · llm · open-weights · agent · inference · mcp · reasoning · open-source
DISCOVERED
29d ago
2026-03-14
PUBLISHED
31d ago
2026-03-12
RELEVANCE
9/10
AUTHOR
No-Swing2206