REDDIT // MODEL RELEASE

NVIDIA Nemotron 3 Super: open-weight 120B MoE, 1M context

NVIDIA has released Nemotron 3 Super, a 120B-parameter open-weight hybrid Mamba-Transformer MoE model that activates only 12B parameters at inference, with a 1-million-token context window built for agentic workflows. It ships with open weights, the 25T-token pretraining dataset, and training recipes, alongside same-day integrations across AWS, Azure, Google Cloud, and major inference providers.

// ANALYSIS

NVIDIA is playing the long game in open-weights AI: not just releasing a model but the full stack of data, recipes, and RL environments, which makes Nemotron 3 Super a platform rather than just a checkpoint.

  • The Mamba-Transformer hybrid architecture is genuinely novel at this scale: linear-time Mamba layers handle long context cheaply while the remaining Transformer attention layers handle precise recall, sidestepping the KV-cache memory wall that makes pure-attention models impractical at 1M tokens (layer interleaving is sketched after this list)
  • 12B active parameters from a 120B total pool means per-token inference compute sits near that of a 12B dense model, giving Llama-class efficiency while vastly outperforming that class on context length (rough arithmetic below)
  • Multi-Token Prediction delivering 3x wall-clock speedups for structured generation is huge for agentic use cases, where output volume (tool calls, code) dominates latency; the draft-and-verify mechanism is sketched below
  • Same-day enterprise adoption from Perplexity, CodeRabbit, Palantir, and Cloudflare Workers AI signals this isn't a research drop — it's production-ready
  • NVFP4 native pretraining is a subtle but strategic move: it locks in Blackwell GPU advantages and widens the performance gap for anyone running on NVIDIA hardware (see the FP4 quantization sketch below)
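
To make the hybrid layout concrete, here is a minimal sketch of interleaving linear-time sequence-mixing blocks with occasional full-attention blocks. The MambaLike block is a simplified gated recurrence standing in for the real Mamba kernel, and the 1-in-8 attention ratio, layer shapes, and the omission of residuals and norms are illustrative assumptions, not NVIDIA's published Nemotron 3 Super layout.

// SKETCH: HYBRID MAMBA-TRANSFORMER LAYER STACK (Python / PyTorch)
import torch
import torch.nn as nn

class MambaLike(nn.Module):
    """Stand-in for a selective state-space block: O(seq_len) scan, fixed-size state."""
    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.decay = nn.Parameter(torch.full((dim,), 0.9))
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        u = self.in_proj(x)
        state = torch.zeros(x.shape[0], x.shape[2], device=x.device)
        outs = []
        for t in range(x.shape[1]):
            state = self.decay * state + u[:, t]   # constant memory per step, no KV cache
            outs.append(state)
        h = torch.stack(outs, dim=1)
        return self.out_proj(h * torch.sigmoid(self.gate(x)))

class AttentionBlock(nn.Module):
    """Full softmax attention, kept sparse in the stack for precise recall."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

def build_hybrid_stack(dim: int, n_layers: int, attn_every: int = 8) -> nn.ModuleList:
    # Attention only every `attn_every` layers; linear-time blocks everywhere else.
    return nn.ModuleList(
        AttentionBlock(dim) if (i + 1) % attn_every == 0 else MambaLike(dim)
        for i in range(n_layers)
    )

Only the attention blocks accumulate a KV cache, so memory grows at a fraction of the usual rate as the context stretches toward 1M tokens; that is the asymptotic argument behind the first bullet above.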
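
The total-versus-active split in the second bullet can be sanity-checked with back-of-the-envelope arithmetic. The expert count, top-k routing, and per-expert size below are invented so the totals land near the headline 120B/12B figures; NVIDIA's actual expert configuration is not stated in the post.

// SKETCH: MOE ACTIVE-PARAMETER ARITHMETIC (Python)
def moe_params(n_experts: int, k_active: int,
               expert_params: int, shared_params: int) -> tuple[int, int]:
    """Return (total, active) parameter counts for a top-k routed MoE model."""
    total = shared_params + n_experts * expert_params
    active = shared_params + k_active * expert_params
    return total, active

# Hypothetical config: 64 experts, top-4 routing, 1.8B params per expert,
# 4.8B shared (embeddings, attention, Mamba layers).
total, active = moe_params(n_experts=64, k_active=4,
                           expert_params=1_800_000_000, shared_params=4_800_000_000)
print(f"total = {total / 1e9:.0f}B, active = {active / 1e9:.0f}B")  # total = 120B, active = 12B

Since per-token FLOPs track the active count, serving cost sits near a 12B dense model even though the checkpoint holds 120B parameters.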
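
The MTP bullet is about amortization: auxiliary heads draft several tokens per step and a single verification pass accepts a prefix of them, so one model call produces multiple output tokens. The toy draft/verify functions and the 0.8 acceptance rate below are placeholders, not Nemotron's actual decoding stack.

// SKETCH: DRAFT-AND-VERIFY MULTI-TOKEN DECODING LOOP (Python)
import random

def draft_k_tokens(context: list[int], k: int) -> list[int]:
    # Placeholder for the MTP heads: propose the next k tokens in one shot.
    return [random.randrange(1000) for _ in range(k)]

def verify(context: list[int], drafted: list[int]) -> int:
    # Placeholder for the verification pass: count how many drafted tokens
    # match what the full model would have produced (a coin flip here).
    accepted = 0
    for _ in drafted:
        if random.random() < 0.8:   # assumed per-token acceptance rate
            accepted += 1
        else:
            break
    return accepted

def generate(prompt: list[int], max_new: int, k: int = 4) -> list[int]:
    out = list(prompt)
    produced = 0
    while produced < max_new:
        drafted = draft_k_tokens(out, k)
        n_ok = verify(out, drafted)
        step = drafted[:n_ok] if n_ok > 0 else drafted[:1]  # always make progress
        step = step[: max_new - produced]
        out.extend(step)
        produced += len(step)
    return out

With k = 4 and an 80% per-token acceptance rate, each verification pass yields roughly 2.5 tokens on average; highly predictable structured output (tool calls, code) pushes acceptance higher, which is the regime where a 3x wall-clock gain is plausible.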
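
Finally, a minimal sketch of block-scaled 4-bit quantization in the spirit of NVFP4, assuming FP4 (E2M1) values with a per-block scale over 16 elements. The scale handling is simplified to plain floats rather than FP8 scale factors, and none of this is NVIDIA's training code.

// SKETCH: BLOCK-SCALED FP4 (E2M1) FAKE-QUANTIZATION (Python / NumPy)
import numpy as np

# The eight non-negative magnitudes representable in FP4 E2M1.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block to signed E2M1 values plus a per-block scale."""
    max_abs = float(np.abs(block).max())
    scale = max_abs / E2M1_GRID[-1] if max_abs > 0 else 1.0
    scaled = block / scale
    # Snap each magnitude to the nearest grid point, keep the sign.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx], scale

def fake_quantize_fp4(x: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Quantize a 1-D tensor block by block and return the dequantized result."""
    out = np.empty_like(x)
    for start in range(0, len(x), block_size):
        q, scale = quantize_block(x[start:start + block_size])
        out[start:start + len(q)] = q * scale
    return out

w = np.random.randn(64).astype(np.float32)
print("max abs error:", np.abs(w - fake_quantize_fp4(w)).max())

The per-block scale keeps a single outlier from blowing up the quantization error of its 15 neighbors; pretraining natively in such a format lets Blackwell's FP4 tensor cores run throughout training and serving, which is the hardware lock-in the last bullet refers to.
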
// TAGS
nemotron-3-super · nvidia · llm · open-weights · agent · inference · mcp · reasoning · open-source

DISCOVERED
2026-03-14 (29d ago)

PUBLISHED
2026-03-12 (31d ago)

RELEVANCE
9/10

AUTHOR
No-Swing2206