NVIDIA Nemotron 3 Super: open-weight 120B MoE, 1M context

// 74d ago · MODEL RELEASE

NVIDIA has released Nemotron 3 Super, a 120B-parameter open-weight hybrid Mamba-Transformer MoE model that activates only 12B parameters per token at inference and offers a 1-million-token context window built for agentic workflows. It ships with full open weights, 25T-token pretraining data, and training recipes, alongside same-day integrations across AWS, Azure, Google Cloud, and major inference providers.
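
To make the 120B-total versus 12B-active figures concrete, here is a rough sizing sketch in Python. Only the two parameter counts come from the summary above. The bytes-per-parameter values and the "about 2 FLOPs per active parameter per token" rule of thumb are assumptions for illustration, and the FP4 figure ignores any scaling-factor overhead.

```python
# Back-of-the-envelope sizing using the parameter counts quoted in the summary.
TOTAL_PARAMS = 120e9   # all experts plus shared layers
ACTIVE_PARAMS = 12e9   # parameters used for any single token

# Assumed storage widths in bytes per parameter (illustrative only).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def active_fraction(total_params, active_params):
    """Share of the weights that participate in one token's forward pass."""
    return active_params / total_params


def weight_memory_gb(params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def approx_flops_per_token(active_params):
    """Rule of thumb: roughly 2 FLOPs per active parameter per token."""
    return 2 * active_params


print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, width):.0f} GB of weights")
print(f"~{approx_flops_per_token(ACTIVE_PARAMS):.1e} FLOPs per token")
```

The point of the sketch is that sparsity reduces per-token compute, not the memory footprint: every expert still has to be resident, so the weight-memory rows use the full 120B.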

// ANALYSIS

NVIDIA is playing the long game in open-weights AI: it is releasing not just a model but the full stack (data, recipes, RL environments), which makes Nemotron 3 Super a platform rather than just a checkpoint.

  • The hybrid Mamba-Transformer architecture is novel at this scale: linear-time Mamba layers handle long context cheaply while Transformer attention layers handle precise recall, easing the KV-cache memory wall that makes pure-attention models so costly at 1M tokens
  • 12B active parameters out of a 120B pool puts per-token compute closer to that of a 12B dense model, although all 120B weights still have to be held in memory; that is competitive with Llama-class efficiency while far exceeding it on context length
  • Multi-Token Prediction delivering 3x wall-clock speedups for structured generation matters most for agentic use cases, where output volume (tool calls, code) dominates latency
  • Same-day enterprise adoption by Perplexity, CodeRabbit, Palantir, and Cloudflare Workers AI signals that this is not just a research drop; it is production-ready
  • NVFP4 native pretraining is a subtle but strategic move: it locks in Blackwell GPU advantages and widens the performance gap in favor of anyone running on NVIDIA hardware
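
Two of the bullets above rest on simple arithmetic, sketched here in Python under loudly stated assumptions. The layer counts, KV-head count, and head dimension are hypothetical placeholders, not Nemotron 3 Super's actual configuration. The second function uses the standard speculative-decoding estimate with independent per-token acceptance, which is one common way to reason about multi-token prediction and is not confirmed by the source as NVIDIA's method.

```python
def kv_cache_gib(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache size for one sequence, in GiB.

    Each attention layer stores one key and one value vector per token,
    so the cache grows linearly with sequence length.
    """
    total_bytes = 2 * seq_len * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem
    return total_bytes / 2**30


def expected_tokens_per_step(n_draft, accept_prob):
    """Expected tokens emitted per verification pass.

    Assumes n_draft extra tokens are proposed, each accepted independently
    with probability accept_prob, stopping at the first rejection.
    """
    if accept_prob >= 1.0:
        return float(n_draft + 1)
    return (1 - accept_prob ** (n_draft + 1)) / (1 - accept_prob)


# Hypothetical shapes: a pure-attention stack versus a hybrid that keeps
# only a few attention layers and uses fixed-size state elsewhere.
CONTEXT = 1_000_000
all_attention = kv_cache_gib(CONTEXT, n_attn_layers=48, n_kv_heads=8, head_dim=128)
hybrid = kv_cache_gib(CONTEXT, n_attn_layers=8, n_kv_heads=8, head_dim=128)
print(f"all-attention: {all_attention:.1f} GiB, hybrid: {hybrid:.1f} GiB")

# Hypothetical acceptance rates for 3 proposed tokens per step.
for p in (0.5, 0.8, 0.9):
    print(f"accept={p}: {expected_tokens_per_step(3, p):.2f} tokens/step")
```

Under these placeholder shapes the cache shrinks in proportion to the number of attention layers kept, which is the intuition behind the long-context claim. The tokens-per-step figure is an upper bound on decoding speedup, since it ignores the cost of producing and verifying the extra tokens.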
// TAGS
nemotron-3-super, nvidia, llm, open-weights, agent, inference, mcp, reasoning, open-source

DISCOVERED

74d ago

2026-03-14

PUBLISHED

76d ago

2026-03-12

RELEVANCE

9/10

AUTHOR

No-Swing2206