OPEN_SOURCE ↗
REDDIT // 29d ago · NEWS
Nemotron 3 Super spurs speed-vs-vision debate
Days after NVIDIA released Nemotron Super 120B — a text-only, 1M-context model blazing at ~478 tokens/sec on Blackwell hardware — r/LocalLLaMA users are weighing it against Qwen3.5 122B, which trades raw speed and context length for native vision support.
// ANALYSIS
The speed-vs-vision split exposes a real gap in the open-weight landscape: no single 120B-class model currently offers both native multimodal capability and a genuine 1M-token context window.
- Nemotron Super 120B's ~478 tokens/sec throughput on Blackwell hardware is exceptional for a 120B-class model, but NVFP4 quantization ties it tightly to NVIDIA's latest GPU lineup
- Qwen3.5 122B's native vision-language support is a genuine differentiator for agentic workflows where image or video input matters
- Nemotron's 1M context is native; Qwen3.5's 1M requires YaRN scaling from a 262K base — a practical difference in reliability and degradation at extreme lengths
- The community is asking whether vision adapters can be bolted onto Nemotron Super — an open research question NVIDIA hasn't addressed
- The "best of both" model doesn't exist yet, which is what's driving the debate
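The YaRN point above comes down to simple arithmetic: a 262K native window needs roughly a 4x extension factor to reach 1M tokens. A minimal sketch, assuming the `rope_scaling` dict format used by Hugging Face transformers for YaRN-style extension (the exact config keys and factor value here are illustrative, not taken from the Qwen3.5 release):

```python
# Rough arithmetic behind the two models' 1M-token claims. The numbers
# come from the discussion above; the rope_scaling dict mirrors the
# Hugging Face transformers YaRN config style as an illustration only.

TARGET_CONTEXT = 1_000_000   # advertised window for both models
QWEN_BASE_CONTEXT = 262_144  # Qwen3.5's native (pre-YaRN) window, per the post

# YaRN extends the usable context by a multiplicative factor over the
# base window; larger factors generally mean more degradation risk.
yarn_factor = TARGET_CONTEXT / QWEN_BASE_CONTEXT
print(f"required YaRN factor: {yarn_factor:.2f}x")  # ~3.81x

# Hypothetical config sketch in the transformers rope_scaling style:
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # rounded up so the scaled window covers the 1M target
    "original_max_position_embeddings": QWEN_BASE_CONTEXT,
}
assert rope_scaling["factor"] * QWEN_BASE_CONTEXT >= TARGET_CONTEXT
```

Nemotron Super's window, by contrast, needs no such scaling step, which is the "practically different" reliability argument the thread makes.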
// TAGS
llm · open-weights · inference · reasoning · nemotron-3-super · qwen3.5
DISCOVERED
29d ago
2026-03-14
PUBLISHED
31d ago
2026-03-12
RELEVANCE
5/10
AUTHOR
Porespellar