Nemotron REAP cut hits AIME 90%+
OPEN_SOURCE
REDDIT // 5h ago // MODEL RELEASE

Max-and-Omnis released a REAP-pruned math variant of NVIDIA Nemotron-3-Super, shrinking the 120B latent-MoE model to 64B while keeping 12B active parameters. The AWQ and FP8 builds reportedly top 90% avg@4 on AIME 2026 and fit on a single high-end H100 or RTX PRO 6000 Blackwell.

// ANALYSIS

This is a serious local-inference experiment, but the headline number should be read as a community benchmark until broader evals and reproduction land.

  • REAP pruning from 512 to 256 experts is the real story: it cuts deployment weight without giving up the sparse-MoE active-parameter profile
  • FP8 beats AWQ on quality but takes a roughly 40% throughput hit, making this a practical quality-vs-latency choice for math workloads
  • The included vLLM patch matters because expert routing edge cases still break real-world serving paths for unusual MoE shapes
  • Fine-tuning on about 270 AIMO3 and AstralMath problems means the AIME result is impressive, but narrow and potentially sensitive to prompt placement
  • Single-GPU 90%+ AIME-class math performance is exactly the kind of open-weights pressure that makes smaller, specialized reasoning models worth watching
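The expert-pruning step the first bullet describes can be sketched in a few lines. This is a minimal, hypothetical illustration of the general idea behind router-saliency expert pruning (score each expert by how much routing mass it receives over a calibration batch, then keep the top half), not the released REAP code; the function name and scoring rule are assumptions for illustration.

```python
import numpy as np

def prune_experts(router_probs: np.ndarray, keep: int) -> np.ndarray:
    """Rank experts by a saliency score (here: mean router probability
    over a calibration batch) and keep the `keep` highest-scoring ones.

    router_probs: (tokens, num_experts) softmax outputs from the router.
    Returns the indices of the retained experts, sorted ascending.
    """
    saliency = router_probs.mean(axis=0)   # average routing mass per expert
    kept = np.argsort(saliency)[-keep:]    # indices of the top-`keep` experts
    return np.sort(kept)

# Toy calibration pass mirroring the release's 512 -> 256 expert cut.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1024, 512))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
kept = prune_experts(probs, keep=256)
assert kept.shape == (256,)
```

A real pipeline would also fold the dropped experts' weights out of the checkpoint and recalibrate the router over the remaining 256 experts; the point of the sketch is only that the active-parameter count per token is untouched, which is why the 12B active profile survives the cut.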
// TAGS
nemotron-3-super-64b-math-reap · llm · reasoning · fine-tuning · inference · gpu · open-weights · benchmark

DISCOVERED

5h ago

2026-04-22

PUBLISHED

6h ago

2026-04-22

RELEVANCE

9 / 10

AUTHOR

max6296