OPEN_SOURCE
REDDIT // 4h ago · MODEL RELEASE
Max-and-Omnis releases GGUF builds of REAP-pruned Nemotron-3-Super
Max-and-Omnis released GGUF files for its REAP-pruned Nemotron-3-Super math model, following an earlier post about the pruning and GRPO fine-tuning pipeline. The release is aimed at local inference users who want a smaller, more practical 64B-class model derived from NVIDIA’s Nemotron-3-Super-120B-A12B, with BF16, FP8, and AWQ variants already available in the broader release family.
// ANALYSIS
Strong niche release for LocalLLaMA users who care about squeezing frontier-ish reasoning quality into local hardware.
- The headline is not the model itself so much as the packaging: GGUF makes it immediately relevant to local inference workflows.
- The post frames this as a continuation of a more technical earlier release, so this looks like an iteration on an existing model line rather than a brand-new base model.
- The release is credible for the audience because it includes multiple quantization targets and a concrete motivation: local deployment on high-end consumer/prosumer GPUs.
- The strongest hook is the combination of math-tuned performance and practical runtime formats, which is exactly what the LocalLLaMA crowd tends to reward.
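The multiple quantization targets matter because weight memory scales directly with bits per parameter. A back-of-envelope sketch of why a 64B-class model only becomes practical on prosumer GPUs at low bit-widths (the 64B figure is from the release description; the ~4.25-bit effective AWQ width is an assumed approximation, and KV cache/activation overhead is excluded):

```python
# Rough VRAM needed just to hold the weights of a 64B-parameter model
# at the quantization formats mentioned in the release family.
# Overheads (KV cache, activations, runtime buffers) are not included.
PARAMS = 64e9  # 64B-class model per the release notes

def weight_gib(params: float, bits_per_param: float) -> float:
    """Weight memory in GiB for a given effective bits-per-parameter."""
    return params * bits_per_param / 8 / 2**30

for fmt, bits in [("BF16", 16), ("FP8", 8), ("AWQ ~4-bit", 4.25)]:
    print(f"{fmt:>10}: {weight_gib(PARAMS, bits):6.1f} GiB")
```

At ~4 effective bits the weights drop to roughly a quarter of the BF16 footprint, which is the gap between "datacenter only" and "fits on one or two 24-48 GB cards", and is exactly the niche GGUF builds serve.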
// TAGS
llm · gguf · local-inference · quantization · nemotron · moe · math · open-source
DISCOVERED
4h ago
2026-04-24
PUBLISHED
7h ago
2026-04-24
RELEVANCE
8 / 10
AUTHOR
max6296