OPEN_SOURCE
REDDIT // 4h ago · MODEL RELEASE
Max-and-Omnis releases GGUF builds of REAP-pruned Nemotron-3-Super
Max-and-Omnis released GGUF files for its REAP-pruned Nemotron-3-Super math model, following an earlier post about the pruning and GRPO fine-tuning pipeline. The release is aimed at local inference users who want a smaller, more practical 64B-class model derived from NVIDIA’s Nemotron-3-Super-120B-A12B, with BF16, FP8, and AWQ variants already available in the broader release family.
// ANALYSIS
Strong niche release for LocalLLaMA users who care about squeezing frontier-ish reasoning quality into local hardware.
- The headline is not the model itself so much as the packaging: GGUF makes it immediately relevant to local inference workflows.
- The post frames this as a continuation of a more technical earlier release, so this looks like an iteration on an existing model line rather than a brand-new base model.
- The release is credible for the audience because it includes multiple quantization targets and a concrete motivation: local deployment on high-end consumer/prosumer GPUs.
- The strongest hook is the combination of math-tuned performance and practical runtime formats, which is exactly what the LocalLLaMA crowd tends to reward.
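The multiple quantization targets matter because weight memory scales directly with bits per parameter. A back-of-envelope sketch of why a 64B-class model only becomes practical on prosumer GPUs at low bit-widths (the 64B figure is from the release description; the ~4.25-bit effective AWQ width is an assumed approximation, and KV cache/activation overhead is excluded):

```python
# Rough VRAM needed just to hold the weights of a 64B-parameter model
# at the quantization formats mentioned in the release family.
# Overheads (KV cache, activations, runtime buffers) are not included.
PARAMS = 64e9  # 64B-class model per the release notes

def weight_gib(params: float, bits_per_param: float) -> float:
    """Weight memory in GiB for a given effective bits-per-parameter."""
    return params * bits_per_param / 8 / 2**30

for fmt, bits in [("BF16", 16), ("FP8", 8), ("AWQ ~4-bit", 4.25)]:
    print(f"{fmt:>10}: {weight_gib(PARAMS, bits):6.1f} GiB")
```

At ~4 effective bits the weights drop to roughly a quarter of the BF16 footprint, which is the gap between "datacenter only" and "fits on one or two 24-48 GB cards", and is exactly the niche GGUF builds serve.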
// TAGS
llm · gguf · local-inference · quantization · nemotron · moe · math · open-source
DISCOVERED
4h ago
2026-04-24
PUBLISHED
7h ago
2026-04-24
RELEVANCE
8 / 10
AUTHOR
max6296