Nemotron-3-Nano-4B abliteration removes GenRM censorship
Developer HauhauCS has released the first "aggressive" abliteration of NVIDIA's Nemotron-3-Nano-4B, achieving a 0/465 refusal score on safety benchmarks. The abliteration specifically targets the Generative Reward Model (GenRM), which acts as a secondary layer of real-time generation censorship. The release also includes custom "K_P" quants that use model-specific analysis to deliver roughly one to two quality tiers above standard GGUF quants at a minimal increase in file size.
This release marks a significant escalation in the "cat-and-mouse" game of model alignment by identifying and neutralizing internal reward-driven self-censorship. GenRM removal prevents the "CoT-to-refusal" pivot, where a model reasons through a request correctly but switches to a canned refusal in its final output. The hybrid Mamba2-Transformer architecture offers high performance with a 262K native context window in a compact 4B-parameter package. The custom K_P quants are a meaningful optimization for local LLM users, squeezing Q6-level quality into near-Q4 file sizes. Per the developer, this is the first public demonstration that reward-model layers can be systematically ablated without degrading the base model's reasoning capabilities. Native tool-calling support remains intact, making the model a viable uncensored candidate for autonomous local agents.
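For readers unfamiliar with the underlying mechanics, "abliteration" generally means identifying a refusal direction in activation space (the mean difference between activations on refused and accepted prompts) and projecting that direction out of the model's weight matrices. The sketch below is a minimal toy illustration of that directional-ablation idea with random data; it is not HauhauCS's actual pipeline, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def refusal_direction(refused_acts, accepted_acts):
    # Unit-normalized mean-difference direction between two activation sets.
    d = refused_acts.mean(axis=0) - accepted_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, d):
    # Project the direction d out of the output side of W:
    # W' = W - d d^T W, so W' @ x has zero component along d for any x.
    return W - np.outer(d, d) @ W

rng = np.random.default_rng(0)
hidden = 64
refused = rng.normal(1.0, 0.1, (32, hidden))   # toy "refusal" activations
accepted = rng.normal(0.0, 0.1, (32, hidden))  # toy "compliant" activations
d = refusal_direction(refused, accepted)

W = rng.normal(size=(hidden, hidden))          # toy weight matrix
W_abl = ablate_direction(W, d)

x = rng.normal(size=hidden)
print(abs(d @ (W_abl @ x)))  # ~0: ablated output carries no refusal component
```

The GenRM-targeted variant described above presumably applies this kind of ablation to the reward-model layers rather than (or in addition to) the attention/MLP projections, which is what distinguishes it from earlier abliterations.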
DISCOVERED: 2026-03-25 (18d ago)
PUBLISHED: 2026-03-25 (18d ago)
RELEVANCE:
AUTHOR: hauhau901