Abliterix automates LLM refusal abliteration
Abliterix is an open-source framework for automated censorship removal in Large Language Models. Using LoRA-based steering and Bayesian optimization, it targets refusal pathways while aiming to preserve the model's core reasoning ability.
Abliterix elevates model "decensoring" from blunt layer-dropping to a precise, research-backed optimization problem.
- Employs Optuna TPE to automatically balance near-zero refusal rates with minimal KL divergence
- Uses rank-1 LoRA adapters instead of base-weight modifications to ensure model stability and reversibility
- Integrates cutting-edge techniques like Surgical Refusal Ablation (SRA) to disentangle safety guardrails from coding and math capabilities
- Supports over 135 architectures, effectively commoditizing high-quality unrestricted model creation for the local LLM community
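The reversibility claim for rank-1 adapters can be illustrated with a toy sketch. This is not Abliterix's code: the function names, the 3x3 weight matrix, and the direction vector are all hypothetical, chosen only to show the linear algebra. It assumes the steering update takes the common directional-ablation form W' = W - r rᵀ W, which factors into a rank-1 pair (b, a) that sits beside the base weights and can be switched off.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def rank1_adapter(W, direction, scale=1.0):
    """Return (b, a) such that W + outer(b, a) = W - scale * r r^T W.

    r is the unit vector along `direction`. With scale=1.0 the adapted
    layer's output has no component along r.
    """
    r = normalize(direction)
    rows, cols = len(W), len(W[0])
    # a = r^T W: a row vector over the layer's input dimension.
    a = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    # b = -scale * r: a column vector over the layer's output dimension.
    b = [-scale * ri for ri in r]
    return b, a

def apply_adapter(W, b, a, enabled=True):
    """Return the effective weights; the base matrix W is never mutated."""
    if not enabled:
        return [row[:] for row in W]
    return [[W[i][j] + b[i] * a[j] for j in range(len(a))]
            for i in range(len(b))]

# Hypothetical base weights, direction, and input (illustrative values only).
W = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [3.0, 0.0, 2.0]]
direction = [1.0, 1.0, 0.0]
x = [0.3, -1.2, 2.0]

b, a = rank1_adapter(W, direction)
y = matvec(apply_adapter(W, b, a), x)

# Component of the adapted output along the ablated direction.
r = normalize(direction)
component = sum(ri * yi for ri, yi in zip(r, y))
print(abs(component) < 1e-9)                      # direction removed from output
print(apply_adapter(W, b, a, enabled=False) == W)  # base weights recoverable
```

Because the update lives in the two small vectors rather than in W itself, disabling the adapter restores the original layer exactly, which is the stability and reversibility property the bullet describes. A real pipeline would additionally have to estimate the direction from model activations and tune `scale` per layer; neither step is shown here.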
DISCOVERED: 2026-04-08
PUBLISHED: 2026-04-08
AUTHOR: TheGlobinKing