OBLITERATUS sharpens MoE-aware refusal removal
OBLITERATUS is a Python and Gradio-based toolkit for analysis-informed “abliteration” of LLM refusal behavior. The project presents itself as a one-click workflow for probing hidden states, extracting refusal directions, applying projection or steering interventions, benchmarking the modified model, and then saving or chatting with the result. It also adds research-oriented features like telemetry-backed aggregation, architecture detection, and explicit MoE support, which makes it feel more like a lab instrument than a simple utility script.
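The extract-then-project step described above follows the standard abliteration recipe: take a difference of mean hidden states between refusal-triggering and benign prompts, then project that direction out. The post doesn't show OBLITERATUS' actual implementation, so the function names and shapes below are illustrative, a minimal sketch assuming activations have already been collected at one layer:

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means refusal direction.

    harmful_acts / harmless_acts: (n_samples, d_model) hidden states captured
    while the model processes refusal-triggering vs. benign prompts.
    Returns a unit-norm direction of shape (d_model,).
    """
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a hidden state (vector or batch).

    Removes the component of each row along `direction`, leaving the rest
    of the representation untouched.
    """
    coeffs = hidden @ direction                      # per-row component
    return hidden - np.outer(coeffs, direction).reshape(hidden.shape)
```

In practice the direction is extracted per layer and either subtracted from activations at inference time (steering) or baked into the weights; the sketch covers only the activation-space version.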
Hot take: this looks promising if you want interpretable refusal removal on tricky architectures, but Heretic still reads as the safer default for fully automated runs.
- OBLITERATUS’ main edge is its MoE-aware framing: it advertises `surgical` mode, active-parameter GPU sizing, and per-expert analysis, so it is better tailored to MoE targets on paper.
- Heretic looks more mature as a push-button optimizer, with automatic parameter search and explicit support for dense models plus several MoE architectures, so it is probably the lower-risk choice if you want something proven.
- The Reddit thread itself is thin on real-world validation, so the “used on a real model” question is still mostly anecdotal rather than settled.
- If you care about analysis tools, visualization, and community dataset building, OBLITERATUS is the more ambitious project; if you care about getting a decensored model quickly, Heretic is likely the simpler bet.
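On the MoE point: the post names `surgical` mode but doesn't explain its mechanics. A plausible reading is that the same refusal projection is applied per expert rather than once to a shared weight, which for a unit direction d means editing each expert's output projection as (I − dd^T)W. The sketch below is a generic assumption along those lines, not OBLITERATUS' confirmed behavior:

```python
import numpy as np

def abliterate_expert_weights(experts: list, direction: np.ndarray) -> list:
    """Apply the refusal projection to each MoE expert separately.

    experts: list of (d_model, d_ff) output-projection matrices, one per expert.
    direction: unit-norm refusal direction of shape (d_model,).
    Editing each W as (I - dd^T) @ W zeroes the component of every expert's
    output along the refusal direction, regardless of which experts the
    router activates for a given token.
    """
    P = np.eye(direction.shape[0]) - np.outer(direction, direction)
    return [P @ W for W in experts]
```

The per-expert formulation matters because a refusal direction removed only from shared layers can survive in individual experts that the router selects for refusal-prone tokens.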
DISCOVERED 2026-05-02
PUBLISHED 2026-05-02
AUTHOR AutomaticDriver5882