OBLITERATUS sharpens MoE-aware refusal removal

// 50d ago · OPENSOURCE RELEASE

OBLITERATUS is a Python and Gradio-based toolkit for analysis-informed “abliteration” of LLM refusal behavior. The project presents itself as a one-click workflow for probing hidden states, extracting refusal directions, applying projection or steering interventions, benchmarking the modified model, and then saving or chatting with the result. It also adds research-oriented features like telemetry-backed aggregation, architecture detection, and explicit MoE support, which makes it feel more like a lab instrument than a simple utility script.
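For readers unfamiliar with the "extract a refusal direction, then project it out" step, here is a minimal NumPy sketch of the general technique on synthetic data. It is not OBLITERATUS' code or API: the function names `refusal_direction` and `ablate_direction`, the difference-of-means estimator, and the toy activations are all assumptions chosen for illustration.

```python
import numpy as np


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (hypothetical helper)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight, direction):
    """Project `direction` out of a matrix that writes into the residual stream.

    weight has shape (d_model, d_in); the result equals (I - r r^T) @ weight,
    so its outputs have no component along the unit vector r.
    """
    return weight - np.outer(direction, direction @ weight)


# Synthetic stand-ins for hidden states collected on two prompt sets.
rng = np.random.default_rng(0)
d_model = 8
planted = np.zeros(d_model)
planted[0] = 1.0  # assumed "refusal" axis, planted for the demo only
harmless = rng.normal(size=(64, d_model))
harmful = rng.normal(size=(64, d_model)) + 3.0 * planted

r = refusal_direction(harmful, harmless)

# Toy weight matrix standing in for an attention or MLP output projection.
W = rng.normal(size=(d_model, d_model))
W_ablated = ablate_direction(W, r)

# Largest remaining component along r after ablation; expected to be ~0.
residual = np.abs(r @ W_ablated).max()
print(f"max residual along direction: {residual:.2e}")
```

In a real run the activations would come from a model's hidden states at a chosen layer, and the projection would be applied to each matrix that writes into the residual stream. How OBLITERATUS selects layers or handles per-expert weights in MoE models is not described in this summary.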

// ANALYSIS

Hot take: this looks promising if you want interpretable refusal removal on tricky architectures, but Heretic still reads as the safer default for fully automated runs.

  • OBLITERATUS’ main edge is its MoE-aware framing: it advertises `surgical` mode, active-parameter GPU sizing, and per-expert analysis, so it is better tailored to MoE targets on paper.
  • Heretic looks more mature as a push-button optimizer, with automatic parameter search and explicit support for dense models plus several MoE architectures, so it is probably the lower-risk choice if you want something proven.
  • The Reddit thread itself is thin on real-world validation, so the “used on a real model” question is still mostly anecdotal rather than settled.
  • If you care about analysis tools, visualization, and community dataset building, OBLITERATUS is the more ambitious project; if you care about getting a decensored model quickly, Heretic is likely the simpler bet.
// TAGS

llm · abliteration · mechanistic-interpretability · moe · open-source · huggingface · gradio

DISCOVERED: 2026-05-02 (50d ago)

PUBLISHED: 2026-05-02 (50d ago)

RELEVANCE: 8/10

AUTHOR: AutomaticDriver5882