Qwen3.5-397B Abliteration Exposes MoE Refusals

// 51d ago · RESEARCH PAPER

An experiment run on a Mac Studio adapts FailSpy’s abliteration workflow to Qwen3.5-397B-A17B and claims that refusals on PRC-political topics can be removed while drug and weapons refusals stay intact. The post argues that mixture-of-experts (MoE) models split refusal behavior across different activation routes, so inference-time hooks behave materially differently from edits baked into the weights.
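
As a rough illustration of what abliteration-style directional ablation does (a minimal sketch, not the post's code and not FailSpy's implementation): a "refusal direction" is estimated as the difference of mean activations between refused and answered prompts, and that component is then removed either at inference time (a hook) or by baking the projection into a weight matrix. The function names, the toy activations, and the matrix `W` below are all hypothetical. For a single dense linear layer the two routes give the same output, which is the kind of dense-model intuition the post says does not carry over to MoE.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(rows):
    # Column-wise mean of a list of equal-length vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def refusal_direction(refused_acts, answered_acts):
    # Difference of means, normalized to unit length.
    diff = [a - b for a, b in zip(mean(refused_acts), mean(answered_acts))]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]

def ablate(vec, r):
    # Remove the component of vec along the unit vector r.
    c = dot(vec, r)
    return [x - c * ri for x, ri in zip(vec, r)]

def matvec(W, x):
    # W is a list of rows; each row produces one output coordinate.
    return [dot(row, x) for row in W]

def bake(W, r):
    # W' = (I - r r^T) W: project r out of every column of W.
    cols = [ablate(list(col), r) for col in zip(*W)]
    return [list(row) for row in zip(*cols)]

# Hypothetical toy activations (3-dimensional "residual stream").
refused = [[1.0, 2.0, 0.0], [1.0, 4.0, 0.0]]
answered = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
r = refusal_direction(refused, answered)

# Hypothetical dense layer writing into the residual stream.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, -1.0]

hooked = ablate(matvec(W, x), r)     # runtime hook on the layer output
baked_out = matvec(bake(W, r), x)    # same projection baked into the weights
print(r, hooked, baked_out)
```

In this dense toy case `hooked` and `baked_out` agree, because projecting the output of a linear map is the same as projecting its weight columns.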

// ANALYSIS

The interesting part is not the censorship angle but the architectural claim: sparse MoE models may encode safety behavior in routing decisions that simple projection edits cannot fully erase.

  • If the routing hypothesis holds, refusal behavior in MoE models is not just a single direction in the residual stream, which makes intuitions carried over from dense-model ablation unreliable
  • The divergence between weight-baked edits and runtime hooks is operationally important: a “fixed” checkpoint may still behave differently from an instrumented inference path
  • The top-k fragility reported on the 397B model suggests the technique is highly sensitive to scale and router geometry rather than being a plug-and-play recipe
  • The writeup is most useful as a reproducible probe for where model behavior actually lives, not just as a censorship-removal demo
  • The local-quantized workflow is notable because it lowers the barrier to this kind of mechanistic testing on consumer hardware
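
One way the hook and weight-baked paths could diverge in a routed layer can be sketched with a toy top-k MoE. This is an illustrative assumption, not the mechanism demonstrated in the post: the hook is assumed to edit the residual vector that the router reads, while the bake is assumed to touch only the expert output weights. The router, experts, and all numbers are hypothetical, and expert outputs are summed without gating weights for simplicity.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ablate(vec, r):
    # Remove the component of vec along the unit vector r.
    c = dot(vec, r)
    return [x - c * ri for x, ri in zip(vec, r)]

def bake(W, r):
    # W' = (I - r r^T) W: project r out of every column of W.
    cols = [ablate(list(col), r) for col in zip(*W)]
    return [list(row) for row in zip(*cols)]

def top_k(logits, k):
    # Indices of the k largest router logits, returned in ascending order.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(ranked[:k])

def moe_layer(h, router, experts, k):
    # Toy MoE: the router picks k experts from h; their outputs are summed.
    chosen = top_k([dot(row, h) for row in router], k)
    out = [0.0] * len(h)
    for i in chosen:
        y = [dot(row, h) for row in experts[i]]
        out = [o + yi for o, yi in zip(out, y)]
    return chosen, out

r = [0.0, 1.0, 0.0]              # hypothetical unit "refusal direction"
h = [1.0, 2.0, 0.5]              # hypothetical residual input to the layer
router = [[0.0, 1.0, 0.0],       # expert 0 is selected by the r component
          [1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]
experts = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.0, 2.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

# Baked path: expert output weights are edited, the router still sees h.
baked_chosen, baked_out = moe_layer(h, router, [bake(E, r) for E in experts], k=1)

# Hook path: the residual is ablated before the router and again on output.
hook_chosen, hook_out = moe_layer(ablate(h, r), router, experts, k=1)
hook_out = ablate(hook_out, r)

print(baked_chosen, baked_out)
print(hook_chosen, hook_out)
```

In this toy setup both outputs have no component along `r`, yet the two paths select different experts and produce different outputs, because only the hook changes what the router sees. Whether anything similar happens in Qwen3.5-397B-A17B is exactly what the writeup's probe would have to establish.
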
// TAGS
qwen3.5-397b · alfred-abliterate · llm · open-weights · safety · research · inference

DISCOVERED

51d ago

2026-04-06

PUBLISHED

51d ago

2026-04-06

RELEVANCE

9/10

AUTHOR

trevorbg