Qwen3.5-397B Abliteration Exposes MoE Refusals

// RESEARCH PAPER, 51d ago

A Mac Studio experiment adapts FailSpy’s abliteration workflow to Qwen3.5-397B-A17B, claiming PRC-political censorship can be removed without breaking drug or weapons refusals. The post argues MoE models split refusal behavior across different activation routes, making inference-time hooks materially different from weight-baked edits.
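For readers unfamiliar with the hook-versus-weights distinction, here is a minimal sketch of directional ablation as it is commonly described: take a difference-of-means "refusal direction", then either project it out of activations at inference time or fold the same projection into a weight matrix. Everything below is an illustrative toy in plain Python; the function names and the tiny matrices are assumptions, not code from the post or from FailSpy's workflow.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = dot(v, v) ** 0.5
    return [x / n for x in v]

def mean(vectors):
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit difference-of-means direction between two sets of activations."""
    mu_h, mu_b = mean(harmful_acts), mean(harmless_acts)
    return normalize([a - b for a, b in zip(mu_h, mu_b)])

def ablate(x, r):
    """Runtime hook: remove the component of activation x along unit vector r."""
    c = dot(x, r)
    return [xi - c * ri for xi, ri in zip(x, r)]

def matvec(W, x):
    return [dot(row, x) for row in W]

def bake(W, r):
    """Weight edit: W' = (I - r r^T) W, so outputs of W' have no r component."""
    rows, cols = len(W), len(W[0])
    # proj[j] is the component of column j of W along r.
    proj = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - r[i] * proj[j] for j in range(cols)] for i in range(rows)]

if __name__ == "__main__":
    # Hypothetical activations; the two sets differ only along the first axis.
    harmful = [[2.0, 1.0, 0.0], [2.0, 3.0, 0.0]]
    harmless = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
    r = refusal_direction(harmful, harmless)
    W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    x = [1.0, -1.0, 2.0]
    print(ablate(matvec(W, x), r))   # hook applied to the layer output
    print(matvec(bake(W, r), x))     # same projection baked into W
```

For a single dense linear map the two paths print the same vector, which is why dense-model intuition treats them as interchangeable. The post's claim is that this equivalence does not carry over cleanly to a sparse MoE model.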

// ANALYSIS

The interesting part is not the censorship angle, but the architectural claim: sparse MoE models may encode safety behavior in routing decisions that simple projection edits cannot fully erase.

–If the routing hypothesis holds, refusal behavior in MoE models is not just a direction in residual space, which makes dense-model ablation intuitions unreliable
–Weight-baking vs runtime hooks diverging is operationally important: a “fixed” checkpoint may still behave differently from an instrumented inference path
–The top-k fragility on the 397B model suggests this technique is highly sensitive to scale and router geometry, not a plug-and-play recipe
–The writeup is most useful as a reproducible probe for where model behavior actually lives, not just as a censorship-removal demo
–The local-quantized workflow is notable because it lowers the barrier to this kind of mechanistic testing on consumer hardware
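One way the routing sensitivity could arise, sketched as a toy: if the router reads the same residual stream that the ablation edits, removing a direction shifts the router logits and can change which experts land in the top-k. The router weights, the input vector, and the ablated direction below are all made-up values chosen to show the effect; this is not a measurement from the writeup and not the actual Qwen3.5 router.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ablate(x, r):
    """Remove the component of x along unit vector r."""
    c = dot(x, r)
    return [xi - c * ri for xi, ri in zip(x, r)]

def top_k(logits, k):
    """Indices of the k largest router logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def route(router_W, x, k):
    """Toy router: one logit per expert (one row of router_W), keep the top k."""
    return top_k([dot(row, x) for row in router_W], k)

# Hypothetical 4-expert router over a 3-dimensional residual stream.
router_W = [
    [1.0, 0.0, 0.0],   # expert 0 keys on axis 0
    [0.0, 1.0, 0.0],   # expert 1 keys on axis 1
    [0.0, 0.0, 1.0],   # expert 2 keys on axis 2
    [0.5, 0.5, 0.0],   # expert 3 mixes axes 0 and 1
]
x = [2.0, 1.0, 0.9]    # residual vector before ablation
r = [1.0, 0.0, 0.0]    # assumed unit direction being ablated

before = route(router_W, x, k=2)
after = route(router_W, ablate(x, r), k=2)
print(before, after)
```

In this toy, both selected experts change once the direction is removed, so the edit alters which parameters run and not only what they see. A projection baked into one weight matrix would not necessarily reproduce that rerouting, which is consistent with the divergence the bullets describe.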

// TAGS

qwen3.5-397b, alfred-abliterate, llm, open-weights, safety, research, inference

DISCOVERED

51d ago

2026-04-06

PUBLISHED

51d ago

2026-04-06

RELEVANCE

9/10

AUTHOR

trevorbg
