OPEN_SOURCE
REDDIT // 2h ago // MODEL RELEASE
Qwen3.6 abliterated variant lands on HF
Wangzhang published an abliterated Qwen3.6-35B-A3B checkpoint on Hugging Face, tuned around MoE-specific refusal behavior rather than dense-model attention paths. The repo claims 7/100 refusals under a stricter LLM-judge eval, with low KL drift from the base model.
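The core "abliteration" operation described here is typically directional ablation: estimate a refusal direction in activation space, then orthogonalize the targeted weight matrices (here, o_proj, MLP down_proj, and per-expert slices) against it so the model can no longer write along that direction. The repo's exact method isn't shown; this is a minimal numpy sketch of the standard projection step, with a toy random matrix standing in for a real projection weight.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's output that lies along direction r.

    W maps x -> W @ x with shape (d_out, d_in); r has shape (d_out,).
    After ablation, r.T @ (W' @ x) == 0 for every input x:
        W' = W - r (r^T W)   with r unit-norm.
    """
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r @ W)

# Toy example (d_model = 8); in practice W would be an o_proj,
# down_proj, or expert slice, and r a refusal direction estimated
# from contrastive (harmful vs. harmless) prompt activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
r = rng.standard_normal(8)

W_abl = ablate_direction(W, r)
r_unit = r / np.linalg.norm(r)
assert np.allclose(r_unit @ W_abl, 0.0)  # no output component along r
```

Leaving Q/K/V untouched, as the card claims, amounts to applying this projection only to matrices that write into the residual stream, on the theory that the MoE refusal signal is carried by expert outputs rather than attention score computation.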
// ANALYSIS
This is a useful reminder that “uncensoring” MoE models is not just a bigger LoRA problem; refusal behavior can live in expert routing and expert-specific projections.
- The method targets o_proj, MLP down_proj, and expert slices while explicitly leaving Q/K/V untouched, which matches the claim that the safety signal is routed through MoE experts.
- The 7/100 refusal result is more credible than many flashy abliterated model cards because it uses longer generations and a judge model instead of simple keyword checks.
- Router biasing toward selected "safety experts" is a strong intervention, but it also raises the risk of brittle behavior outside the exact eval set.
- Low KL divergence suggests the base model's general behavior is preserved reasonably well, which matters more than raw uncensoring if the model still needs to be usable.
- This sits in the same broader trend as other Qwen abliterations: community demand is clearly for local, less-filtered variants, especially on MoE checkpoints where the intervention surface is different.
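The "low KL drift" claim is usually measured per token: compare the base and abliterated models' next-token distributions on neutral text and average KL(base || ablated). The repo's eval harness isn't specified, so this is just an illustrative sketch of the metric itself over raw logit vectors.

```python
import numpy as np

def next_token_kl(logits_p: np.ndarray, logits_q: np.ndarray) -> float:
    """KL(P || Q) between two next-token distributions given as logits.

    Softmax is computed in a numerically stable way (max-shifted).
    A value near 0 means the ablated model's predictions barely moved.
    """
    p = np.exp(logits_p - logits_p.max()); p /= p.sum()
    q = np.exp(logits_q - logits_q.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical logits for one position of a tiny 5-token vocabulary.
base    = np.array([3.0, 1.5, 0.2, -0.5, -2.0])
ablated = np.array([2.9, 1.6, 0.2, -0.4, -2.1])  # slight drift

assert next_token_kl(base, base) < 1e-9      # identical models -> 0
drift = next_token_kl(base, ablated)
assert 0.0 < drift < 0.05                    # small positive drift
```

In a real eval this would be averaged over many positions of held-out text; a low mean KL supports the card's point that general behavior survives the intervention, but it says nothing about behavior on the refusal-adjacent prompts the router biasing specifically targets.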
// TAGS
llm · open-source · qwen3.6-35b-a3b-abliterated
DISCOVERED
2h ago
2026-04-17
PUBLISHED
2h ago
2026-04-17
RELEVANCE
9/10
AUTHOR
Free_Change5638