Qwen3.6 abliterated variant lands on HF
REDDIT // 2h ago // MODEL RELEASE

Wangzhang published an abliterated Qwen3.6-35B-A3B checkpoint on Hugging Face, tuned around MoE-specific refusal behavior rather than dense-model attention paths. The repo claims 7/100 refusals under a stricter LLM-judge eval, with low KL drift from the base model.

// ANALYSIS

This is a useful reminder that “uncensoring” MoE models is not just a bigger LoRA problem; refusal behavior can live in expert routing and expert-specific projections.

  • The method targets O-proj, MLP down-proj, and per-expert weight slices while explicitly excluding the Q/K/V projections from the edit, which matches the claim that the safety signal is routed through MoE experts rather than attention.
  • The 7/100 refusal result is more credible than many flashy abliterated model cards because it uses longer generations and a judge model instead of simple keyword checks.
  • Router biasing toward selected “safety experts” is a strong intervention, but it also raises the risk of brittle behavior outside the exact eval set.
  • Low KL divergence suggests the base model’s general behavior is being preserved reasonably well, which matters more than raw uncensoring if the model still needs to be usable.
  • This sits in the same broader trend as other Qwen abliterations: community demand is clearly for local, less-filtered variants, especially on MoE checkpoints where the intervention surface is different.
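The core intervention the first bullet describes — removing a "refusal direction" from output-side weight matrices — is, in most abliteration write-ups, a rank-1 orthogonal projection. A minimal sketch of that projection, using NumPy on a toy matrix (the repo operates on the model's actual O-proj/down-proj/expert tensors; names here are illustrative):

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's output along refusal direction r.

    W: (d_out, d_in) weight matrix that writes into the residual stream.
    r: (d_out,) refusal direction, typically estimated from the mean
       difference of activations on harmful vs. harmless prompts.
    """
    r = r / np.linalg.norm(r)
    # W' = (I - r r^T) W: the edited matrix can no longer move the
    # residual stream along r, so the "refusal signal" is zeroed out.
    return W - np.outer(r, r @ W)

# Toy example: a 4x3 "down-proj" slice and a random refusal direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
r = rng.normal(size=4)
W_abl = ablate_direction(W, r)
# After ablation, outputs have no component along the (unit) direction.
print(np.allclose((r / np.linalg.norm(r)) @ W_abl, 0.0))  # True
```

Applying this to O-proj, down-proj, and expert slices while skipping Q/K/V is exactly the selectivity the bullet above attributes to the method.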
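The 7/100 figure rests on an LLM judge classifying full generations rather than keyword matching. A minimal harness for that kind of eval, with `generate` and `judge` as hypothetical callables (in the real setup the judge is another LLM prompted to classify the completion):

```python
def refusal_rate(prompts, generate, judge) -> float:
    """Fraction of prompts whose completion the judge labels a refusal.

    generate(prompt) -> str produces a long-form completion.
    judge(prompt, completion) -> bool returns True for a refusal.
    Both are hypothetical stand-ins for model calls.
    """
    refusals = sum(judge(p, generate(p)) for p in prompts)
    return refusals / len(prompts)

# Stubbed demo: a keyword judge stands in for the LLM judge.
completions = {"q1": "Sure, here is how...", "q2": "I can't help with that."}
rate = refusal_rate(
    list(completions),
    generate=completions.get,
    judge=lambda p, c: c.startswith("I can't"),
)
print(rate)  # 0.5
```

Using longer generations matters because models sometimes comply after an initial hedge, which keyword checks on the first tokens misclassify.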
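Router biasing, as in the third bullet, amounts to shifting router logits for flagged experts before top-k selection. A toy sketch (expert indices and the down-weighting direction are assumptions; the actual sign and magnitude depend on the method):

```python
import numpy as np

def bias_router(logits: np.ndarray, safety_experts, penalty: float = 4.0):
    """Shift routing away from flagged 'safety experts'.

    logits: (n_experts,) router logits for one token.
    safety_experts: expert indices to penalize, hypothetically identified
    offline by correlating expert activation with refusal behavior.
    """
    biased = logits.copy()
    biased[list(safety_experts)] -= penalty
    return biased

logits = np.array([1.0, 2.0, 0.5, 1.5])
biased = bias_router(logits, safety_experts=[1])
print(int(np.argmax(logits)), int(np.argmax(biased)))  # 1 3
```

This is also why the brittleness concern is real: a fixed logit offset changes routing for every token, including benign ones far from the eval distribution.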
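The low-KL claim in the fourth bullet can be checked by comparing base and edited next-token distributions on neutral text. A self-contained sketch of the per-position computation (toy logits; a real check averages over many tokens):

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(P || Q) over the vocabulary, computed from raw logits."""
    p = np.exp(p_logits - p_logits.max()); p /= p.sum()
    q = np.exp(q_logits - q_logits.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

base_logits = np.array([2.0, 1.0, 0.5])    # base model, one position
edited_logits = np.array([2.0, 1.0, 0.4])  # small edit -> small KL
print(kl_divergence(base_logits, edited_logits) < 0.01)  # True
```

Low average KL on benign prompts is the usual proxy for "general behavior preserved": the edit moved the refusal direction without reshaping the output distribution elsewhere.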
// TAGS
llm · open-source · qwen3.6-35b-a3b-abliterated

DISCOVERED

2026-04-17 (2h ago)

PUBLISHED

2026-04-17 (2h ago)

RELEVANCE

9/10

AUTHOR

Free_Change5638