OPEN_SOURCE
REDDIT // 29d ago // BENCHMARK RESULT
Reddit claims low-KL Qwen refusal wipe
A LocalLLaMA Reddit post claims a weekend method can strip refusal behavior from Qwen 3.5 2B in minutes, reaching 0/120 refusals while keeping the 50-token KL divergence low. The author shares partial logs, describes the results as reproducible on consumer and multi-GPU hardware, and says a paper is planned but not yet published.
// ANALYSIS
This is an eye-catching benchmark claim, but it is still unreviewed anecdotal evidence until code, method details, and independent replication are available.
- The reported tradeoff is unusually strong: near-preserved behavior (KL 0.0141) with complete refusal removal.
- If validated, the technique could materially lower the barrier for safety stripping on open models.
- The lack of a paper or reproducible artifact right now makes this more of an early signal than a confirmed breakthrough.
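
For readers weighing the KL figure, the sketch below shows one common way such a number could be computed. The post does not say how its "50-token KL" is defined, so the direction (base model relative to edited model), the unit (nats), and the averaging over the first 50 positions are all assumptions here, and the function names are hypothetical. It measures only how far two next-token distributions drift apart; it does not reproduce the post's method.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(base_logits: Sequence[float], edited_logits: Sequence[float]) -> float:
    """KL(base || edited) in nats for a single token position (assumed direction)."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl_first_n(
    base_logits: Sequence[Sequence[float]],
    edited_logits: Sequence[Sequence[float]],
    n_tokens: int = 50,  # assumption: mirrors the "50-token" window in the post
) -> float:
    """Average per-position KL over the first n_tokens positions."""
    pairs = list(zip(base_logits, edited_logits))[:n_tokens]
    if not pairs:
        raise ValueError("need at least one token position")
    return sum(token_kl(b, e) for b, e in pairs) / len(pairs)
```

Under these assumptions, a value such as 0.0141 would mean the edited model's next-token distributions stay close to the base model's on the evaluated prompts. A different KL direction, prompt set, or window length would yield numbers that are not directly comparable.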
// TAGS
qwen3-5-2b · llm · safety · benchmark · open-weights
DISCOVERED
29d ago
2026-03-14
PUBLISHED
29d ago
2026-03-14
RELEVANCE
7/10
AUTHOR
Sliouges