Abliterlitics punctures HauhauCS lossless claim

// 90d ago BENCHMARK RESULT

Abliterlitics is a forensic benchmark suite comparing HauhauCS, Heretic, and Huihui abliteration across Qwen 3 and 3.5 models. The takeaway is blunt: HauhauCS can remove refusals very effectively, but its “lossless” framing does not survive benchmark, KL, and weight-analysis scrutiny, especially as model size grows.

// ANALYSIS

This is less a product launch than a stress test of a niche model-editing claim, and it lands hard on the “lossless” marketing. The work is strongest where it stays reproducible and comparative: same base models, same safety suite, same KL methodology, same weight analysis.
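
To make the KL comparison concrete, here is a minimal sketch of how divergence between a base model and its abliterated variant could be measured on next-token distributions. It is not the Abliterlitics implementation: the function names, the KL(base || edited) direction, and the simple mean over prompts are all assumptions for illustration, and the logits would come from running both models on the same prompts.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(base_logits, edited_logits):
    """KL(base || edited) in nats for a single next-token distribution.

    Assumption: both lists cover the same vocabulary in the same order.
    """
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kl(base_batch, edited_batch):
    """Average KL over paired per-prompt logit vectors (hypothetical aggregation)."""
    pairs = list(zip(base_batch, edited_batch))
    return sum(kl_divergence(b, e) for b, e in pairs) / len(pairs)
```

A value near zero on harmless prompts would support a "lossless" claim; a value that grows with model size would contradict it.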

– The headline result is that HauhauCS is effective at uncensoring, but not lossless; TruthfulQA and other capability scores degrade more as models scale up.
– Heretic looks like the most consistent method overall, while Huihui is highly architecture-dependent, ranging from competitive on small models to broken on Qwen3.5-4B and weak on the 27B.
– The hybrid Mamba2+Transformer Qwen3.5 family behaves differently from the pure Transformer Qwen3-4B, which is a useful reminder that abliteration methods do not transfer cleanly across architectures.
– The 27B result is the most important practical signal: even a heavily safety-aligned base model can still be stripped of refusals, but broad tensor edits appear to impose the largest collateral damage.
– The strongest value is not the uncensoring itself, but the methodology and artifact trail: benchmark cards, KL numbers, tensor overlap, and per-layer breakdowns make the claims checkable.
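
The tensor-overlap idea can be sketched in a few lines. This is an illustrative assumption about how such a comparison might work, not the suite's actual code: weights are modeled as plain dicts from tensor name to a flat list of floats, the tensor names in the usage note are hypothetical, and Jaccard similarity is a stand-in for whatever overlap metric the benchmark reports.

```python
import math

def changed_tensors(base, edited, tol=1e-8):
    """Return {tensor name: relative L2 change} for tensors that differ from base.

    Assumption: `base` and `edited` share the same tensor names and shapes.
    """
    report = {}
    for name, w0 in base.items():
        w1 = edited[name]
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(w0, w1)))
        norm = math.sqrt(sum(a * a for a in w0))
        rel = diff / norm if norm > 0 else diff
        if rel > tol:
            report[name] = rel
    return report

def jaccard_overlap(names_a, names_b):
    """Share of edited tensors that two methods have in common (0.0 to 1.0)."""
    a, b = set(names_a), set(names_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For example, given toy weights `base`, `method_a`, and `method_b`, calling `jaccard_overlap(changed_tensors(base, method_a), changed_tensors(base, method_b))` indicates whether the two methods touch the same tensors, and the per-tensor relative change shows how broad or heavy each edit is.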

// TAGS

abliterlitics benchmark safety research llm qwen

DISCOVERED

90d ago

2026-04-18

PUBLISHED

90d ago

2026-04-18

RELEVANCE

9/10

AUTHOR

nathandreamfast
