OPEN_SOURCE
REDDIT // MODEL RELEASE
Sarvam 105B uncensored via abliteration
aoxo's Sarvam-105B Uncensored is a Hugging Face derivative of Sarvam's open-source 105B MoE reasoning model. It uses abliteration to remove refusal behavior, and the author says the base model's multilingual, coding, and agentic capabilities remain intact.
// ANALYSIS
This is less a consumer launch than a proof that safety behavior can be surgically edited out of a strong open model. That's exciting for research and red-teaming, but it also shows why internal alignment cannot be the only guardrail.
- Sarvam 105B is a serious base model, so the release matters more than a typical jailbreak demo.
- Sarvam's March 6 open-source drop makes this kind of community remix inevitable once weights are public.
- The workflow tracks the 2024 refusal-direction paper, which turns "uncensoring" into a repeatable mechanistic recipe rather than folklore.
- The model card's benchmarks are for the base model, not a fresh evaluation of the uncensored derivative, so capability preservation is still mostly an assumption.
- The model card explicitly warns against user-facing deployment without external moderation, logging, and policy enforcement.
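The refusal-direction recipe the bullets reference can be sketched in a few lines. This is a hedged toy illustration, not the author's actual pipeline: random vectors stand in for real residual-stream activations, and the dimensions are made up. The core idea is difference-of-means to find one "refusal direction", then projecting that direction out of a weight matrix.

```python
import numpy as np

# Toy sketch of abliteration (refusal-direction ablation):
# 1) collect activations for harmful vs. harmless prompts,
# 2) take the difference of means as a single refusal direction r,
# 3) orthogonalize weights against r so layers cannot write along it.
# All data here is synthetic; real recipes use a model's residual stream.

rng = np.random.default_rng(0)
d_model = 64  # assumed toy hidden size

# Step 1: stand-in activations (n_prompts x d_model) for the two prompt sets;
# the +0.5 offset plays the role of the refusal signal.
acts_harmful = rng.normal(size=(128, d_model)) + 0.5
acts_harmless = rng.normal(size=(128, d_model))

# Step 2: difference-of-means direction, normalized to unit length.
r = acts_harmful.mean(axis=0) - acts_harmless.mean(axis=0)
r /= np.linalg.norm(r)

# Step 3: project r out of a weight matrix W: W' = (I - r r^T) W.
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - np.outer(r, r) @ W

# Outputs of the edited weights have ~zero component along r.
x = rng.normal(size=d_model)
print(abs(r @ (W_abliterated @ x)))  # ~0 up to float error
```

Because `(I - r r^T)` is an exact orthogonal projector, the edit removes only the one-dimensional refusal component while leaving the rest of the weight matrix untouched, which is why capability benchmarks can plausibly survive; whether they actually do for this derivative is, per the bullets above, unverified.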
// TAGS
sarvam-105b-uncensored · llm · open-weights · research · safety
DISCOVERED
2026-03-24
PUBLISHED
2026-03-24
RELEVANCE
8/10
AUTHOR
Available-Deer1723