Sarvam 105B uncensored via abliteration
OPEN_SOURCE
REDDIT · 18d ago · MODEL RELEASE


aoxo's Sarvam-105B Uncensored is a Hugging Face derivative of Sarvam's open-source 105B MoE reasoning model. It uses abliteration to strip out refusal behavior, and the author claims the base model's multilingual, coding, and agentic capabilities remain intact.

// ANALYSIS

This is less a consumer launch than a proof that safety behavior can be surgically edited out of a strong open model. That's exciting for research and red-teaming, but it also shows why internal alignment cannot be the only guardrail.

  • Sarvam 105B is a serious base model, so the release matters more than a typical jailbreak demo.
  • Sarvam's March 6 open-source drop makes this kind of community remix inevitable once weights are public.
  • The workflow tracks the 2024 refusal-direction paper, which turns "uncensoring" into a repeatable mechanistic recipe rather than folklore.
  • The model card's benchmarks are for the base model, not a fresh evaluation of the uncensored derivative, so capability preservation is still mostly an assumption.
  • The model card explicitly warns against user-facing deployment without external moderation, logging, and policy enforcement.
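The refusal-direction recipe referenced above can be sketched in a few lines: estimate a "refusal direction" as the difference of mean residual-stream activations on harmful versus harmless prompts, then project that direction out of the model's weight matrices. The sketch below is a minimal NumPy illustration of the math under assumed toy data; the activation arrays, dimensions, and layer choice are illustrative, not taken from the actual Sarvam release.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray,
                      harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between the two prompt sets,
    normalized to unit length (the standard abliteration estimator)."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix that writes
    to the residual stream: W' = (I - r r^T) W, so the layer's output
    has no component along r for any input."""
    return W - np.outer(r, r) @ W

# Toy demonstration with synthetic activations (illustrative only).
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 64)) + 0.5   # fake "harmful" activations
harmless = rng.normal(size=(32, 64))        # fake "harmless" activations
r = refusal_direction(harmful, harmless)

W = rng.normal(size=(64, 64))               # stand-in output projection
W_abl = ablate(W, r)
x = rng.normal(size=64)
print(abs(r @ (W_abl @ x)))                 # ~0: no refusal component left
```

In a real pass this projection is applied to every matrix that writes into the residual stream (attention output and MLP down-projections), which is why the edit survives in the saved weights rather than requiring runtime hooks.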
// TAGS
sarvam-105b-uncensored · llm · open-weights · research · safety

DISCOVERED

2026-03-24

PUBLISHED

2026-03-24

RELEVANCE

8 / 10

AUTHOR

Available-Deer1723