Nous Research drops Contrastive Neuron Attribution method

// 45d ago · RESEARCH PAPER

Nous Research has unveiled Contrastive Neuron Attribution (CNA), a mechanistic interpretability method that steers LLM behavior by isolating sparse circuits of under 200 neurons. The technique enables precise suppression or amplification of specific behaviors like refusal without degrading model coherence.

// ANALYSIS

CNA moves model steering past noisy "activation additions" toward surgical manipulation of individual neurons. It provides a way to "lobotomize" safety guardrails or enhance specific capabilities while touching only 0.1% of the model. The method identifies behavioral circuits such as the "refusal gate" using only forward passes, which avoids the cost of training Sparse Autoencoders. Its results suggest that alignment fine-tuning crystallizes discrimination features already present in base models rather than creating new structures. The toolkit is available as the neural-steering repository on GitHub, and the method is reported to stay stable at high intervention strengths where previous methods caused model collapse.
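The general idea of contrastive attribution from forward passes can be illustrated with a toy sketch. This is not the CNA implementation or the neural-steering API: the function names `contrastive_top_k` and `steer`, the mean-difference scoring rule, and the synthetic activations are all assumptions made for illustration. The sketch ranks neurons by how much their mean activation differs between two prompt sets, then scales only the top-ranked ones.

```python
import numpy as np


def contrastive_top_k(acts_pos, acts_neg, k):
    """Rank neurons by |mean(acts_pos) - mean(acts_neg)|; return the top-k indices.

    Hypothetical scoring rule for illustration, not necessarily what CNA uses.
    Inputs have shape (num_prompts, num_neurons).
    """
    diff = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    order = np.argsort(-np.abs(diff))  # largest absolute difference first
    return order[:k], diff


def steer(acts, neuron_idx, scale):
    """Scale the selected neurons: scale=0 suppresses, scale>1 amplifies."""
    out = acts.copy()
    out[:, neuron_idx] *= scale
    return out


# Synthetic stand-in for activations recorded during forward passes.
rng = np.random.default_rng(0)
n_neurons = 1000
acts_neg = rng.normal(0.0, 1.0, size=(64, n_neurons))  # e.g. prompts the model answers
acts_pos = rng.normal(0.0, 1.0, size=(64, n_neurons))  # e.g. prompts the model refuses

# Plant a small "circuit": three neurons that fire more strongly on acts_pos.
planted = np.array([3, 42, 77])
acts_pos[:, planted] += 5.0

top, diff = contrastive_top_k(acts_pos, acts_neg, k=3)
suppressed = steer(acts_pos, top, scale=0.0)

print("selected neurons:", sorted(top.tolist()))
print("fraction of neurons touched:", len(top) / n_neurons)
```

In a real model, the activations would come from hooks on MLP layers rather than random draws, and the intervention would be applied during generation rather than to a stored array.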

// TAGS

nous-research, cna, interpretability, llm, safety, open-source, steering, fine-tuning

DISCOVERED

45d ago

2026-05-23

PUBLISHED

45d ago

2026-05-23

RELEVANCE

9/10

AUTHOR

NousResearch
