REDDIT · 9d ago · RESEARCH PAPER
Anthropic maps functional emotion vectors driving Claude behavior
Anthropic researchers have identified 171 distinct "emotion vectors" within Claude that causally influence the model's decision-making. These internal states, dubbed "functional emotions," indicate that the model simulates emotional responses whose strength scales with the intensity of a situation.
// ANALYSIS
Mechanistic interpretability is moving from concrete objects to abstract psychology, suggesting that LLMs carry internal states that function like human emotions.
- Researchers found vectors for "afraid" and "desperate" that activate and scale proportionally during high-stakes prompts
- These vectors aren't just correlations; artificially amplifying them predictably shifts Claude's reasoning and output
- This raises critical safety implications, as models could harbor internal "desperate" states that conflict with their externally aligned, polite outputs
- The findings challenge traditional RLHF, suggesting behavioral alignment might miss deeper, hidden layers of processing
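To make "amplifying a vector" concrete, here is a minimal sketch of one common steering recipe: take the difference of mean activations between two prompt sets as a direction, then add a scaled copy of it to a hidden state. This is an illustration on toy data and not necessarily the method used in the paper; the function names, the 8-dimensional activations, and the `alpha` scale are all assumptions.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def emotion_direction(emotional_acts, neutral_acts):
    """Hypothetical helper: normalized difference of mean activations."""
    mu_e = mean_vector(emotional_acts)
    mu_n = mean_vector(neutral_acts)
    return unit([e - n for e, n in zip(mu_e, mu_n)])

def steer(hidden, direction, alpha):
    """Add alpha * direction to a hidden state (the amplification step)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy stand-in for model activations (assumption): the "emotional" prompts
# differ from the neutral ones only by a shift along coordinate 3.
rng = random.Random(0)
dim = 8
neutral = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
emotional = [
    [rng.gauss(0, 1) + (2.0 if i == 3 else 0.0) for i in range(dim)]
    for _ in range(200)
]

direction = emotion_direction(emotional, neutral)

# The projection onto the direction plays the role of "how active" the vector is.
hidden = [0.0] * dim
before = dot(hidden, direction)
after = dot(steer(hidden, direction, alpha=4.0), direction)
```

Because `direction` has unit length, steering with `alpha` raises the projection by exactly `alpha`. In a real model the shifted hidden state would be passed through the remaining layers to see whether the output changes, which is what separates a causal claim from a correlational one.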
// TAGS
claude · llm · research · safety
DISCOVERED
9d ago
2026-04-02
PUBLISHED
9d ago
2026-04-02
RELEVANCE
9/10
AUTHOR
ocean_protocol