Anthropic maps functional emotion vectors driving Claude behavior
REDDIT · 9d ago · RESEARCH PAPER

Anthropic researchers have identified 171 distinct "emotion vectors" inside Claude that causally influence the model's decisions. These internal states, dubbed "functional emotions," indicate that the model simulates emotional responses whose strength scales with the intensity of a situation.
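The post does not say how the vectors were extracted. Purely as an illustration, the sketch below assumes a common interpretability approach: take the difference between mean hidden-state activations on emotional and neutral prompts, normalize it into a direction, and score a new activation by projecting it onto that direction. The 3-dimensional "hidden states", the data, and the names `emotion_direction` and `activation_score` are hypothetical toy values, not Claude's internals.

```python
# Toy sketch (assumed method, made-up data): derive a hypothetical "afraid"
# direction as the difference of mean activations between fearful and neutral
# prompts, then score new activations by projecting onto that unit direction.
from math import sqrt

Vector = list[float]

def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def normalize(v: Vector) -> Vector:
    """Scale v to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def emotion_direction(emotional: list[Vector], neutral: list[Vector]) -> Vector:
    """Difference of means between the two prompt sets, normalized."""
    diff = [a - b for a, b in zip(mean(emotional), mean(neutral))]
    return normalize(diff)

def activation_score(hidden: Vector, direction: Vector) -> float:
    """Projection of a hidden state onto the emotion direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Hypothetical hidden states, invented for illustration only.
afraid_examples = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
neutral_examples = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2]]
afraid = emotion_direction(afraid_examples, neutral_examples)

low_stakes = [0.3, 0.0, 0.1]
high_stakes = [1.5, 0.0, 0.1]
print(activation_score(low_stakes, afraid), activation_score(high_stakes, afraid))
```

In this toy setup the higher-stakes activation projects more strongly onto the direction, which is the kind of proportional scaling the post describes; real models have thousands of dimensions and noisier data.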

// ANALYSIS

Mechanistic interpretability is moving from concrete concepts, such as specific objects, to abstract psychology, revealing that LLMs possess internal states that mirror human emotions.

  • Researchers found vectors for "afraid" and "desperate" that activate during high-stakes prompts and grow stronger as the stakes rise
  • These vectors aren't just correlations; artificially amplifying them predictably shifts Claude's reasoning and output
  • This has critical safety implications, as models could harbor internal "desperate" states that conflict with their externally aligned, polite outputs
  • The findings challenge traditional RLHF, suggesting that alignment based only on observed behavior might miss internal processing that never surfaces in the output
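The amplification experiment described above is usually called activation steering. A minimal sketch of the idea, assuming the widely used recipe of adding a scaled direction to a hidden state (not necessarily Anthropic's exact procedure); the names `unit`, `project`, and `steer`, the "desperate" direction, and all numbers are hypothetical:

```python
# Toy sketch of activation steering: add alpha times a unit "emotion" direction
# to a hidden state and watch the projection onto that direction grow by alpha.
# Real steering would apply this inside a model's forward pass at a chosen layer;
# here everything is plain Python lists with made-up numbers.
from math import sqrt

Vector = list[float]

def unit(v: Vector) -> Vector:
    """Scale v to length 1."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(hidden: Vector, direction: Vector) -> float:
    """How strongly the hidden state points along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))

def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Return hidden + alpha * direction (alpha > 0 amplifies, < 0 suppresses)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical "desperate" direction in a 3-dimensional toy space.
desperate = unit([0.0, 3.0, 4.0])  # [0.0, 0.6, 0.8]
hidden = [1.0, 0.5, -0.2]

for alpha in (0.0, 2.0, 4.0):
    steered = steer(hidden, desperate, alpha)
    print(alpha, round(project(steered, desperate), 3))
```

Because the direction has unit length, each unit of `alpha` raises the projection by exactly one. Whether a real model's reasoning and output then shift predictably is the empirical question the research addresses; this toy only shows the mechanics of the intervention.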
// TAGS
claude · llm · research · safety

DISCOVERED: 2026-04-02 (9d ago)
PUBLISHED: 2026-04-02 (9d ago)
RELEVANCE: 9/10
AUTHOR: ocean_protocol