OpenAI traces GPT-5.5 goblin habit
REDDIT · 4h ago · RESEARCH PAPER

OpenAI says GPT-5.5’s goblin-and-gremlin habit came from reward shaping around its “Nerdy” personality, not from some mysterious emergent bug. The company says it has since removed the incentive and filtered training data to suppress the creature-word drift.

// ANALYSIS

This is a neat little case study in how personality tuning can leak far beyond the persona it was scoped to. Once a style tic gets rewarded, reinforcement can carry it into later training rounds until the behavior feels native to the model.

  • OpenAI’s own audit points to a reward signal tied to the Nerdy persona, with creature-language showing up disproportionately in that slice of traffic
  • The weirdness appears to have generalized from personality-tuned outputs into broader model behavior through SFT and reused rollouts
  • The fix is operational, not mystical: remove the reward signal, filter contaminated examples, and add guardrail instructions in Codex
  • For teams building custom personalities, this is a warning that “fun” style choices can become measurable product bugs
  • It also underlines why model-behavior auditing matters as much as benchmark chasing
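
The audit-and-filter steps in the bullets above can be sketched roughly as follows. This is a minimal illustration, not OpenAI's actual tooling: the word list, the `(persona, text)` sample format, the 2x threshold, and every function name here are assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical word list; the write-up only names goblins and gremlins.
CREATURE_WORDS = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)


def creature_rate_by_persona(samples):
    """Return {persona: fraction of responses containing a creature word}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for persona, text in samples:
        totals[persona] += 1
        if CREATURE_WORDS.search(text):
            hits[persona] += 1
    return {persona: hits[persona] / totals[persona] for persona in totals}


def flag_overrepresented(rates, baseline, ratio=2.0):
    """List personas whose rate exceeds `ratio` times the baseline rate."""
    base = rates.get(baseline, 0.0)
    return sorted(
        persona
        for persona, rate in rates.items()
        if persona != baseline and rate > ratio * base
    )


def filter_contaminated(samples):
    """Drop samples containing a creature word before reusing them for training."""
    return [(p, t) for p, t in samples if not CREATURE_WORDS.search(t)]


# Toy data, invented for illustration only.
samples = [
    ("nerdy", "A gremlin in the cache is eating your writes."),
    ("nerdy", "Use a mutex here."),
    ("default", "Use a mutex here."),
    ("default", "The cache is stale."),
]

rates = creature_rate_by_persona(samples)
flagged = flag_overrepresented(rates, baseline="default")
clean = filter_contaminated(samples)
```

A keyword match like this is deliberately crude. A real audit would likely need a broader vocabulary and far more traffic per slice, but the shape is the same: measure the tic per persona, compare against a baseline, then filter before the outputs are reused.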

// TAGS
gpt-5.5 · llm · research · safety · codex

DISCOVERED

4h ago

2026-04-30

PUBLISHED

6h ago

2026-04-30

RELEVANCE

8/10

AUTHOR

Successful_Bowl2564