OPEN_SOURCE
REDDIT · RESEARCH PAPER
OpenAI traces GPT-5.5 goblin habit
OpenAI says GPT-5.5’s goblin-and-gremlin habit came from reward shaping around its “Nerdy” personality, not from a mysterious emergent bug. The company has since removed the incentive and filtered training data to suppress the creature-word drift.
// ANALYSIS
This is a neat little case study in how model personality tuning can leak far beyond the context it was scoped to. Once a style tic gets rewarded, reinforcement can spread it into later training runs and make the behavior feel native to the model.
- OpenAI’s own audit points to a reward signal tied to the Nerdy persona, with creature language showing up disproportionately in that slice of traffic
- The weirdness appears to have generalized from personality-tuned outputs into broader model behavior through SFT and reused rollouts
- The fix is operational, not mystical: remove the reward signal, filter contaminated examples, and add guardrail instructions in Codex
- For teams building custom personalities, this is a warning that “fun” style choices can become measurable product bugs
- It also underlines why model-behavior auditing matters as much as benchmark chasing
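The filter-contaminated-examples step described above can be sketched as a simple lexical pass over a training set. Everything here is hypothetical: the record shape, the function names, and the word list are illustrative assumptions, and OpenAI's actual filtering pipeline is not public.

```python
import re

# Hypothetical word list for the style tic being filtered out.
CREATURE_WORDS = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)

def is_contaminated(example: dict) -> bool:
    """Flag an example whose response drifted into creature language."""
    return bool(CREATURE_WORDS.search(example.get("response", "")))

def filter_dataset(examples: list[dict]) -> list[dict]:
    """Drop contaminated rollouts so later SFT does not re-reinforce the tic."""
    return [ex for ex in examples if not is_contaminated(ex)]

data = [
    {"prompt": "Explain recursion", "response": "Picture a goblin stacking boxes inside boxes..."},
    {"prompt": "Explain recursion", "response": "A function that calls itself on a smaller input."},
]
clean = filter_dataset(data)  # keeps only the second example
```

A real pipeline would likely use a classifier rather than a regex, since lexical filters miss paraphrased drift, but the operational shape (flag, drop, retrain) is the same.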
// TAGS
gpt-5.5 · llm · research · safety · codex
DISCOVERED
2026-04-30
PUBLISHED
2026-04-30
RELEVANCE
8/10
AUTHOR
Successful_Bowl2564