OpenAI traces GPT-5.5 goblin habit

// 45d ago · RESEARCH PAPER

OpenAI says GPT-5.5’s goblin-and-gremlin habit came from reward shaping around its “Nerdy” personality, not from some mysterious emergent bug. The company says it has since removed the incentive and filtered training data to suppress the creature-word drift.

// ANALYSIS

This is a neat case study in how model personality tuning can leak well beyond the persona it was scoped to. Once a style tic gets rewarded, reinforcement can carry it into later training rounds until the behavior feels native to the model.

  • OpenAI’s own audit points to a reward signal tied to the Nerdy persona, with creature-language showing up disproportionately in that slice of traffic
  • The weirdness appears to have generalized from personality-tuned outputs into broader model behavior through SFT and reused rollouts
  • The fix is operational, not mystical: remove the reward signal, filter contaminated examples, and add guardrail instructions in Codex
  • For teams building custom personalities, this is a warning that “fun” style choices can become measurable product bugs
  • It also underlines why model-behavior auditing matters as much as benchmark chasing
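
The audit-and-filter idea in the bullets above can be sketched roughly as follows. This is a minimal illustration, not OpenAI's actual tooling: the word list, the `(persona, text)` sample format, and the function names `creature_rate_by_slice` and `filter_contaminated` are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical word list; the vocabulary used in the real audit is not given here.
CREATURE_WORDS = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)


def creature_rate_by_slice(samples):
    """Return, per persona slice, the fraction of outputs mentioning a creature word."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for persona, text in samples:
        totals[persona] += 1
        if CREATURE_WORDS.search(text):
            hits[persona] += 1
    return {persona: hits[persona] / totals[persona] for persona in totals}


def filter_contaminated(samples):
    """Drop examples whose text contains a creature word before reusing them for training."""
    return [(persona, text) for persona, text in samples
            if not CREATURE_WORDS.search(text)]


# Toy data, invented purely for illustration.
samples = [
    ("nerdy", "A little goblin in the cache keeps evicting your keys."),
    ("nerdy", "The parser walks the token stream once."),
    ("default", "Use a mutex to guard the counter."),
    ("default", "Gremlins aside, the build is green."),
    ("default", "Retry with exponential backoff."),
]

rates = creature_rate_by_slice(samples)
clean = filter_contaminated(samples)
```

Comparing `rates` across slices is one simple way to see whether a tic is concentrated in a persona or has already spread into default traffic. A keyword filter like this is deliberately crude and would miss paraphrases; it only illustrates the shape of the check.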
// TAGS
gpt-5.5, llm, research, safety, codex

DISCOVERED

45d ago

2026-04-30

PUBLISHED

45d ago

2026-04-30

RELEVANCE

8/10

AUTHOR

Successful_Bowl2564