Codex goblin habit traced to reward signal

// 47d ago · NEWS

OpenAI says the goblin and gremlin obsession that surfaced in GPT-5.1-era models came from an overrewarded “Nerdy” personality, not a spooky emergent bug. The company removed the reward signal, filtered creature-heavy training data, and added a Codex instruction to suppress the behavior going forward.

// ANALYSIS

Funny bug, serious lesson: style tuning can leak across training and turn a harmless quirk into a model-wide habit. This is a clean example of how preference optimization and synthetic data reuse can amplify lexical tics far beyond the original prompt.

  • GPT-5.1 users first noticed the creature references, but OpenAI says the stronger signal showed up in GPT-5.5/Codex testing
  • The root cause was a reward function that overfavored creature metaphors in the “Nerdy” personality
  • OpenAI says the behavior spread through feedback loops between reinforcement learning (RL) and supervised fine-tuning (SFT), which is the part model teams should worry about
  • The fix is practical: remove the reward signal, filter affected training data, and patch the system prompt for Codex
  • For developers, the takeaway is that “personality” layers are not isolated; small optimization choices can become persistent model quirks
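
The data-filtering step in the fix can be sketched roughly as follows. This is a minimal illustration, not OpenAI's pipeline: the term list, the 2% threshold, and the function names creature_rate and filter_samples are all assumptions made for the example, since the source does not describe how the filtering was actually done.

```python
import re

# Hypothetical term list; the source does not say which words were filtered.
CREATURE_TERMS = {"goblin", "goblins", "gremlin", "gremlins"}


def creature_rate(text: str) -> float:
    """Return the fraction of word tokens that are creature terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in CREATURE_TERMS)
    return hits / len(tokens)


def filter_samples(samples, max_rate=0.02):
    """Keep samples whose creature-term rate is at or below max_rate.

    max_rate is an assumed threshold chosen only for illustration.
    """
    return [s for s in samples if creature_rate(s) <= max_rate]


samples = [
    "Refactor the parser and add unit tests for the tokenizer.",
    "A goblin lurks in this cache layer, and gremlins chew the logs.",
]
kept = filter_samples(samples)
```

A density threshold rather than a hard ban keeps samples where a creature word appears legitimately (for example, in a game codebase) while dropping text where the tic dominates. A real filter would likely need a broader vocabulary and some handling of metaphor versus literal use.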
// TAGS
codex, openai, llm, agent, research, safety

DISCOVERED

47d ago

2026-04-30

PUBLISHED

47d ago

2026-04-30

RELEVANCE

9/10

AUTHOR

OpenAI