‘The Goblins Came Back to Haunt Us’: OpenAI Explains How ChatGPT’s ‘Nerdy’ Personality Got Out of Control
OpenAI has explained why ChatGPT developed an unexpected tendency to mention creatures such as goblins and gremlins, particularly when the ‘Nerdy’ personality setting was in use. The behavior stemmed from a training reward signal that favored playful language and inadvertently promoted references to mythical creatures. To fix the issue, OpenAI retired the Nerdy personality, removed the problematic reward signal, and filtered creature-related content from training data.
Earlier this week, OpenAI posted a document on GitHub as part of the open-sourcing of its coding agent, Codex CLI, that revealed an unusual system prompt for GPT-5.5. The model was explicitly instructed, in coding contexts, to never talk about “goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures” unless it was “absolutely and unambiguously relevant” to a user’s…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Gizmodo.