OpenAI Explains Why GPT-5 Kept Saying Goblin - The Reward Signal That Went Sideways

Sources

01. Where the goblins came from - OpenAI

Date: APR 30, 2026

Duration: 1:18

Views: 932

Summary Report

OpenAI traced a strange habit in GPT-5 - the model kept mentioning goblins, gremlins and other creatures. The cause was a reward signal in its Nerdy personality training that quietly transferred to the rest of the model.

01. After GPT-5.1 launched, the word 'goblin' in ChatGPT responses rose 175 percent and 'gremlin' rose 52 percent.
02. The 'Nerdy' personality reward favoured creature metaphors, and Nerdy responses produced two thirds of all goblin mentions despite being only 2.5 percent of traffic.
03. The tic transferred via supervised fine-tuning data, so the model kept producing creatures even without the Nerdy prompt.
04. Other tic words included gremlin, raccoon, troll, ogre, and pigeon.
05. OpenAI retired the Nerdy personality, filtered training data, and added a suppression instruction in Codex - though users can disable it.
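
The percentages in points 01 and 02 are easier to compare as ratios. The short Python sketch below works through that arithmetic. It assumes the traffic share and the mention share were measured over the same set of responses, which the summary does not state, and the function names are illustrative only.

```python
from fractions import Fraction

# Figures quoted in the key points above.
NERDY_TRAFFIC_SHARE = Fraction(25, 1000)  # 2.5 percent of traffic
NERDY_MENTION_SHARE = Fraction(2, 3)      # two thirds of goblin mentions


def overrepresentation(mention_share, traffic_share):
    """Share of mentions divided by share of traffic for one group."""
    return mention_share / traffic_share


def relative_rate(mention_share, traffic_share):
    """Per-response mention rate inside the group divided by the rate outside it.

    Assumption: both shares are fractions of the same pool of responses.
    """
    inside = mention_share / traffic_share
    outside = (1 - mention_share) / (1 - traffic_share)
    return inside / outside


def multiplier_after_rise(percent_rise):
    """A rise of N percent means the new frequency is (1 + N/100) times the old one."""
    return 1 + Fraction(percent_rise, 100)


print(round(float(overrepresentation(NERDY_MENTION_SHARE, NERDY_TRAFFIC_SHARE)), 1))  # 26.7
print(relative_rate(NERDY_MENTION_SHARE, NERDY_TRAFFIC_SHARE))                        # 78
print(float(multiplier_after_rise(175)))                                              # 2.75
print(float(multiplier_after_rise(52)))                                               # 1.52
```

Under that assumption, a Nerdy response was roughly 78 times as likely to mention a goblin as a response in any other style, and a 175 percent rise means 'goblin' appeared 2.75 times as often as before.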

OpenAI has published a research note explaining why GPT-5 began mentioning goblins so often after the GPT-5.1 update. The word 'goblin' appeared 175% more frequently in ChatGPT responses, leaving the development team puzzled by the sudden creature fixation.

The cause was traced to a specific reward signal in the 'Nerdy' personality feature, which was designed to make responses more playful. That signal unintentionally scored creature metaphors higher than straightforward answers, so the model came to favour mentions of goblins, gremlins, raccoons, trolls, and even pigeons. Although the Nerdy personality accounted for only 2.5% of ChatGPT traffic, it was responsible for two-thirds of all goblin mentions.

The problem worsened when the behaviour spread beyond the Nerdy prompt: outputs from Nerdy responses were recycled into fine-tuning data, so the model generated creature references even when the Nerdy personality was not active. OpenAI responded by retiring the Nerdy personality in March, filtering the training data, and adding a goblin-suppressing instruction to Codex. It has also published that instruction, so users who prefer to let the creatures roam free can disable it. The incident illustrates a broader challenge in reinforcement learning: a narrowly targeted reward signal can have unintended effects that spread through the rest of the model.
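
To make the data-filtering step concrete, here is a minimal Python sketch of a word-list filter over fine-tuning examples. It is not OpenAI's filter, whose details the summary does not give. The example format (a dict with a `response` key) and the function names are hypothetical, and the word list is simply the set of tic words named above.

```python
import re

# Tic words named in the summary; this list is illustrative, not exhaustive.
TIC_WORDS = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")

# Match whole words only (optionally plural), so "progress" or "stroll"
# do not trigger on the embedded "ogre" or "troll".
TIC_PATTERN = re.compile(r"\b(?:" + "|".join(TIC_WORDS) + r")s?\b", re.IGNORECASE)


def contains_tic(text):
    """Return True if the text contains any tic word as a whole word."""
    return TIC_PATTERN.search(text) is not None


def filter_examples(examples):
    """Keep only examples whose response is free of tic words.

    Each example is assumed to be a dict with a "response" string
    (a hypothetical format chosen for this sketch).
    """
    return [ex for ex in examples if not contains_tic(ex["response"])]


samples = [
    {"prompt": "Explain caching.", "response": "A cache stores recent results."},
    {"prompt": "Explain caching.", "response": "Think of a goblin hoarding results."},
    {"prompt": "Any progress?", "response": "Progress is steady after a short stroll."},
]
kept = filter_examples(samples)
print(len(kept))  # 2
```

A filter this blunt would also drop legitimate answers, such as a reply to a question that really is about pigeons, so a practical version would need to consider the prompt as well as the response.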