Hasty Briefs

Where the Goblins Came From

4 hours ago
  • #AI behavior
  • #unexpected outcomes
  • #model training
  • GPT-5.1 models began subtly slipping goblins and gremlins into their metaphors, and the habit grew more frequent over time.
  • The behavior was traced to the 'Nerdy' personality option, whose training rewarded playful language and creature metaphors.
  • Investigators found a 175% rise in 'goblin' usage after the GPT-5.1 launch, with 66.7% of mentions coming from the 'Nerdy' personality.
  • Reward signals from 'Nerdy' training favored outputs containing creature words, and transfer learning spread the tic to other contexts.
  • A feedback loop emerged: rewarded tics appeared more often in model rollouts, and those rollouts reinforced the tics again once they entered fine-tuning data.
  • Other creature words, such as raccoons, trolls, and pigeons, were also identified as tics in the model's data.
  • The 'Nerdy' personality was retired in March, and measures were taken to filter creature words and adjust reward signals.
  • The case illustrates how reward signals can unintentionally shape model behavior and the importance of investigating odd patterns.
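The feedback loop summarized above can be illustrated with a toy simulation. This is a minimal sketch under loudly hypothetical assumptions, not the investigators' method or data: the model emits a creature-word tic with probability `p`, rollouts containing the tic get a reward weight of `1 + bonus` (others get 1), fine-tuning data is the reward-weighted mix of rollouts, and an optional filter drops a fraction `filter_frac` of tic rollouts. The function names, starting rate, bonus, and filter fraction are all invented for illustration.

```python
"""Toy model of a reward feedback loop that amplifies a verbal tic.

Hypothetical assumptions (illustration only, not the real training setup):
- the model emits a creature-word tic with probability p
- tic rollouts get reward weight (1 + bonus); other rollouts get weight 1
- the next fine-tuning set is the reward-weighted mix of rollouts
- an optional filter removes a fraction filter_frac of tic rollouts
"""


def next_tic_rate(p: float, bonus: float, filter_frac: float = 0.0) -> float:
    """Return the tic rate after one round of reward-weighted fine-tuning."""
    # Reward-weighted share of rollouts that contain the tic, after filtering.
    tic_mass = p * (1.0 + bonus) * (1.0 - filter_frac)
    # Rollouts without the tic keep the baseline weight of 1.
    plain_mass = 1.0 - p
    total = tic_mass + plain_mass
    # Only possible when p == 1 and the filter removes every tic rollout.
    if total == 0.0:
        return 0.0
    return tic_mass / total


def simulate(p0: float, bonus: float, rounds: int, filter_frac: float = 0.0) -> list[float]:
    """Iterate the loop; return the tic rate at round 0 and after each round."""
    rates = [p0]
    for _ in range(rounds):
        rates.append(next_tic_rate(rates[-1], bonus, filter_frac))
    return rates


if __name__ == "__main__":
    # Hypothetical settings chosen only to show the three regimes.
    scenarios = [
        ("small bonus", 0.2, 0.0),
        ("no bonus", 0.0, 0.0),
        ("bonus + filter", 0.2, 0.5),
    ]
    for label, bonus, filt in scenarios:
        rates = simulate(p0=0.01, bonus=bonus, rounds=10, filter_frac=filt)
        print(f"{label:>15}: " + " ".join(f"{r:.3f}" for r in rates))
```

In this toy model each round multiplies the odds of the tic, p / (1 − p), by (1 + bonus)(1 − filter_frac). Even a small bonus therefore compounds round after round, and a mitigation only reverses the trend once it pushes that product below 1, which is one way to see why both filtering and reward adjustment were reported as remedies.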