What was a darker surprise to the team was the content that GPT-2 was producing with its new coherence. Fed a few words like Hillary Clinton or George Soros, the chattier language model could quickly veer into conspiracy theories. Small amounts of neo-Nazi propaganda swept up in its training data could surface in horrible ways. The model’s unexpected poor behavior disturbed AI safety researchers, who saw it as foreshadowing of the future abuses and risks that could come from more powerful misaligned AI.

