Andreas Holmer

48%
A key driver behind this progress is called reinforcement learning from human feedback. To fix their bias-prone LLMs, researchers set up cunningly constructed multi-turn conversations with the model, prompting it to say obnoxious, harmful, or offensive things to see where and how it goes wrong. Researchers then flag these missteps and feed the resulting human judgments back into the model, eventually teaching it a more desirable worldview, in a way not wholly dissimilar from how we try to teach children not to say inappropriate things at the dinner table.
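The passage describes the feedback loop only at a conceptual level. As a minimal sketch, assuming a pairwise-preference setup in which labelers mark which of two model responses is less harmful, the flagged comparisons can be used to train a small reward model; the names, dimensions, and toy data below are illustrative assumptions, not the pipeline of any particular lab.

    # Minimal sketch: train a reward model on human preference pairs
    # (a Bradley-Terry style pairwise loss). All shapes and data are toy.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Scores a (prompt, response) embedding; higher means more preferred."""
        def __init__(self, dim: int = 128):
            super().__init__()
            self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.scorer(x).squeeze(-1)

    def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        # Push the reward of the human-preferred response above the flagged one.
        return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):
        chosen = torch.randn(32, 128)    # stand-ins for responses labelers approved
        rejected = torch.randn(32, 128)  # stand-ins for responses labelers flagged
        loss = preference_loss(model(chosen), model(rejected))
        opt.zero_grad()
        loss.backward()
        opt.step()

In the full method, a trained reward model like this is then used to fine-tune the language model itself, which is the "reintegration" step the highlight alludes to.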
Andreas Holmer
This seems problematic, prone to bias.
The Coming Wave: AI, Power, and Our Future