Jason Von

46%
Flag icon
This answer might minimize the loss function in the training data because the moon being made out of cheese is a centuries-old trope. But this is still misinformation, however harmless in this instance. So LLMs undergo another stage in their training: what’s called RLHF, or reinforcement learning from human feedback. Basically, it works like this: the AI labs hire cheap labor—often from Amazon’s Mechanical Turk, where you can employ human AI trainers from any of roughly fifty countries—to score the model’s answers in the form of an A/B test: A: The Moon is made out of cheese. B: The Moon is ...more
On the Edge: The Art of Risking Everything
Rate this book
Clear rating
Open Preview