AI companies hire workers, some of them highly paid experts and others low-paid contract workers in English-speaking countries such as Kenya, to read AI answers and judge them on various characteristics. In some cases, that might mean rating results for accuracy; in others, it might mean screening out violent or pornographic answers. That feedback is then used for additional training, fine-tuning the AI's performance to fit human preferences. This extra learning reinforces good answers and reduces bad ones, which is why the process is called Reinforcement Learning from Human Feedback (RLHF).