Debbie Roth

The signal on which the actor learns is not rewards, per se, but the temporal difference in the predicted reward from one moment in time to the next. Hence Sutton’s name for his method: temporal difference learning.
A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains
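The idea in the highlighted passage can be sketched in a few lines of Python. The quote itself gives no formula; what follows is the standard one-step form of the temporal difference error, delta = r + gamma * V(s') - V(s), shown only for the value-prediction side (the part that produces the signal the actor would learn from). The function names, the state labels, and the values chosen for the discount `gamma` and step size `alpha` are illustrative assumptions, not anything taken from the book.

```python
def td_error(reward, value_now, value_next, gamma=0.9):
    """One-step temporal difference error.

    This is the change in predicted reward from one moment to the next:
    (reward received + discounted next prediction) minus the current prediction.
    gamma is an assumed discount factor.
    """
    return reward + gamma * value_next - value_now


def td0_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Nudge the prediction for `state` toward the TD target; return the error.

    `values` is a dict mapping hypothetical state names to predicted reward.
    alpha is an assumed learning rate.
    """
    delta = td_error(reward, values[state], values[next_state], gamma)
    values[state] += alpha * delta
    return delta


if __name__ == "__main__":
    # Hypothetical two-step episode: a cue is followed by food (reward 1.0).
    values = {"cue": 0.0, "food": 0.0, "end": 0.0}
    for _ in range(200):
        td0_update(values, "cue", "food", reward=0.0)
        td0_update(values, "food", "end", reward=1.0)
    print(values)
```

Note that once a prediction matches what follows it, the error falls to zero even though a reward is still delivered, which is the sense in which the learning signal is the difference in predictions rather than the reward itself.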