Mark Gerstein

32%
Flag icon
One common way to do this is by using a softmax policy, in which we choose actions with a probability that is proportional to their value relative to all the other actions. In this first trial, since the values are all the same, each of the actions would have a 25% likelihood of being chosen. Suppose that we randomly choose machine 2 on the first trial, and we happen to win $1 (which for that machine happens 40% of the time). The next job of the model is to update its value estimates based on that experience—actually, based on how our experience differs from our expectation. In this case, the ...more
Hard to Break: Why Our Brains Make Habits Stick
Rate this book
Clear rating
Open Preview