Reinforcement learners solve this by sometimes choosing the best action and sometimes a random one. (The brain even seems to have a “noise generator” for this purpose.) Early on, when there’s much to learn, it makes sense to explore a lot. Once you know the territory, it’s best to concentrate on exploiting