In the inverse reinforcement-learning approach, a core idea is that the AI is trying to maximize not the goal-satisfaction of itself, but that of its human owner. It therefore has an incentive to be cautious when it’s unclear about what its owner wants, and to do its best to find out. It should also be fine with its owner switching it off, since that would imply that it had misunderstood what its owner really wanted.