First, what is “information”? Shannon identified information with surprise. He defined the amount of information you get when an event of probability p occurs as the negative of the logarithm of that probability, I(p) = -log p. For example, if I tell you it is smoggy in Los Angeles, then p is near 1 and that is not much information; but if I tell you it is raining in Monterey in June, then that is surprising and represents more information. And because log 1 = 0, the certain event conveys no information.
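
As a quick check of the definition, here is a minimal Python sketch of the surprisal function I(p) = -log p, taking logs base 2 so the answer comes out in bits; the probabilities for the two weather examples are chosen purely for illustration and are not from the text:

```python
import math

def surprisal(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

# Illustrative probabilities (assumed for the sake of the example):
print(surprisal(0.99))  # smog in Los Angeles: ~0.014 bits, almost no information
print(surprisal(0.01))  # rain in Monterey in June: ~6.64 bits, much more surprising
print(surprisal(1.0))   # the certain event: exactly 0 bits
```

Note that any base of logarithm would do; changing the base only rescales the answer by a constant factor, and base 2 is the conventional choice because it measures information in bits.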

