expect means from two random samples to be very similar. We compare the difference between the means of the samples that we collected with the difference between sample means that we would expect to obtain, in the long run, if there were no effect (i.e., if the null hypothesis were true).
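
To make the "in the long run" idea concrete, the comparison can be sketched as a small simulation: draw two random samples from the same population many times, so that the null hypothesis is true by construction, and record the difference between their means each time. This is only an illustrative sketch, not a procedure taken from the text. The population mean and standard deviation, the group size, the number of repetitions, the observed difference of 8.0 and both function names are hypothetical values chosen for the example.

```python
import random
import statistics


def simulate_null_differences(population_mean=100.0, population_sd=15.0,
                              n_per_group=30, n_repeats=5000, seed=42):
    """Differences between two sample means when both samples come from
    the same population (i.e., the null hypothesis is true).
    All default values are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_repeats):
        # Both samples are drawn from one and the same normal population.
        sample_a = [rng.gauss(population_mean, population_sd)
                    for _ in range(n_per_group)]
        sample_b = [rng.gauss(population_mean, population_sd)
                    for _ in range(n_per_group)]
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return diffs


def proportion_as_extreme(observed_diff, null_diffs):
    """Share of simulated differences at least as large in absolute size
    as the observed difference."""
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed_diff))
    return extreme / len(null_diffs)


null_diffs = simulate_null_differences()

# With no effect, the differences centre on zero; their spread shows how
# much two sample means typically differ through sampling variation alone.
mean_null_diff = statistics.fmean(null_diffs)
sd_null_diff = statistics.stdev(null_diffs)

# Hypothetical difference between the two means we actually collected.
observed_diff = 8.0
p_sim = proportion_as_extreme(observed_diff, null_diffs)

print(f"mean of simulated differences: {mean_null_diff:.3f}")
print(f"SD of simulated differences:   {sd_null_diff:.3f}")
print(f"proportion as extreme as {observed_diff}: {p_sim:.4f}")
```

If only a small proportion of the simulated differences are as large as the one observed, the observed difference is unlikely to have arisen from sampling variation alone. The standard deviation of the simulated differences plays the role of the standard error of the difference between means.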