In Chapter 6 we saw that an algorithm might win a prediction competition by a very small margin. When predicting the survival of the Titanic test set, for example, the simple classification tree achieved the best Brier score (average mean squared prediction error) of 0.139, only slightly lower than the score of 0.142 from the averaged neural network (see Table 6.4). It is reasonable to ask whether this small winning margin of −0.003 is statistically significant, in the sense of whether or not it could be explained by chance variation. This is straightforward to check, and the t-statistic turns
...more