The NCAA Bracket: Checking Our Work

Our NCAA tournament forecasts are probabilistic. You could say that FiveThirtyEight is “calling” for Michigan State to defeat Virginia on Friday, but it isn’t much of a call. Our model gives Michigan State a 50.7 percent chance of winning, and Virginia a 49.3 percent chance. For all intents and purposes, it’s a toss-up.

Other times, of course, one team has a much clearer edge. Duke was a 92.9 percent favorite against Mercer last week, but Mercer won.

Still, upsets like these are supposed to happen some of the time. The question in evaluating a probabilistic forecast is whether the underdogs are winning substantially more or less often than expected. The technical term for this is calibration. But you might think of it more as truth in advertising. Over the long run, out of all the times when we say a team is a 75 percent favorite, is it really winning about 75 percent of the time?
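
In code, that check amounts to grouping games by the favorite’s forecast probability and comparing each group’s actual win rate with the advertised one. Here is a minimal sketch in Python; the forecasts list and its sample entries are hypothetical stand-ins, not our actual data file.

```python
# A minimal sketch of the calibration check, assuming the forecasts live in
# a list of (favorite_win_probability, favorite_won) pairs. The sample
# entries below are placeholders, not the full data file.
from collections import defaultdict

forecasts = [
    (0.507, True),   # e.g., Michigan State over Virginia
    (0.929, False),  # e.g., Duke over Mercer
    # ... one entry per tournament game ...
]

bins = defaultdict(lambda: [0, 0])   # lower bin edge -> [wins, games]
for prob, won in forecasts:
    edge = int(prob * 10) * 10       # 0.507 falls in the "50s" bin
    bins[edge][0] += int(won)
    bins[edge][1] += 1

for edge in sorted(bins):
    wins, games = bins[edge]
    print(f"{edge}s bin: won {wins} of {games} ({wins / games:.1%})")
```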

FiveThirtyEight’s NCAA tournament projections have been published each year since 2011. The formula has changed very little over that period. (The only substantive change has been adding a fifth computer power rating, ESPN’s Basketball Power Index, this season.) That gives us a reasonable baseline for evaluation: a total of 254 games, counting the 52 played so far this year. (These totals include “play-in” games.)

You can find a file containing our past predictions here. To check their calibration, we grouped the games into bins according to the favorite’s forecast win probability and compared each bin’s actual win rate against the advertised one. In most bins, the favorites have won about as often as they were supposed to.

[Table: actual vs. expected win percentage, by forecast probability bin]

Not all the numbers work out so neatly, however. Another bin contains all those favorites with win probabilities in the 50s (anywhere from 50.0 percent to 59.9 percent). These teams were supposed to win about 55 percent of the time. In fact, they’ve won 38 of 63 games, or 60.3 percent of the time. So the teams performed a little better than expected in these games.

By contrast, teams with win probabilities in the 60s (from 60.0 percent to 69.9 percent) have won 35 of their 60 games, or 58.3 percent. That’s a bit worse than expected.
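
To make that comparison concrete, here is a quick back-of-the-envelope check in Python. It assumes each bin’s average forecast sits near the bin midpoint; the text confirms that only for the 50s bin, so the 65 percent figure is our assumption.

```python
# Actual wins vs. the wins the model "promised" in each bin. The 55 percent
# rate comes from the text; the 65 percent rate is an assumed bin midpoint.
for label, wins, games, rate in [("50s", 38, 63, 0.55), ("60s", 35, 60, 0.65)]:
    print(f"{label} bin: {wins} of {games} = {wins / games:.1%} actual, "
          f"vs. about {rate * games:.1f} expected wins at {rate:.0%}")
```

That works out to roughly three wins more than expected in the 50s bin and about four fewer in the 60s bin, small gaps for samples of this size.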

What’s going on here? Why are the somewhat heavier favorites, with win probabilities in the 60s, performing worse than the teams with win probabilities in the 50s? Are the heavier favorites getting cocky? Is there something wrong with the model?

Probably not. Instead, these differences are well within the ranges that might result from random chance. This is easier to explain visually, as in the following graphic, which portrays the results of games from each bin along with their confidence intervals.

[Graphic: actual win percentage by forecast probability bin, with confidence intervals]
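
For readers who want to reproduce the intervals, here is one way to approximate them in Python. The normal approximation, the 95 percent level and the use of bin midpoints as the expected rates are our assumptions, not details taken from the graphic.

```python
# Approximate confidence intervals for each bin's observed win rate, using
# the normal approximation to the binomial. The 95 percent level (z = 1.96)
# and the midpoint "expected" rates are assumptions for illustration.
import math

def normal_ci(wins, games, z=1.96):
    """Approximate 95 percent confidence interval for a win rate."""
    p = wins / games
    half = z * math.sqrt(p * (1 - p) / games)
    return p - half, p + half

for label, wins, games, expected in [("50s", 38, 63, 0.55), ("60s", 35, 60, 0.65)]:
    lo, hi = normal_ci(wins, games)
    print(f"{label} bin: {wins / games:.1%} observed, "
          f"95% CI ({lo:.1%}, {hi:.1%}), expected about {expected:.0%}")
```

Run as-is, both intervals comfortably cover the advertised rates.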

Each result is within its respective confidence interval. The calibration is not perfect, but the deviations from perfect calibration are not statistically significant. We encourage you to check our work.
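
As a rough illustration of the random-chance point, consider the 50s bin: how often would true 55 percent favorites win 38 or more of 63 games? The exact-binomial tail probability below is our illustration, not the article’s own test.

```python
# Chance that nominal 55 percent favorites win 38 or more of 63 games,
# computed from the exact binomial distribution (an illustrative check,
# not the article's own significance test).
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"P(38+ wins in 63 games at p = 0.55) = {binom_tail(38, 63, 0.55):.2f}")
```

The answer comes out well above any conventional significance threshold, which is the sense in which a 60.3 percent win rate from nominal 55 percent favorites is unremarkable.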
