"Given a data set, you can fit thousands of models at the push of a button, but how do you choose the best? With so many candidate models, overfitting is a real danger. Is the monkey who typed Hamlet actually a good writer?"

"Choosing a suitable model is central to all statistical work with data. Selecting the variables for use in a regression model is one important example. The past two decades have seen rapid advances both in our ability to fit models and in the theoretical understanding of model selection needed to harness this ability, yet this book is the first to provide a synthesis of research from this active field, and it contains much material previously difficult or impossible to find. In addition, it gives practical advice to the researcher confronted with conflicting results."

Model choice criteria are explained, discussed and compared, including Akaike's information criterion AIC, the Bayesian information criterion BIC and the focused information criterion FIC. Importantly, the uncertainties involved with model selection are addressed, with discussions of frequentist and Bayesian methods. Finally, model averaging schemes, which combine the strengths of several candidate models, are presented.
I'm the author, together with my colleague Gerda Claeskens from KU Leuven, Belgium, and hence rather biased. But I'm happy with this Cambridge University Press book from 2008, and I do think it has earned its firm place in the literature on statistical model selection and model averaging. We cover the AIC, BIC, and DIC (the Akaike, Bayesian, and Deviance Information Criteria), and push Gerda's and my own FIC, the Focused Information Criterion.
For exercises and stories featuring model selection, see my FocuStat webpage at the University of Oslo, and also the website for the course STK 4160, which has supplementary exercises and other material.