Analyzing Baseball Data with R Second Edition introduces R to sabermetricians, baseball enthusiasts, and students interested in exploring the richness of baseball data. It equips you with the necessary skills and software tools to perform all the analysis steps, from importing the data to transforming them into an appropriate format to visualizing the data via graphs to performing a statistical analysis.
The authors first present an overview of publicly available baseball datasets and a gentle introduction to R's data structures and its exploratory and data management capabilities. They also cover the ggplot2 graphics functions and employ a tidyverse-friendly workflow throughout. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, run expectancy, catcher framing, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and launch angles and exit velocities. All the datasets and R code used in the text are available online.
New to the second edition are a systematic adoption of the tidyverse and the incorporation of Statcast player tracking data (made available by Baseball Savant). All code from the first edition has been revised according to the principles of the tidyverse. Tidyverse packages, including dplyr, ggplot2, tidyr, purrr, and broom, are emphasized throughout the book. Two entirely new chapters are made possible by the availability of Statcast data: one explores the notion of catcher framing ability, and the other uses launch angle and exit velocity to estimate the probability of a home run. Through the book's various examples, you will learn about modern sabermetrics and how to conduct your own baseball analyses.
Max Marchi is a Baseball Analytics Analyst for the Cleveland Indians. He was a regular contributor to The Hardball Times and Baseball Prospectus websites and previously consulted for other MLB clubs.
Jim Albert is a Distinguished University Professor of statistics at Bowling Green State University. He has authored or coauthored several books, including Curve Ball and Visualizing Baseball, and was the editor of the Journal of Quantitative Analysis of Sports.
Ben Baumer is an assistant professor of statistical & data sciences at Smith College. Previously a statistical analyst for the New York Mets, he is a co-author of The Sabermetric Revolution and Modern Data Science with R.
I have a dog-eared, beaten-up copy of the first edition. I've gone through it basically cover to cover twice and have used it many times as a reference. I believe the book is now in its second edition, which I haven't seen.
The book is geared towards people who are new to R, but I think even intermediate-level R users will learn from the examples as well.
In addition to learning R, you will learn where to find various sources of baseball data and how to access them, with some real examples. The book shows how to access three sources: the Lahman Database (season-by-season data), Retrosheet data (game logs and play-by-play data, which record the outcome of every play across many decades of games and, for more recent seasons, the pitch sequences as well), and PITCHf/x data (detailed data on every pitch since the PITCHf/x system became available, showing things like velocity, movement, and location).
I think even if you are only a casual baseball fan, the book would be a fun way to learn R using real-world examples and data.
R, matey! Haha, jk, it's not really about the Pittsburgh Pirates! To be completely transparent, the book really gets into the weeds... and not in the good way!