If you are an aspiring data scientist who wants to learn data science and numerical programming concepts through hands-on, real-world project examples, this is the book for you. Whether you are brand new to data science or you are a seasoned expert, you will benefit from learning about the structure of data science projects, the steps in the data science pipeline, and the programming examples presented in this book. Since the book is formatted to walk you through the projects with examples and explanations along the way, no prior programming experience is required.
Tony Ojeda is an accomplished data scientist and entrepreneur with expertise in business process optimization and over a decade of experience creating and implementing innovative data products and solutions. He has a Master's in Finance from Florida International University and an MBA with concentrations in Strategy and Entrepreneurship from DePaul University. He is the founder of District Data Labs, a co-founder of Data Community DC, and is actively involved in promoting data science education through both organizations.
[Full disclosure: this review is based on a free review copy!] When I first started getting into programming, I became very interested in data science. I dabbled a little in R and played a lot with Python, trying to find what would fit me best. I was trying to get my head around a subject that I find fascinating, and had I had access to The Practical Data Science Cookbook, I would have had a much better idea of what I wanted to pursue.
The very first recipe in the book explains the data science pipeline and points out that it isn't necessarily a linear process. This is our starting point, and it provides the structure of the chapters. Unlike many cookbooks, this one reads much more like a tutorial than a collection of recipes. The book is split by language: the first part covers the more statistically minded R, and the second the more code-oriented Python.
The four chapters following the setup chapter contain recipes for R. To be honest, I didn't work through each recipe in detail, as I am not a great fan of R (it's the damned syntax!). I do, however, have enough R knowledge to go through and understand the material. The recipes are well laid out and interesting, letting you work with data from various areas (including the NFL and the stock market) and showing how to actually put that data to use.
Python gets slightly more coverage, with six chapters, although the second of these uses the same data as the first R recipes. I found the Python recipes to be a lot of fun, perhaps because they are more application oriented. We not only work with the data to get answers but also learn how to share and make use of those answers.
The best chapter, at least for me, is almost certainly chapter 8, where we analyse social graphs: specifically, social graphs made up of Marvel superheroes! Here we find the connections between heroes based on shared appearances in comics. Each recipe builds your understanding of social networks in general while being specific enough to generate interesting data.
The book isn't without its flaws: there are some errors in the code samples (easy enough to catch), and at times it is slightly repetitive. All in all, though, it is an interesting read and a nice introduction to data science.
The authors have done their best to pick interesting projects, and they have done well. They offer insights into many possible avenues of investigation, and the book is a great place for beginners to start. Not every chapter will be for everyone, but anyone interested in data science will find something that sparks their interest.
If you're new to data science, this book is great for you; each chapter is distinct and can be studied independently. Worried about which programming language to learn, R or Python? Don't be: this book covers both!
This is an excellent guide for users new to R (and Python). It assumes some technical knowledge, but you don't have to be an expert in any one technology to use it. Instructions are easy to follow, and both the input and results are explained in depth. The book is well organized, with clear titles and subtitles, summaries that explain the exercise that was completed, and good use of bulleted lists and bolded text to call out important terms and excerpts of code. This is not a theory book, but there is enough explanation here to make the exercises meaningful. Text is interspersed with graphics, which makes the material easy to follow and comprehend.
A great introduction to the data science pipeline through several case studies, highlighting different features of the R and Python analytic ecosystems.
This is another good book for the aspiring data scientist who needs to get actual work done while learning on the job. It includes recipes for munging, wrangling, processing, and visualizing data.