This is the companion site to the electronic textbook, Introduction to Data Science, by Jeffrey Stanton. This book provides non-technical readers with a gentle introduction to essential concepts and activities of data science. For more technical readers, the book provides explanations and code for a range of interesting applications using the open source R language for statistical computing and graphics.
This book fills a nice little niche: people getting started in data analytics. The first problem people have when learning this field is that they tend to learn a number of techniques but, in practice, cannot get past the step of accessing and preparing the data. This book covers that step and gives good practice in it. I plan on using this as the first of two texts for the course. This book will get students started (gently) in R and in accessing and processing data; then a case-based data mining book will let them apply what they learn here to larger data sets, analyzed with the methods that the other book covers in more depth.
This book is for those who are just getting started and need some hand-holding with the R environment, as well as for those who don't really know where to begin with new data sources. It is most useful for those starting from nearly scratch, but also for those whose education included data analysis techniques yet whose training neglected the crucial steps of how to get started and why you should use a particular class of method, not just how.
Not a bad read. There are a few typos and some slight hiccups in the processes described (you need authorization credentials to pull data from Twitter, and the chapter that explains how to do this doesn't mention them at all, though this requirement may have been introduced after the book was released). It was a textbook for a class, and it was easy to follow along with and often more informative than the class material itself, since it explained how functions and syntax worked and the professor stopped doing so after the first few weeks. A good starter guide for users new to R.
This is a really good (free) introduction to R. It covers basic statistical concepts, but its real focus is on how to apply those concepts in R. While I was going through the exercises, Twitter changed its authentication protocol, and through the power of Google Search I was able to figure it out. Hopefully a new edition will walk the reader through this, as it was not trivial.
Regardless, this is a good read for anyone interested in R.
This book may cover the right material, but it's not an adequate or authoritative introduction to the field of data science.
For a book about information, it is poorly conceived and organized. Chapters jump from broad topics to very specific technologies. For instance, there are two separate chapters about Apache software, but no discussion of the broader context of why these tools are powerful. There are far too many quotes used as filler, and the author drifts carelessly between topics.