Effectively Access, Transform, Manipulate, Visualize, and Reason about Data and Computation
Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving illustrates the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions. The book’s collection of projects, comprehensive sample solutions, and follow-up exercises encompass practical topics pertaining to data processing, including:
- Non-standard, complex data formats, such as robot logs and email messages
- Text processing and regular expressions
- Newer technologies, such as Web scraping, Web services, Keyhole Markup Language (KML), and Google Earth
- Statistical methods, such as classification trees, k-nearest neighbors, and naïve Bayes
- Visualization and exploratory data analysis
- Relational databases and Structured Query Language (SQL)
- Simulation
- Algorithm implementation
- Large data and efficiency
Suitable for self-study or as supplementary reading in a statistical computing course, the book enables instructors to incorporate interesting problems into their courses so that students gain valuable experience and data science skills. Students learn how to acquire and work with unstructured or semistructured data as well as how to narrow down and carefully frame the questions of interest about the data.
Blending computational details with statistical and data analysis concepts, this book provides readers with an understanding of how professional data scientists think about daily computational tasks. It will improve readers’ computational reasoning of real-world data analyses.
Exactly what the title says: case studies in data analysis using the R statistical computing environment.
Unlike most collections of case studies, the cases here are well chosen and complement each other to highlight different analytic techniques. Many that appear dry and uninformative (e.g., simulation of a branching process) actually prove to be the most interesting because of the stepwise-refinement approach taken by the authors. Each case begins with a "Computational Topics" section that lists the R programming techniques that will be covered in detail.
The contributed chapters are the weakest (which is surprising, given that one of the contributors is Hadley Wickham, whose own books are excellent) and seem to lack the big-picture vision of the primary authors.