Data Manipulation with R (Use R!) 1st Edition by Spector, Phil published by Springer Paperback

Since its inception, R has become one of the preeminent programs for statistical computing and data analysis. The ready availability of the program, along with a wide variety of packages and the supportive R community, makes R an excellent choice for almost any kind of computing task related to statistics. However, many users, especially those with experience in other languages, do not take advantage of the full power of R. Because of the nature of R, solutions that make sense in other languages may not be very efficient in R. This book presents a wide array of methods applicable for reading data into R, and efficiently manipulating that data. In addition to the built-in functions, a number of readily available packages from CRAN (the Comprehensive R Archive Network) are also covered. All of the methods presented take advantage of the core features of vectorization, efficient use of subscripting, and the proper use of the varied functions in R that are provided for common data management tasks. Most experienced R users discover that, especially when working with large data sets, it may be helpful to use other programs, notably databases, in conjunction with R. Accordingly, the use of databases in R is covered in detail, along with methods for extracting data from spreadsheets and datasets created by other programs. Character manipulation, while sometimes overlooked within R, is also covered in detail, allowing problems that are traditionally solved by scripting languages to be carried out entirely within R. For users with experience in other languages, guidelines for the effective use of programming constructs like loops are provided. Since many statistical modeling and graphics functions need their data presented in a data frame, techniques for converting the output of commonly used functions to data frames are provided throughout the book.
Using a variety of examples based on data sets included with R, along with easily simulated data sets, the book is recommended to anyone using R who wishes to advance from simple examples to practical real-life data manipulation solutions.

Paperback

First published March 19, 2008

7 people are currently reading
52 people want to read

About the author

Phil Spector

2 books

Ratings & Reviews

Community Reviews

5 stars: 8 (18%)
4 stars: 20 (45%)
3 stars: 15 (34%)
2 stars: 1 (2%)
1 star: 0 (0%)
Displaying 1 - 3 of 3 reviews
Louis
227 reviews, 32 followers
July 10, 2015
What programming-language-based data analysis environments have that menu-driven or batch environments do not is the ability to manipulate data. That means transforming data into usable forms, but it also means cleaning data, manipulating text, transforming data formats, and extracting data from free text. While R falls into this category of data analysis environment, almost all of the available material focuses on the application of statistical methods in R. This book fills a much-needed niche in how to process data. I still do not regard R as my go-to tool for data manipulation, but this book means I am more likely to stay in R than otherwise. I used this as a textbook in a lower-division data analysis course, and the class went from a group that only half remembered Matlab to being able to process and analyze fairly large datasets. A comment I received was "I looked back on the work done in this project and I cannot believe I actually did that!"

The first part of the book covers reading in data and writing out results. It discusses both text files (CSV, delimited, fixed-width) and working with relational databases. One note is that the database the book uses is MySQL. This was easily convertible to SQLite, which is what I used in my class because my students are not IT-savvy. I also used supplementary material for SQL (which is readily available). The book then covers putting things together into data frames.

Next are a series of data types: datetimes, factors, numbers. For people who have only worked in Excel, these are deal breakers. Even using Excel, these are areas that often go unnoticed by students and lead to problems.

Character manipulation is about working with strings and a gentle introduction to regular expressions. Many of my students had never manipulated text programmatically before, so this chapter was quite successful. As for regular expressions, it provided a taste, enough to solve the lab assignment. I supplemented it with other material, but no one was going to learn regular expressions in 5 pages.

The best part of the book was the sections on aggregating and reshaping data. This is what made what my students were doing with R start to look like magic. Aggregations using the apply family of functions, reshape to convert data into long or wide formats, combining data frames, and an introduction to vectorization. This is not going to make anyone a functional programmer, but these are key idioms and Spector spent a lot of time here.

I am not going to prefer R over Python for working with text and manipulating data, but Data Manipulation with R shows how to do some non-obvious things. The examples are all interesting enough to be useful, and they all work as is. And this goes deep enough into some pretty powerful capabilities that expanded my students' understanding of what is possible. While it is becoming dated (an update would have to include dplyr), the approaches it provides put the reader well on their way to being an accomplished R programmer, not just someone who feeds data into functions.
Jule
86 reviews, 9 followers
July 1, 2009
So far, this is the best manual I have come across for someone who needs guidance in learning R. This book serves as a handy reference. A must-have, I dare say, at least for peeps who can get as easily frustrated in R as I can.
Greg
649 reviews, 105 followers
January 28, 2011
Really handy book. It contains lots of tips and tricks for working with data. It is especially handy for an old-fashioned S user who has come to R as less than a noob but is not familiar with all the extensions added since the original Bell Labs distro of S.