Data Manipulation with R

The R language provides a rich environment for working with data, especially data to be used for statistical modeling or graphics. Coupled with the large variety of easily available packages, it allows access to both well-established and experimental statistical techniques. However, techniques that might make sense in other languages are often very inefficient in R, but, due to R’s flexibility, it is often possible to implement these techniques in R. Generally, the problem with such techniques is that they do not scale properly; that is, as the problem size grows, the methods slow down at a rate that might be unexpected. The goal of this book is to present a wide variety of data manipulation techniques implemented in R to take advantage of the way that R works, rather than directly resembling methods used in other languages. Since this requires a basic notion of how R stores data, the first chapter of the book is devoted to the fundamentals of data in R. The material in this chapter is a prerequisite for understanding the ideas introduced in later chapters. Since one of the first tasks in any project involving data and R is getting the data into R in a way that it will be usable, Chapter 2 covers reading data from a variety of sources (text files, spreadsheets, files from other programs, etc.), as well as saving R objects both in native form and in formats that other programs will be able to work with.

164 pages, Paperback

First published March 19, 2008

7 people are currently reading
52 people want to read

About the author

Phil Spector

2 books

Ratings & Reviews

Community Reviews

5 stars: 8 (18%)
4 stars: 20 (45%)
3 stars: 15 (34%)
2 stars: 1 (2%)
1 star: 0 (0%)
Displaying 1 - 3 of 3 reviews
Louis
226 reviews, 30 followers
July 10, 2015
The quality that programming-language-based data analysis environments have that menu-driven or batch environments do not is the ability to manipulate data. That means transforming data into usable forms, but it also means cleaning data, manipulating text, transforming data formats, and extracting data from free text. While R falls into this category of data analysis environment, almost all of the available material focuses on the application of statistical methods in R. This fills a much-needed niche in how to process data. I still do not regard R as my go-to tool for data manipulation, but this book means I am more likely to stay in R than otherwise. I used this as a textbook in a lower-division data analysis course, and the class went from a group that only half remembers Matlab to being able to process and analyze fairly large datasets. A comment I received was "I looked back on the work done in this project and I cannot believe I actually did that!"

The first part of the book is reading in data and writing out results. It discusses both text (CSV, delimited, fixed-width) and working with relational databases. One note is that the database they use is MySQL. This was easily convertible to SQLite, which is what I used in my class because my students are not IT savvy. I also used supplementary material for SQL (which is readily available). Then comes putting things together into data frames.

Next are a series of data types: datetimes, factors, numbers. For people who have only worked in Excel, these are deal breakers. Even using Excel, these are areas that often go unnoticed by students and lead to problems.

Character manipulation is about working with strings and a gentle introduction to regular expressions. Many of my students had never manipulated text programmatically before, so this chapter was quite successful. As for regular expressions, well, it provided a taste of them, enough to solve the lab assignment. I supplemented it with other material, but no one was going to learn regular expressions in 5 pages.

The best part of the book was the sections on aggregating and reshaping data. This is what made what my students were doing with R start to look like magic. Aggregations using the apply family of functions, reshape to convert data into long or wide formats, combining data frames, and an introduction to vectorization. This is not going to make anyone a functional programmer, but these are key idioms and Spector spent a lot of time here.

I am not going to prefer R over Python for working with text and manipulating data, but Data Manipulation with R shows how to do some non-obvious things. The examples are all interesting enough to be useful, and they all work as is. And this goes deep enough into some pretty powerful capabilities that expanded my students' understanding of what is possible. While it is becoming dated (an update would have to include dplyr), the approaches it provides put the reader well on their way to being an accomplished R programmer, not just someone who feeds data into functions.
Jule
86 reviews, 9 followers
July 1, 2009
So far, this is the best manual I have come across for someone who needs guidance in learning R. This book serves as a handy reference. A must-have, I dare say, at least for peeps who can get as easily frustrated in R as I can.
Greg
649 reviews, 105 followers
January 28, 2011
Really handy book. It contains lots of tips and tricks for working with data. It is especially handy for someone who is an old-fashioned S user who has come to R as less than a noob, but is not familiar with all the extensions added since the original Bell Labs distro of S.