Python for Data Analysis: Data Wrangling with Pandas, Numpy, and IPython
4.04 · 762 Ratings · 65 Reviews
"Python for Data Analysis" is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data ...more
ebook, 466 pages
Published October 8th 2012 by O'Reilly Media (first published December 30th 2011)


Community Reviews

Sep 30, 2012 Ben rated it it was ok  ·  review of another edition
Shelves: python, data-analysis
A better title for this book might be Pandas and NumPy in Action

As the creator of the pandas project, a Python data analysis framework, Wes McKinney is well placed to write this book. His experience with and vision for the pandas framework are clear, and he is able to explain the main functions and inner workings of both pandas and another package, NumPy, very well.

Although the title of the book suggests a broad look at the Python language for data analysis, McKinney almost exclusively focuses on an in
Nov 09, 2012 Louis rated it it was amazing  ·  review of another edition
Shelves: computer, math-stats
For some time now I have been using R and Python for data analysis. I long ago discovered the Python technical stack of IPython, NumPy, SciPy, and Matplotlib, and I thought I knew what I was doing. I even dipped my toe into pandas as my data structure for analysis. But Python for Data Analysis showed me entire worlds of improvement in my workflow and my ability to work with data in the messy form that is found in the real world.

Python, like most interpreted languages, is slow compared to
Aug 20, 2012 Rob rated it liked it  ·  review of another edition
Recommends it for: folks doing data analysis that have already decided to use Python
Shelves: technical, 2012, python
I did copy editing on this book, so my review is of an unfinished (but close to finished) version. That being said: McKinney is the principal author on pandas, a Python package for doing data transformation and statistical analysis. The book is largely about pandas (and NumPy), but also delves into general methodologies for munging data and performing analytical operations on them (e.g., normalizing messy data and turning it into graphs and tables); he also delves into some (semi) esoteric infor ...more
Good introduction to the pandas data analysis library by its main contributor, Wes McKinney. Also covers useful Python tools/libraries for data analysis such as IPython and NumPy. Lots of examples.

Didn't read the last three chapters on time series, financial data analysis and advanced numpy.

IPython notebooks are available here, forked from the official repository of the book.
Derek Bridge
This book is a reasonably comprehensive tutorial to pandas - the Python library for data wrangling. As a tutorial, it works well.

But it wasn't quite what I was expecting. I was expecting less tutorial and more case studies - taking meaningful datasets (instead of makey-upy ones) and using pandas and other tools to pose and answer questions. For me, this would have made the book a much more practical resource.
Good introduction to Python Pandas and other libraries for data analysis. However, the book goes directly from the introduction into pretty complicated examples. As a reader new to R, Pandas, and statistical languages, it was hard work to learn the data structures and semantics. After working through several web-based tutorials, I had a better intuitive sense for how to solve problems with the framework presented by the author.

As documentation for Pandas alone, this book is useful.
Paweł Kacprzak
Just more verbose documentation. After a promising introduction showing several real-world uses of data manipulation, the book is nothing more than documentation of pandas and libraries like numpy and matplotlib. Moreover, many of the functions described there are already deprecated, so just be aware of that. Perhaps the best way of "reading" this book is just scanning it quickly for a general overview of pandas functionality, so it can be used as a point of reference when needed.
May 27, 2017 Mark rated it really liked it  ·  review of another edition
Not really suited for a straight read-through, but a comprehensive manual for many key techniques in the Python data analysis toolboxes. It will be a great reference for how to accomplish both basic and high-level operations.

A new edition due in fall 2017 should correct some of the outdated material as well.
Jun 06, 2017 Gorjanz rated it it was ok  ·  review of another edition
Not enough details on anything... good only if you don't know Python, and you want to learn the language syntax via examples from a specific CS field...
Dec 17, 2014 John Alan rated it really liked it  ·  review of another edition
The book focuses on Pandas, but also introduces you to the ecosystem of libraries you'll encounter when doing scientific data analysis in Python.

As well as Pandas you'll cover IPython, NumPy and Matplotlib in enough depth to get you started with data analysis and visualization.

You don't need to be a python expert but some python knowledge, and some experience of R, will definitely help.

The book is well structured, breaking down the different topics into well defined chapters which deal with topi
Apr 25, 2014 James Williams rated it it was amazing  ·  review of another edition
Shelves: tech
In my office, we spend a lot of time in the database. As such, we tend to become fairly adept at analyzing data with SQL: join some tables on interesting columns, group by other interesting columns, sprinkle in some aggregates, and pretty soon you have yourself a table of answers.

The relatively new windowing functions added in SQL Server 2012 let you do even fancier analysis (at the risk of needing to understand some new syntax).

Yet, sometimes, a raw table of SQL results just isn't enough. You mi
Feb 28, 2017 Lê Khánh rated it really liked it  ·  review of another edition
This book gave me a brief overview of data science.
Adil Khashtamov
Frankly speaking, this book is not about data analysis; it is more about pandas as an instrument for doing data analysis. The book also covers IPython, numpy, and matplotlib superficially.
The reason I gave it 2 stars is that it is a little out of date and offers almost no practical examples while walking through pandas' functionality.
I would strongly recommend diving into the official documentation instead (the 10-minute intro and the tutorials) if you want to master pandas.
Feb 09, 2017 Joshua Hruzik rated it really liked it  ·  review of another edition
Great for Transition
As an R user I always hear people say that one should also learn Python as a secondary language. So I gave this book a shot and it did not disappoint.
Wes McKinney is the creator of Pandas, a framework for working with structured data in Python. That being said, a great deal of the book deals with Pandas and solving classical data science tasks (cleaning and munging your data). His style of writing is very clear and one can easily grasp the concepts by applying them in one's o
Dani Arribas-bel
I've been using Python for quantitative economics and geography for the last five years and I wish this book had been written back then. It is clear, easy to read, full of (cool) examples and incredibly useful, regardless of whether you read it from first to last page or have it around to check while hacking. If you are getting started on data crunching with Python, you should not miss it. If you are not new to the party, this will still be a good resource to get your head around a bit more advan ...more
Jan 29, 2013 Matt Heavner rated it really liked it  ·  review of another edition
This is definitely pandas and financial analysis focused (by the author's own admission). It is filled with lots of great examples (it doesn't matter so much if they are scientific time series or financial time series, but lots of the focus on "business days" and "last day of the quarter" aren't so applicable to scientific data sets -- but they do still have illustrative value). There were a few moments when the overall level "dipped" (one I remember specifically was the "hangup" early on regard ...more
Jan 11, 2015 idle sign rated it really liked it  ·  review of another edition
This book can, of course, be criticized: for its overly grandiose title (the author, the person who created pandas, could have said up front that it is mostly about pandas); for a style of presentation that makes the book almost, but not quite, a reference manual; for the awkward introduction to Python at the end of the book; for the lack of any connection between most of the examples and real life; and for the choppy narrative.

One can criticize it, but it is worth remembering: pandas currently has, de facto, no alternatives in the field of data analysis in Python, and besides, this tool is
Nov 27, 2013 Rishi Singh rated it really liked it  ·  review of another edition
It's a good book and great for understanding Wes's original intent on how to use Pandas. I recommend any serious pandas or data analysis user to have this on their bookshelf. This is much better than the pandas docs for learning about the bigger picture and understanding how to connect the pieces of Pandas.

My only major issue is that the content will become more outdated with each passing edition. Pandas is rapidly developing. This is an unfair reason to remove a star, the second reason is the b
Feb 09, 2017 TofurkyVectrex64 rated it really liked it  ·  review of another edition
Terrible book title. This is just a manual for pandas and numpy. Great reference book but not really a "If you want to get into data analysis" book.
Aug 03, 2014 Philipp rated it liked it  ·  review of another edition
Shelves: programming
Better name: 'Data munging with Python's pandas and numpy'

It focuses heavily on pandas and the myriad things you can do with a DataFrame. Very often the examples are extremely specific, yet the example data is contrived, like "here's this rather specific case in which you want to average a subset of a column in a table, but only those cases where the person linked to the index of that column has the astrological sign Pisces", and about 5 minutes later, I had already forgotten how the author did it.
Aug 30, 2013 Tom rated it liked it  ·  review of another edition
Shelves: python, scientific
It's not a bad book but if you are looking for a good book for scientific computing with Python you will probably be disappointed.
The book covers mostly pandas, doesn't give much information on numpy and matplotlib, and says nothing at all about scipy, all of which are more essential for scientific computing as far as I understand the topic.
On the other hand, I'm sure that I will use what I've learned here soon, but only after reading more comprehensive information about the whole scipy stack.
Katherine Ranney
Good book for data programming and analysis using Python. It's true there is heavy focus on Pandas and NumPy, but this seems reasonable given they remain the predominant libraries for data and numerical work in Python. The book felt to me like a sufficiently general overview, and an efficient one, since Wes McKinney was the principal author of Pandas, and his writing style is straightforward.
Feb 06, 2014 Nancy rated it it was amazing  ·  review of another edition
Shelves: reference
This book was the perfect set of training wheels for me, especially since my main goal was to operate on economic and financial data. By chapter 4 (practically the beginning of this book), I was able to sample random stocks, run correlations between stocks and commodities. I think that the TimeSeries chapter should be read just before or after chapter 4, to avoid some time groping in the dark with this datatype. Chapter 11 is also very useful with a focus on data munging for financial data.
Mar 03, 2013 Grace rated it it was amazing  ·  review of another edition
Shelves: reference, computing
This is not a first book on data analysis or a learn to program in python book. But, if you already know how to program a little in python, and have experience with data analysis in other programming languages, this book will teach you exactly what you need to do the job in python.

It's clearly written and well-edited and organized, just like other O'Reilly books. Even if you have no time to learn this yourself, buy it for your lab so your grad students will be more productive. ;-)
May 11, 2014 Michael rated it really liked it  ·  review of another edition
I've used Python for a few years and have recently started implementing it at work. We used to have Matlab for basic scripting, plotting, and analysis and recently dropped it due to their ridiculous licensing costs. Some people moved on to Scilab successfully (it's a great solution) but I started using Python in lieu of Matlab.

This book is a great introduction to pandas (it's written by the main author of pandas) as well as an introduction to Numpy. Great read.
Jul 21, 2013 Dgg32 rated it it was amazing  ·  review of another edition
With this book, I am beginning to ditch R and dive into Pandas. This way, I can have both data conversion and analysis in one code.

It is certainly not for Python beginners. I also got stuck on some mind-boggling syntax and financial jargon. But the beauty of Pandas is still right there to see. It is only with Pandas that I dare to take on some projects that I previously considered too cumbersome to tackle.
Jonathan Dinu
It is a great book that covers great content; however, it is woefully out of date. I would give it 4 stars on content and approach, but since pandas has evolved (and is still evolving) very rapidly, many of the examples in the book no longer work, thus the 2-star rating.

Thankfully, much of what it covers is contained in the official pandas documentation for all versions of the library, and if supplemented in this way the book is
Jan 01, 2014 Matthew rated it it was amazing  ·  review of another edition
Shelves: computer
Want to use Python instead of Excel, Matlab, or R? Are you crunching numbers in Python? Want to learn about IPython, NumPy, Matplotlib, and pandas? Then Wes McKinney's book is for you. Whenever I'm performing data analysis using Python, I find myself referencing Python for Data Analysis. It's well written and covers a broad range of topics that you'll need when importing, manipulating, aggregating, calculating, or plotting data.
Sep 04, 2015 Xinyu rated it really liked it  ·  review of another edition
I skipped the chapters on time series. Overall, this is a good book on pandas. Pandas used to give me the impression that its syntax is kind of messy, but this book explains why it is designed that way. After reading it, I have a much better understanding of pandas and even find it very convenient to use.

Will definitely put this book at my side for a while for reference.
Jun 03, 2013 Jess rated it liked it  ·  review of another edition
This book is helpful IF you already use Python, or are willing to run through a bunch of basic Python learning alongside learning how to use Python for data analysis. There are some examples, but they are not fully annotated. Wes McKinney has produced a useful package with pandas. I warn readers (especially newbie readers) to be very patient in working through the examples and annotating them until they understand what is happening.
  • Think Complexity: Complexity Science and Computational Modeling
  • Data Analysis with Open Source Tools
  • Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
  • Programming Collective Intelligence: Building Smart Web 2.0 Applications
  • Natural Language Processing with Python
  • The Art of R Programming: A Tour of Statistical Software Design
  • Machine Learning for Hackers
  • Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann Series in Data Management Systems)
  • Interactive Data Visualization for the Web
  • Doing Data Science
  • Machine Learning in Action
  • Python Cookbook
  • R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics
  • Python Essential Reference (Developer's Library)
  • Building Machine Learning Systems with Python
  • Pattern Recognition and Machine Learning
  • Data Science from Scratch: First Principles with Python
  • ggplot2: Elegant Graphics for Data Analysis
