Python for Data Analysis: Data Wrangling with Pandas, Numpy, and Ipython

4.02 of 5 stars  ·  384 ratings  ·  42 reviews
"Python for Data Analysis" is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data ...more
ebook, 470 pages
Published October 8th 2012 by O'Reilly Media (first published December 30th 2011)


Community Reviews

A better title for this book might be Pandas and NumPy in Action

As the creator of the pandas project, a Python data analysis framework, Wes McKinney is well placed to write this book. His experience with and vision for the pandas framework are clear, and he explains the main functions and inner workings of both pandas and another package, NumPy, very well.

Although the title of the book suggests a broad look at the Python language for data analysis, McKinney almost exclusively focuses on an in
For some time now I have been using R and Python for data analysis. I long ago discovered the Python technical stack of IPython, NumPy, SciPy, and Matplotlib, and I thought I knew what I was doing. I even dipped my toe into pandas as my data structure for analysis. But Python for Data Analysis showed me entire worlds of improvement in my workflow and my ability to work with data in the messy form that is found in the real world.

Python, like most interpreted languages, is slow compared to
Aug 24, 2012 Rob rated it 3 of 5 stars  ·  review of another edition
Recommends it for: folks doing data analysis that have already decided to use Python
Shelves: 2012, technical, python
I did copy editing on this book, so my review is of an unfinished (but close to finished) version. That being said: McKinney is the principal author on pandas, a Python package for doing data transformation and statistical analysis. The book is largely about pandas (and NumPy), but also delves into general methodologies for munging data and performing analytical operations on them (e.g., normalizing messy data and turning it into graphs and tables); he also delves into some (semi) esoteric infor ...more
John Alan
The book focuses on Pandas, but also introduces you to the ecosystem of libraries you'll encounter when doing scientific data analysis in Python.

As well as Pandas you'll cover IPython, NumPy and Matplotlib in enough depth to get you started with data analysis and visualization.

You don't need to be a Python expert, but some Python knowledge and some experience of R will definitely help.

The book is well structured, breaking down the different topics into well defined chapters which deal with topi
James Williams
In my office, we spend a lot of time in the database. As such, we tend to become fairly adept at analyzing data with SQL: join some tables on interesting columns, group by other interesting columns, sprinkle in some aggregates, and pretty soon you have yourself a table of answers.
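
For readers coming from the same SQL background, here is a minimal sketch of how that join, group-by, and aggregate workflow might look in pandas. The table names, columns, and values are invented for illustration and do not come from the book or the review.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [10.0, 20.0, 5.0, 7.5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})

# Roughly: SELECT ... FROM orders JOIN customers USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="inner")

# Roughly: SELECT region, SUM(amount), COUNT(amount) ... GROUP BY region
summary = (
    joined.groupby("region")["amount"]
    .agg(total="sum", n_orders="count")
    .reset_index()
)
print(summary)
```

The result is itself a DataFrame, so the "table of answers" can be passed on to further filtering or plotting rather than being the end of the road.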

The relatively new windowing functions added in SQL Server 2012 let you do even fancier analysis (at the risk of needing to understand some new syntax).

Yet, sometimes, a raw table of SQL results just isn't enough. You mi
Good introduction to pandas data analysis library by its main contributor, Wes McKinney. Also covers useful Python tools/libraries for data analysis such as ipython and numpy. Lots of examples.

Didn't read the last three chapters on time series, financial data analysis and advanced numpy.

IPython notebooks are available here, forked from the official repository of the book.
Good introduction to Python Pandas and other libraries for data analysis. However, the book goes directly from the introduction into pretty complicated examples. As a reader new to R, Pandas, and statistical languages, it was hard work to learn the data structures and semantics. After working through several web-based tutorials, I had a better intuitive sense for how to solve problems with the framework presented by the author.

As documentation for Pandas alone, this book is useful.
Derek Bridge
This book is a reasonably comprehensive tutorial to pandas - the Python library for data wrangling. As a tutorial, it works well.

But it wasn't quite what I was expecting. I was expecting less tutorial and more case studies - taking meaningful datasets (instead of makey-upy ones) and using pandas and other tools to pose and answer questions. For me, this would have made the book a much more practical resource.
idle sign
This book can certainly be faulted: for its overly grandiose title (the author, the person who created pandas, could have said up front that it is mostly about pandas); for a style of exposition that makes the book almost, but not quite, a reference manual; for the absurd introduction to Python at the end of the book; for the lack of any connection between most of the examples and real life; and for the choppy narrative.

It can be faulted, but it is worth remembering: at the moment pandas has de facto no alternatives for data analysis in Python, and the tool is
Dani Arribas-bel
I've been using Python for quantitative economics and geography for the last five years and I wish this book had been written back then. It is clear, easy to read, full of (cool) examples and incredibly useful, regardless of whether you read it from first to last page or have it around to check while hacking. If you are getting started on data crunching with Python, you should not miss it. If you are not new to the party, this will still be a good resource to get your head around a bit more advan ...more
I've used Python for a few years and have recently started implementing it at work. We used to have Matlab for basic scripting, plotting, and analysis and recently dropped it due to their ridiculous licensing costs. Some people moved on to Scilab successfully (it's a great solution) but I started using Python in lieu of Matlab.

This book is a great introduction to pandas (it's written by the main author of pandas) as well as an introduction to Numpy. Great read.
Nancy Wu
This book was the perfect set of training wheels for me, especially since my main goal was to operate on economic and financial data. By chapter 4 (practically the beginning of this book), I was able to sample random stocks and run correlations between stocks and commodities. I think that the time series chapter should be read just before or after chapter 4, to avoid some time groping in the dark with this datatype. Chapter 11 is also very useful, with a focus on data munging for financial data.
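
As a rough illustration of the kind of correlation exercise described above, the sketch below builds synthetic price series and correlates their daily returns. The column names are hypothetical and the prices are random numbers, not real market data and not an example from the book.

```python
import numpy as np
import pandas as pd

# Synthetic data: random-walk "prices" on business days.
# The tickers are hypothetical placeholders, not real instruments.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2012-01-02", periods=250)
steps = rng.normal(0.0, 0.01, size=(250, 3))
prices = pd.DataFrame(
    100 * np.exp(steps.cumsum(axis=0)),
    index=dates,
    columns=["STOCK_A", "STOCK_B", "COMMODITY_X"],
)

# Daily percentage returns; the first row is NaN, so drop it.
returns = prices.pct_change().dropna()

# Pairwise correlation matrix across all columns.
corr = returns.corr()

# Correlation between one stock and one commodity.
pair = returns["STOCK_A"].corr(returns["COMMODITY_X"])
print(corr.round(2))
```

Correlating returns rather than raw prices is the usual choice here, since trending price levels can look correlated even when their day-to-day moves are unrelated.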
Matt Heavner
This is definitely pandas and financial analysis focused (by the author's own admission). It is filled with lots of great examples (it doesn't matter so much if they are scientific time series or financial time series, but lots of the focus on "business days" and "last day of the quarter" aren't so applicable to scientific data sets -- but they do still have illustrative value). There were a few moments when the overall level "dipped" (one I remember specifically was the "hangup" early on regard ...more
It's a reference book for data analysis. Basically it has a chapter for each of the main applications and shows what the Python ecosystem offers you for data wrangling. If you have never done it before, this is not the book that will teach you. But assuming you already know how, you can start using IPython, NumPy and pandas to solve problems.
I did not really use this book much as I ended up primarily using a matlab - python cheat sheet along with google searches to get up and running with Python. The book does have some useful information specific to large dataset manipulation that I can foresee being helpful in future projects.
Raymond Lim
Good for familiarizing with python tools numpy, matplotlib, and pandas.
But not that great for applying the tools to do any analysis. Though it did have a few examples of analysis using the tools.
I will still keep the book around for reference as I get more familiar w/ pandas.
Rishi Singh
It's a good book and great for understanding Wes's original intent on how to use Pandas. I recommend that any serious pandas or data analysis user have this on their bookshelf. This is much better than the pandas docs for learning about the bigger picture and understanding how to connect the pieces of Pandas.

My only major issue is that the content will become more outdated with each passing edition. Pandas is rapidly developing. This is an unfair reason to remove a star, the second reason is the b
Tristan Williams
A great handbook for anyone looking to break down data sets in Python. This won't teach you what to look for or how to do data analysis, but it will show you all the tools to get it done.
Better name: 'Data munging with Python's pandas and numpy'

It focuses heavily on pandas and the myriad of things you can do with a DataFrame. Very often the examples are extremely specific, yet the example data is contrived, like "here's this rather specific case in which you want to average a subset of a column in a table, but only those cases where the person linked to the index of that column has the astrological sign Pisces", and about 5 minutes later, I already forgot how the author did it.
Pandas has advanced since the book, so I'm looking forward to vol2.

The book itself is very terse; it's a reference more than a how-to.
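
For what it's worth, the contrived "average only the Pisces rows" case mentioned above comes down to a boolean mask plus a mean. This is a minimal sketch with made-up names and numbers, not the author's example from the book.

```python
import pandas as pd

# Made-up data, for illustration only.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "sign": ["Pisces", "Aries", "Pisces", "Leo"],
    "score": [80.0, 65.0, 90.0, 70.0],
})

# Boolean mask: True for the rows where the sign is Pisces.
is_pisces = people["sign"] == "Pisces"

# Select the "score" column for those rows only, then average it.
pisces_mean = people.loc[is_pisces, "score"].mean()
print(pisces_mean)
```

The same mask-then-aggregate pattern covers most "average a subset of a column" questions, whatever the filter condition happens to be.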
Excellent instruction and reference for Data wrangling in Python. Made dealing with large unwieldy datasets infinitely easier!

Note: This book will not teach you statistics, machine learning, etc, but it is an invaluable reference for getting to a point where you can perform these analyses.
Want to use Python instead of Excel, Matlab, or R? Are you crunching numbers in Python? Want to learn about IPython, NumPy, Matplotlib, and pandas? Then Wes McKinney's book is for you. Whenever I'm performing data analysis using Python, I find myself referencing Python for Data Analysis. It's well written and covers a broad range of topics that you'll need when importing, manipulating, aggregating, calculating, or plotting data.
Eliot St. John
What I really need is "Python for Stata users."
Rong Emily
Nice logical flow, easy to understand.
It's not a bad book but if you are looking for a good book for scientific computing with Python you will probably be disappointed.
The book covers mostly pandas and doesn't give much information on numpy and matplotlib, and says nothing at all about scipy, all of which are more essential for scientific computing as far as I understand the topic.
On the other hand, I'm sure that I will use what I've learned here soon, but only after reading more comprehensive information about the whole scipy stack.
This is not a first book on data analysis or a learn-to-program-in-Python book. But if you already know how to program a little in Python, and have experience with data analysis in other programming languages, this book will teach you exactly what you need to do the job in Python.

It's clearly written and well-edited and organized, just like other O'Reilly books. Even if you have no time to learn this yourself, buy it for your lab so your grad students will be more productive. ;-)
M Sheik Uduman Ali
As a primer for data analysis, this book is well written. With enough sample data for learning purposes, Wes explains the NumPy, pandas and matplotlib libraries, in addition to the knowledge required for file handling, data loading, storage, wrangling and aggregation.

Python for Data Analysis would be a good first stepping stone...

Read the full review at:
Indispensable and quite comprehensive. Slightly out of date at this point as pandas has matured since its publication, but this book will undoubtedly be a reference returned to repeatedly. Easy-to-understand code examples, and, more importantly, explanation of why one would want to apply the various capabilities described by them.
This book is helpful IF you already use Python, or are willing to run through a bunch of basic Python learning besides learning how to use Python for data analysis. There are some examples, but they are not fully annotated. Wes McKinney has produced a useful package with pandas. I warn readers (especially newbie readers) to be very patient in working out examples and annotating them until you understand what is happening.
With this book, I am beginning to ditch R and dive into Pandas. This way, I can have both data conversion and analysis in one code.

It is certainly not for Python beginners. I also got stuck on some mind-boggling syntax and financial jargon. But the beauty of Pandas is still right there to see. It is only with Pandas that I dare to take on some projects that were previously considered too cumbersome to tackle.
  • Data Analysis with Open Source Tools
  • Think Complexity: Complexity Science and Computational Modeling
  • Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
  • Natural Language Processing with Python
  • Programming Collective Intelligence: Building Smart Web 2.0 Applications
  • Machine Learning for Hackers
  • Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann Series in Data Management Systems)
  • Doing Data Science
  • The Art of R Programming: A Tour of Statistical Software Design
  • Interactive Data Visualization for the Web
  • R Cookbook
  • Machine Learning in Action
  • Python Cookbook
  • Python Essential Reference (Developer's Library)
  • Pattern Recognition and Machine Learning
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • R in a Nutshell: A Desktop Quick Reference
  • Building Machine Learning Systems with Python
