Python for Data Analysis

4.13  ·  1,690 ratings  ·  131 reviews
Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data an
Paperback, 400 pages
Published October 2012 by O'Reilly Media (first published December 30th 2011)

Community Reviews

Sep 30, 2012 rated it: it was ok
Shelves: python, data-analysis
A better title for this book might be Pandas and NumPy in Action

As the creator of the pandas project, a Python data analysis framework, Wes McKinney is well placed to write this book. His experience and vision for the pandas framework are clear, and he is able to explain the main functions and inner workings of both pandas and another package, NumPy, very well.

Although the title of the book suggests a broad look at the Python language for data analysis, McKinney almost exclusively focuses on an in
Sebastian Gebski
Apr 01, 2020 rated it: it was amazing
It's a hell of a book & it took me a lot of time to get through, but it was worth it.

Two key points:
1. it's time-consuming not because it's hard to comprehend or anything (quite the opposite), but because it's very practical: examples, examples & examples, so it barely makes sense to read it without being in front of the keyboard (to check the stuff out)
2. people understand terms like "data analysis", "artificial intelligence", "machine learning" & "data science" very differently - this book is ab
Nov 09, 2012 rated it: it was amazing
Shelves: math-stats, computer
For some time now I have been using R and Python for data analysis. I long ago discovered the Python technical stack of IPython, NumPy, SciPy, and Matplotlib, and I thought I knew what I was doing. I even dipped my toe into pandas as my data structure for analysis. But Python for Data Analysis showed me entire worlds of improvement in my workflow and my ability to work with data in the messy form that is found in the real world.

Python, like most interpreted languages, is slow compared to
Paweł Kacprzak
May 13, 2017 rated it: liked it
Just more verbose documentation. After a promising introduction showing several real-world uses of data manipulation, the book is nothing more than documentation of pandas and libraries like numpy and matplotlib. Moreover, many of the functions described there are already deprecated, so be aware of that. Perhaps the best way of "reading" this book is to scan it quickly for a general overview of pandas functionality, so it can be used as a point of reference when needed.
Nov 18, 2012 rated it: it was amazing
Good introduction to the pandas data analysis library by its main contributor, Wes McKinney. Also covers useful Python tools/libraries for data analysis such as IPython and NumPy. Lots of examples.

Didn't read the last three chapters on time series, financial data analysis and advanced numpy.

IPython notebooks are available here, forked from the official repository of the book.
Terran M
Oct 20, 2018 rated it: really liked it  ·  review of another edition
This book is a well-written, verbose introduction to Pandas by the main author of that library. Don't expect to learn much besides Pandas - matplotlib gets a brief mention, and there is a short Numpy section, but broadcasting is relegated to an appendix.
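The review notes that broadcasting is relegated to an appendix. As a rough illustration of what broadcasting means (the arrays below are arbitrary and not taken from the book), NumPy stretches an array of shape (4,) or (3, 1) so it can combine elementwise with a (3, 4) array:

```python
import numpy as np

# A 3x4 array and its per-column means (shape (4,)).
arr = np.arange(12, dtype=float).reshape(3, 4)
col_means = arr.mean(axis=0)

# (3, 4) minus (4,): the row of means is broadcast down all 3 rows.
demeaned = arr - col_means

# For per-row means, reshape to (3, 1) so they broadcast across the 4 columns.
row_means = arr.mean(axis=1)
row_demeaned = arr - row_means[:, np.newaxis]

print(demeaned.mean(axis=0))      # every column now averages to 0
print(row_demeaned.mean(axis=1))  # every row now averages to 0
```

Without the `np.newaxis`, subtracting a (3,) array from a (3, 4) array raises a shape error, which is the usual first stumbling block with these rules.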

This book is a peer of Python Data Science Handbook by Jake VanderPlas, and they are more alike than different. They both start with long sections on manipulating data in Numpy and Pandas, on mostly made up examples of random numbers. This book i
Derek Bridge
Mar 03, 2013 rated it: liked it
This book is a reasonably comprehensive tutorial to pandas - the Python library for data wrangling. As a tutorial, it works well.

But it wasn't quite what I was expecting. I was expecting less tutorial and more case studies - taking meaningful datasets (instead of makey-upy ones) and using pandas and other tools to pose and answer questions. For me, this would have made the book a much more practical resource.
A good, thorough introduction to using Python (and in particular the numpy and pandas libraries) for data analysis. As with pretty much all books of this kind, after a while the mixture of text and examples makes it hard to follow, but then, maybe it's not supposed to be read while sitting down in an armchair.
Minervas Owl
Jul 06, 2019 rated it: really liked it  ·  review of another edition
Selected notes:

• pickle is only recommended as a short-term storage format. The problem is that it is hard to guarantee that the format will be stable over time; an object pickled today may not unpickle with a later version of a library.
• The map method on a Series accepts a function or dict-like object containing a mapping.
• Long/Wide reshaping can be done by pivot (long to wide), melt (wide to long), stack (wide to long), and unstack (long to wide).
• Pandas have a category type similar
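The notes above name several pandas methods. A small sketch of them on a made-up table (the column names and values are illustrative assumptions, not examples from the book):

```python
import pandas as pd

# Long format: one row per (date, item) observation.
long_df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "item": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Series.map with a dict-like mapping (a function would also work).
long_df["label"] = long_df["item"].map({"a": "apple", "b": "banana"})

# pivot: long -> wide (dates as rows, items as columns).
wide_df = long_df.pivot(index="date", columns="item", values="value")

# melt: wide -> long again.
back_to_long = wide_df.reset_index().melt(id_vars="date", value_name="value")

# stack / unstack do the same reshaping through the index.
stacked = wide_df.stack()      # wide -> long Series with a (date, item) MultiIndex
unstacked = stacked.unstack()  # long -> wide DataFrame again

# The category dtype stores repeated strings as integer codes.
long_df["item"] = long_df["item"].astype("category")

print(wide_df)
print(long_df.dtypes)
```

`pivot`/`melt` work on columns, while `stack`/`unstack` move labels between the row and column index; which pair is more convenient depends on whether the keys already live in the index.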
Aug 20, 2012 rated it: liked it
Recommends it for: folks doing data analysis that have already decided to use Python
Shelves: 2012, python, technical
I did copy editing on this book, so my review is of an unfinished (but close to finished) version. That being said: McKinney is the principal author on pandas, a Python package for doing data transformation and statistical analysis. The book is largely about pandas (and NumPy), but also delves into general methodologies for munging data and performing analytical operations on them (e.g., normalizing messy data and turning it into graphs and tables); he also delves into some (semi) esoteric infor
Nov 26, 2020 rated it: really liked it
The O'Reilly (animal) book that is the essential reference to pandas and numpy, as used in IPython and Jupyter notebooks. This book is a complete overview of the APIs and packages, hints and tips and some data sources for use with these first class data analysis tools.

Do you need this book? Maybe not. There is so much reference information on the web, I tend to just google it. Also it is not amazingly readable. The Open University course "Learn to code for data analysis" is a better introduction
Apr 13, 2013 added it
Shelves: programming
Good introduction to Python Pandas and other libraries for data analysis. However, the book goes directly from the introduction into pretty complicated examples. As a reader new to R, Pandas, and statistical languages, it was hard work to learn the data structures and semantics. After working through several web-based tutorials, I had a better intuitive sense for how to solve problems with the framework presented by the author.

As documentation for Pandas alone, this book is useful.
Feb 06, 2014 rated it: it was amazing  ·  review of another edition
Shelves: reference
This book was the perfect set of training wheels for me, especially since my main goal was to operate on economic and financial data. By chapter 4 (practically the beginning of this book), I was able to sample random stocks, run correlations between stocks and commodities. I think that the TimeSeries chapter should be read just before or after chapter 4, to avoid some time groping in the dark with this datatype. Chapter 11 is also very useful with a focus on data munging for financial data.
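The reviewer mentions sampling stocks and running correlations between stocks and commodities. A minimal sketch of that kind of computation in pandas, using randomly generated prices rather than real market data (the ticker names, seed, and 60-day window are arbitrary placeholders):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical daily closing prices: two stocks and one commodity,
# simulated as random walks in log space.
dates = pd.date_range("2020-01-01", periods=250, freq="B")
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 3)), axis=0)),
    index=dates,
    columns=["STOCK_A", "STOCK_B", "GOLD"],
)

# Correlate daily percentage returns rather than raw price levels,
# since trending price series can look correlated by accident.
returns = prices.pct_change().dropna()
corr_matrix = returns.corr()

# Rolling 60-day correlation between one stock and the commodity.
rolling_corr = returns["STOCK_A"].rolling(60).corr(returns["GOLD"])

print(corr_matrix.round(2))
print(rolling_corr.dropna().tail())
```

With independent random data the off-diagonal correlations should hover near zero; the point is the shape of the workflow, not the numbers.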
Moeen Sahraei
Dec 04, 2020 rated it: it was amazing  ·  review of another edition
It couldn’t have been better. A comprehensive book with a lot of detail on data wrangling, taught step by step so there is no confusion in figuring out the code. The author explains complex Python subjects very intuitively, so anyone can read this book and learn data wrangling with some practice on the data sets the book provides.
runzhi xiao
Mar 06, 2019 rated it: it was amazing
I started my career in data science with this book. The book is very easy to understand, and practical. Having a little bit of Python knowledge would help you read this book too. The book covers a lot of the tasks a data analyst does in their daily job. It is mostly about pandas, numpy and matplotlib, so if you are already familiar with these tools, you can skip this one.
Aug 14, 2019 rated it: really liked it  ·  review of another edition
The book describes pandas: a Python library that supports data analysis and is also used in some Python machine learning libraries.
The book also briefly mentions other libraries, including numpy and matplotlib.
Everything you read in this book is certainly available in the online documentation of the libraries discussed.
However, the author does an excellent job of providing an accessible introduction to these libraries in a single place, using a uniform terminology and paying attention to explaining conce
Apr 20, 2020 rated it: it was ok  ·  review of another edition
Shelves: coding
This book will be a good reference book. However, it will be completely out of date in a couple of years. (Some examples in the book were already throwing errors or warnings.)

Working through this book was helpful, but it is more like manual pages than a cookbook (I was hoping for more of the latter). You get a lot of features thrown at you, but not a lot of context. Examples tend to be random numbers, which can be helpful to see what exactly is going on with some method. There are a few short exa
Peter Backx
Feb 06, 2018 rated it: really liked it  ·  review of another edition
Python for Data Analysis is a very thorough overview of, mostly, the Pandas library. There is also coverage of numpy, matplotlib and a tiny bit of some modeling libraries, such as patsy and scikit-learn.

The examples and flow are good and it's a joy to follow along in your own Jupyter Notebook. The chapter on time-data was a bit less engaging than the others. It did contain lots of code, but most of the code was very basic and not immediately applicable.

I read the book through Safari, which comes
Aaron Adamson
Oct 25, 2018 rated it: really liked it
Pretty solid, but it really spends almost all of its time on the basics - intro to python, intro to numpy, intro to plotting, and then quite a bit of pandas. There's a lot of basic syntax, and just a survey of API features of these different modules.

That's useful, but honestly it serves more as an intro and reference. What I would've liked to have more of is what Chapter 13 barely touches on - statistical modeling and some of the non-trivial techniques you would actually use to solve problems.
Utsav Parashar
Mar 17, 2019 rated it: really liked it  ·  review of another edition
After finishing the book, I felt I should have gone through it quickly, just glancing at the topics, and then referred back to them whenever I worked on a problem where they apply.
It takes you through data analysis using pandas and numpy, and plotting with seaborn and matplotlib.
The time series material also has its value.
At times the book felt slow, and better examples would have kept me on pace.
A few topics, like groupby and aggregate, are designed and c
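The reviewer singles out grouping and aggregation (the pandas method is spelled `groupby`). A minimal sketch of the split-apply-combine pattern on a made-up sales table (the column names and values are illustrative, not from the book):

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "product": ["x", "x", "y", "y", "x"],
    "sales": [10, 20, 30, 40, 50],
})

# Split by region, apply a sum, combine into one Series.
total_by_region = df.groupby("region")["sales"].sum()

# agg applies several functions at once; named aggregation labels the output columns.
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    mean_sales=("sales", "mean"),
    n_orders=("sales", "size"),
)

# Grouping by two keys gives a MultiIndex; unstack turns it into a region x product table.
by_region_product = (
    df.groupby(["region", "product"])["sales"].sum().unstack(fill_value=0)
)

print(summary)
print(by_region_product)
```

Named aggregation (`new_column=(source_column, function)`) keeps the output column names explicit, which tends to be easier to read than passing a dict of lists.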
Nov 05, 2019 rated it: liked it
A solid primer on numpy and pandas, the two heavy lifters in the data analysis python toolbox. Some people don’t like that this book sticks mostly to those two packages, but look, they get the job done. I like that the author digs into those. I wish he had sprinkled more examples throughout the book to drive home the concepts right when he introduces them rather than sticking them in the back of the book all together. Overall, this is a great reference book that I expect to return to many times
Gonzalo Fernández-Victorio
Jul 13, 2020 rated it: really liked it  ·  review of another edition
Shelves: it, 2020
This book tries to explain how to use some of the more common tools for doing data analysis with Python. In my opinion, the book focuses a little too much on showing the tools, as opposed to showing the process of data analysis and which tools to use. In that respect, the last chapter is really interesting because the examples are real. But in the other chapters, one has the feeling that there is too much content on things that can be searched for on the Internet, with examples that have little in
Rustam Aliyev
Aug 05, 2017 rated it: liked it
Shelves: computer-science
Very good book for beginner-to-intermediate Python programmers; it mainly focuses on data manipulation using pandas and numpy. I found many of the data manipulation examples useful but lacking in context. Upper-intermediate Python developers might miss performance details of the various data analysis approaches described in the book. The plotting section is also shallow, focusing on pandas plotting rather than more general plotting mechanisms in Python using the latest libraries.
Dec 17, 2019 marked it as to-keep-reference
Going into detail about the functionality of pandas is out of the scope of this book. However, Python for Data Analysis by Wes McKinney (O’Reilly, 2012) provides a great guide.

Introduction to Machine Learning with Python, p. 10
May 27, 2017 rated it: really liked it
not really suited for a straight read through, but a comprehensive manual for many key techniques in the python data analysis toolboxes. will be a great reference for how to accomplish both basic and high level operations.

a new edition due in fall 2017 should correct some of the outdated material as well.
Nov 30, 2017 rated it: it was ok
Covers a good amount of knowledge, interesting examples too.
But GOSH... the fact that python is a higher level language doesn't grant a free use of generic words and loose conceptual demarcations. This is a textbook-like read after all. Can we put in some more effort to be scientific here?

Also: get the up-to-date version.
Muhammad al-Khwarizmi
Apr 11, 2018 rated it: liked it
Shelves: computation
This volume is a comprehensive guide to the use of (primarily) Pandas as well as other Python data analysis tools, sometimes marred by unclear language and unusual or non-functioning example code. My review concerns only the first ten chapters, but I would expect the quality of the remaining four to be about on par with the others.
Ferhat Culfaz
Sep 27, 2018 rated it: it was amazing  ·  review of another edition
By the author and creator of Pandas. A must-read book for anyone working in science, engineering, statistics, data science and machine learning. Covers all the feature engineering where people spend most of their time. Also an excellent book for future reference to look stuff up. Clearly written, with lots of practical examples and included Jupyter Notebooks.
Oct 26, 2018 rated it: really liked it  ·  review of another edition
Great reference for pandas and its data science capabilities

The book is great. I find it to be more a reference (or a handbook) for pandas than a generic Data Science guide but I guess that’s its purpose. I find myself browsing it occasionally to see how to do something with pandas.
I think it’s a must for anyone using Python for Data Science or related tasks.
Kalyan Tirunahari
May 20, 2019 rated it: really liked it
Shelves: coder, tech
Wonderful, wonderful book. Starts from the very basics of how to handle simple list data and progresses all the way to advanced Pandas and plotting visualization. Spark- and R-like packages for handling data frames make it very practical. The hands-on examples and sample datasets provided help a lot. Feels like Python is for anything and everything.

Readers also enjoyed

  • Hands-On Machine Learning with Scikit-Learn and TensorFlow
  • Deep Learning with Python
  • Python Data Science Handbook: Tools and Techniques for Developers
  • Automate the Boring Stuff with Python: Practical Programming for Total Beginners
  • Learning Python
  • An Introduction to Statistical Learning: With Applications in R
  • Fluent Python: Clear, Concise, and Effective Programming
  • Data Science from Scratch: First Principles with Python
  • Python Crash Course: A Hands-On, Project-Based Introduction to Programming
  • R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
  • Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • Think Python
  • Introduction to Machine Learning with Python: A Guide for Data Scientists
  • Introduction to Algorithms
  • Practical Statistics for Data Scientists: 50 Essential Concepts
  • The Hundred-Page Machine Learning Book
  • Python Tricks: A Buffet of Awesome Python Features