Practical Data Analysis - Second Edition

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you’ll get hands-on experience turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data such as text, images, social network graphs, documents, and time series, showing you how to implement large data processing with MongoDB and Apache Spark.

Hector Cuesta is founder and Chief Data Scientist at Dataxios, a machine intelligence research company. He holds a BA in Informatics and an M.Sc. in Computer Science. He provides consulting services for data-driven product design, with experience in a variety of industries including financial services, retail, fintech, e-learning, and human resources. He is a robotics enthusiast in his spare time. You can follow him on Twitter.

Dr. Sampath Kumar works as an assistant professor and head of the Department of Applied Statistics at Telangana University. He has completed an M.Sc., M.Phil., and Ph.D. in statistics. He has five years of experience teaching postgraduate courses and more than four years of experience in the corporate sector. His expertise is in statistical data analysis using SPSS, SAS, R, Minitab, MATLAB, and so on. He is an advanced programmer in SAS and MATLAB. He has taught various applied and pure statistics subjects, such as forecasting models, applied regression analysis, multivariate data analysis, and operations research, to M.Sc. students. He is currently supervising Ph.D. scholars.

338 pages, Paperback

First published January 1, 2013

7 people are currently reading
61 people want to read

Community Reviews

5 stars: 2 (7%)
4 stars: 10 (37%)
3 stars: 13 (48%)
2 stars: 2 (7%)
1 star: 0 (0%)
Displaying 1 - 6 of 6 reviews
Rob
Author of 2 books, 440 followers
December 9, 2013
I just finished up reading Practical Data Analysis by Hector Cuesta (Packt Publishing, 2013) and overall, it was a pretty good overview and recommends some good tools. I would say that the book is a good place for someone to get started if they have no real experience performing these kinds of analyses, and though Cuesta doesn't go deep into the math behind it all, he isn't afraid to use the technical names for different formulae, which should make it easy for you to do your own follow-up research. [1]

Jeff Leek's Data Analysis on Coursera provides the lens through which I read this book. [2] That being said, I found myself doing a lot of comparing and contrasting between the two. For example, they both use practical, reasonably small "real world" sample problems to highlight specific analytical techniques and/or features of their chosen toolkits. However, whereas Leek's course focused exclusively on using R, Cuesta assembles his own all-star team of tools using Python [3] and D3.js. Perhaps it goes without saying, but there are pros and cons to each approach (e.g., Leek's "pure R" vs. Cuesta's "Python plus D3.js"), and I felt that it was best to consider them together.

Cuesta's approach with this book is to present a sample scenario in each chapter that introduces a class of problem, a solution to that problem, and his recommended toolkit. For example, chapter six creates a stock price simulation: it introduces simple simulation problems (especially for apparently stochastic data), time series data, and Monte Carlo methods, and then shows how to simulate the data using Python and visualize it in D3.js. Although the book is not strictly a "cookbook", the chapters very much feel like macro-level "recipes". There's quite a bit of code and some decent discussion around the concepts that govern the analytical model, and (true to the "practical" in the title) the emphasis is on the "how" and not the "why".
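To give a feel for the kind of Monte Carlo stock price simulation described above, here is a minimal sketch using only the Python standard library. It is not the book's code: the geometric Brownian motion model, the function names (`simulate_price_path`, `monte_carlo_final_prices`), and the drift and volatility values are all assumptions chosen for illustration.

```python
import math
import random


def simulate_price_path(start_price, days, drift, volatility, rng):
    """Simulate one daily price path (hypothetical helper, not from the book).

    Assumes geometric Brownian motion: each day the log-price moves by a
    fixed drift term plus a normally distributed random shock.
    """
    prices = [start_price]
    for _ in range(days):
        shock = rng.gauss(0.0, 1.0)  # standard normal random shock
        step = (drift - 0.5 * volatility ** 2) + volatility * shock
        prices.append(prices[-1] * math.exp(step))
    return prices


def monte_carlo_final_prices(n_paths, start_price, days, drift, volatility, seed=0):
    """Run many independent simulations and collect each path's final price."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        simulate_price_path(start_price, days, drift, volatility, rng)[-1]
        for _ in range(n_paths)
    ]


# Illustrative parameters only: 1000 paths over 30 days from a price of 100.
finals = monte_carlo_final_prices(1000, 100.0, 30, 0.0005, 0.01)
mean_final = sum(finals) / len(finals)
print(f"mean simulated price after 30 days: {mean_final:.2f}")
```

The list of final prices is what you would then summarize or hand to a charting tool; the spread of those values is the point of running many paths rather than one.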

While I did not read the entire book cover-to-cover, I would definitely recommend it to anyone that wants an introduction to some basic data analysis techniques and tools. You'll get more out of this book if you have some base to compare it to -- e.g., some experience in R (academic or otherwise); and you'll get the most out of this book if you also have a solid foundation in the mathematics and/or statistics that underlie these analytical approaches.

Disclosure: I received an electronic copy of this book from the publisher in exchange for writing this review.

------

[1] As an aside, this seems to be par for the course for the “technical” data analysis books, blog posts, and MOOCs that I’ve encountered. That is to say, “the math” is touched on, but if you don’t already have a background in linear algebra (or whatever) then you’re going to wind up taking it on faith that support vector machines do what you need them to do.

[2] I wrote about my experience in Jeff Leek’s class in April of 2013. (See: “reflecting on Data Analysis”.)

[3] Both the Python standard library and a collection of libraries like mlpy and matplotlib.
Al-ahmadgaid Asaad
2 reviews
November 29, 2013
Practical Data Analysis is all about applications of statistical methodologies in computer science. I find it very useful since this was not taught in my statistics class. In college, we only practiced statistics in fields like sociology, psychology, agriculture, economics, chemistry, biology, industrial engineering, and many others, but never in computer science; we only dealt with it when coding in R or SAS. Hal Varian once said,



. . . we've got at least hundred statisticians on Google . . .



I was curious about that. I mean, what are they doing at Google? What statistical tools do they use? This book gave me some answers. Hector Cuesta uses Dynamic Time Warping (DTW) to illustrate image similarity search, which Google uses for searching images, by treating the photo pixels as time series and comparing the distance between them; he classifies emails as spam or not spam based on the subject line of the messages, demonstrating the Naïve Bayes algorithm for text classification (isn't that cool?); he also talks about Kernel Ridge Regression for predicting the gold price using time series, and about Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) for dimensionality reduction. The later chapters are all about "hacking", just as John D. Cook described in his review: pulling data from social networking sites like Facebook and Twitter, visualizing it using Gephi, and analyzing it.
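As a rough illustration of the dynamic time warping idea mentioned above, here is a minimal sketch of the classic DTW distance between two numeric sequences. It is not taken from the book (which, per the other reviews, relies on libraries such as mlpy); the function name `dtw_distance` and the sample sequences are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    items of `a` with the first j items of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two points
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched b[j-1]'s successor region
                cost[i][j - 1],      # b[j-1] stretched over several items of a
                cost[i - 1][j - 1],  # one-to-one step along the diagonal
            )
    return cost[n][m]


# Two sequences with the same shape but different pacing align at zero cost.
series_a = [1, 2, 3]
series_b = [1, 2, 2, 3]
print(dtw_distance(series_a, series_b))
```

For image similarity, each image would first be turned into a numeric sequence (for example, a flattened row of pixel intensities), and images with a small DTW distance would be treated as similar.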



Further, I also learned a new visualization tool, D3.js, and I am loving it now. The brief introduction to every topic is well explained, so I did not need to google for more reading, and the step-by-step programming procedure is easy to follow.


The Programming Language
In general, the programming languages used in this book are Python and JavaScript, but mostly Python. So it is an advantage if you have a general understanding of these languages.



Issues
Some of the issues I found: the file names are inconsistent between the GitHub repository and the book itself, which gets confusing. For example, pokemonByType.csv on GitHub is named sumPokemon.csv in the book. In Chapter 2, working with OpenRefine, the column names of the Excel data on GitHub are in a different language (I think Spanish), while in the book they are in English. Another issue is with the code: the D3.js charts in Chapter 3, such as the bar and pie charts, did not work on my machine. I am new to D3.js, so I was not able to fix it immediately, but I got a quick response after sending an issue to the author. He even said, "if I can help you in anything else don't hesitate to ask." So there is nothing to worry about; these are minor issues, and I mention them just to caution you.



Conclusion
Overall, I recommend this book; it is worth reading.


Link to the book here: http://bit.ly/1co6hOZ
2 reviews
November 25, 2013
This book http://bit.ly/1co6hOZ gives a very practical introduction to data analysis. It covers a wide range of topics, including data visualization, text analysis (spam recognition, sentiment analysis), image analysis, social graph analysis, Bayes classification, SVM, etc. The examples are very practical and teach the reader how to use popular languages and libraries like D3.js, Python 3, NLTK, mlpy, etc. to do basic data analysis.

The book is a great read for beginners. To read and fully appreciate it, no data analysis experience is required. The book provides an introduction to the very basic techniques. Some basic understanding of Python and JavaScript would be necessary, though.

What I like about this book is its hands-on style: while reading, you can easily get started with your first data analyses. The examples are very simple, the code is easy to read, and a very detailed appendix helps you install the tools used. This book is a great help for learning data analysis by doing.

What could be improved is precision. I found some grammar mistakes. Not so big a problem, but not perfect, either. For instance, reading sentences like "we will use Pillow due to its compatibility with Python 3.2 and can be downloaded ..." [p. 97] does hurt a little. More problematic is the section "Classifier accuracy" [p. 90]. It simply uses the ratio of correctly predicted emails as a measure of accuracy, although every discussion of classification accuracy should also cover the ratios of false positives and false negatives.
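To make the point about accuracy concrete, here is a small sketch that computes accuracy alongside the false positive and false negative rates. It is not the book's code: the helper names (`confusion_counts`, `rates`), the "spam"/"ham" labels, and the toy data are assumptions made up for illustration.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # legitimate mail wrongly flagged as spam
        else:
            if a == positive:
                fn += 1  # spam that slipped through
            else:
                tn += 1  # legitimate mail correctly passed
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return accuracy plus false positive and false negative rates."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy data: 2 spam and 8 legitimate messages, with a "classifier" that
# never flags anything as spam.
actual = ["spam"] * 2 + ["ham"] * 8
predicted = ["ham"] * 10

summary = rates(*confusion_counts(actual, predicted))
print(summary)
```

In this toy case the accuracy looks respectable because most messages are legitimate, yet every spam message is missed, which is exactly what the false negative rate exposes and the plain ratio of correct predictions hides.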

Overall, this book is a very practical introduction to data analysis. I can recommend it to beginners in this area.
600 reviews, 11 followers
March 11, 2014
A good book when you want an overview of data analysis without having prior experience. It doesn’t go deep into the topic, which I think is the biggest problem with this book. If you don’t know Python, it will be hard to follow and you will miss out on the examples. Despite this, the topic is really interesting and you will know where to look for more information.
Michael
24 reviews
June 22, 2014
liked it. not a lot of depth but lots of starter and environment validation.