Mastering Python for Data Science

Explore the world of data science through Python and learn how to make sense of data

About This Book
Master data science methods using Python and its libraries
Create data visualizations and mine for patterns
Advanced techniques for the four fundamentals of Data Science with Python - data mining, data analysis, data visualization, and machine learning

Who This Book Is For
If you are a Python developer who wants to master the world of data science, then this book is for you. Some knowledge of data science is assumed.

What You Will Learn
Manage data and perform linear algebra in Python
Derive inferences from the analysis by performing inferential statistics
Solve data science problems in Python
Create high-end visualizations using Python
Evaluate and apply the linear regression technique to estimate the relationships among variables
Build recommendation engines with the various collaborative filtering algorithms
Apply the ensemble methods to improve your predictions
Work with big data technologies to handle data at scale

In Detail
Data science is a relatively new knowledge domain that various organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving.

This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science.

Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and mining data to reveal hidden patterns and trends. You will use the Matplotlib library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression and logistic regression techniques to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions by applying the ensemble methods.

Finally, you will perform K-means clustering, analyze unstructured data with different text mining techniques, and leverage the power of Python in big data analytics.

Style and approach
This book is an easy-to-follow, comprehensive guide on data science using Python. The topics covered in the book can all be used in real-world scenarios.

296 pages, Kindle Edition

First published August 31, 2015

Community Reviews

5 stars: 2 (15%)
4 stars: 8 (61%)
3 stars: 1 (7%)
2 stars: 1 (7%)
1 star: 1 (7%)
Displaying 1 - 3 of 3 reviews
Tadas Talaikis
Author of 7 books, 79 followers
July 1, 2017
More like pandas and scipy than Python*, but ok.

Meaning, I'm searching for more in-depth books, but that can probably be achieved only by looking into the scipy/pandas code more directly :)
BCS
218 reviews, 33 followers
February 10, 2016
Samir Madhavan’s book sets out to show how the Python programming language may be used to apply statistical methods appropriate to data science. It introduces statistical techniques such as the application of probability distributions, tests of significance, linear regression, logistic regression and segmentation as well as methods for data visualisation.

Python’s NumPy and pandas libraries form the basis for coding these techniques. The Matplotlib library and statsmodels module are also introduced. The ‘Who this book is for’ section suggests that the book is aimed at Python developers who already have some knowledge of data science.

The book starts with an introduction to the data structures provided by NumPy and pandas and a useful description of methods used to import data from various sources (text files, spreadsheets, etc.) into these structures.

Most subsequent sections of the book start with a brief introduction to the statistics to be employed, followed by code examples to show how Python and its various libraries may be used to apply the methods under discussion to some sample data.

In the main, the descriptions of statistical concepts and techniques are rather vague, muddled, and often tautological. We are told, for example, that ‘Logistic regression uses logistics.’, that ‘... find[ing] the probability density of the predicted values ... helps us to understand which areas of the predicted probability are denser.’, and that ‘... correlation defines the similarity between two random variables.’
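
To see why the last of those statements is too loose, consider a minimal sketch (not taken from the book; the function name `pearson` and the sample data are illustrative assumptions). The Pearson correlation coefficient measures the strength of a linear relationship, so a variable that is completely determined by another can still have a correlation of zero with it:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]  # exact linear function of x
squared = [v ** 2 for v in x]    # fully determined by x, but not linearly

r_linear = pearson(x, linear)
r_squared = pearson(x, squared)
print(round(r_linear, 6), round(r_squared, 6))
```

Here the linear case gives a coefficient of 1 and the squared case gives 0, even though both series depend entirely on x. "Similarity" is therefore a misleading gloss; "strength of linear association" would be closer.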

Some of the examples given to illustrate statistical concepts are little better: the example given for the occurrence of the normal distribution in nature is that of the shape formed by sand collecting in an egg timer. Whilst the cross section of such a pile of sand may resemble the shape of the normal distribution as it is commonly plotted, there are many examples of the normal distribution in nature that would be more interesting and far more illustrative of the concept.

The text does illustrate how to apply libraries such as pandas and Matplotlib to problems in statistics and data science, but in this respect it provides little information or example beyond that to be found in the online tutorials that accompany such libraries.

So, who is this book for?

If the reader does not already have a good grasp of the statistics that underpin data science, he or she is unlikely to acquire it by reading this book: the material on statistics is just too thin.

If the reader knows data science but is not already familiar with Python then they would struggle to get started, as there is no introduction to the use of the core Python language.

For established Python developers, who already have an understanding of statistics and data science – the audience identified in the book itself – the book provides little more than is to be learned from the online materials that accompany pandas, Matplotlib and NumPy.

It really is hard to recommend this book to anyone.

Review by Stewart Marshall MBCS
Originally posted http://www.bcs.org/content/conWebDoc/...
Marcus Österberg
Author of 9 books, 15 followers
December 20, 2016
This is it, I've had it with Packt books. This is the third Python book in a row that never answers the question "why", even briefly; they just go on and on about different code samples without giving any perspective.

This book is just a printed edition of poorly written online code samples aimed at those who already pretty much know what they're doing. The same people probably already have some of their own code to glance at when in doubt. Speaking of doubt, I doubt anyone needs this book.