Learn how to apply powerful data analysis techniques with popular open source Python modules
About This Book
Learn how to find, manipulate, and analyze data using Python
Perform advanced, high performance linear algebra and mathematical calculations with clean and efficient Python code
An easy-to-follow guide with realistic examples that are frequently used in real-world data analysis projects
Who This Book Is For
This book is for programmers, scientists, and engineers who have knowledge of the Python language and know the basics of data science. It is for those who wish to learn different data analysis methods using Python and its libraries. This book contains all the basic ingredients you need to become an expert data analyst.
What You Will Learn
Install open source Python modules on various platforms
Get to know about the fundamentals of NumPy including arrays
Manipulate data with pandas
Retrieve, process, store, and visualize data
Understand signal processing and time-series data analysis
Work with relational and NoSQL databases
Discover more about data modeling and machine learning
Get to grips with interoperability and cloud computing
In Detail
Python is a multi-paradigm programming language well suited for both object-oriented application development as well as functional design patterns. Python has become the language of choice for data scientists for data analysis, visualization, and machine learning. It will give you velocity and promote high productivity.
This book will teach novices about data analysis with Python in the broadest sense possible, covering everything from data retrieval, cleaning, manipulation, visualization, and storage to complex analysis and modeling. It focuses on a plethora of open source Python modules such as NumPy, SciPy, matplotlib, pandas, IPython, Cython, scikit-learn, and NLTK. In later chapters, the book covers topics such as data visualization, signal processing, time-series analysis, databases, predictive analytics, and machine learning. This book will turn you into an ace data analyst in no time.
Ivan Idris is the author of NumPy Beginner's Guide and NumPy Cookbook. He was born in Bulgaria to Indonesian parents. He moved to the Netherlands in the 1990s, where he graduated from high school and got an MSc in Experimental Physics.
His graduation thesis had a strong emphasis on Applied Computer Science. After graduating, he worked for several companies as a Java Developer, Data Warehouse Developer, and QA Analyst.
His main professional interests are Business Intelligence, Big Data and Cloud Computing. Ivan Idris enjoys writing clean testable code and interesting technical articles.
"Python Data Analysis" gives us a complete Python toolkit for managing, manipulating, and visualizing data. Data analysis is a complex area, but the author, Ivan Idris, clearly explains how to implement advanced algorithms in real-world Python applications.
The first three chapters cover the basics of data analysis, such as arrays, statistics, and linear algebra. The libraries used are NumPy and SciPy, with matplotlib for visualization. The next chapter gives a clear description of the pandas project, one of the most popular libraries for data analysis. Ivan provides step-by-step directions, starting with installation and moving on to querying and basic data manipulation (statistics, aggregation, pivoting, etc.) using pandas.
Before continuing to more advanced methods, the book describes how to retrieve, process, and store data. Python has a comprehensive set of libraries for reading various formats, e.g. CSV and Excel (NumPy and pandas), JSON (the built-in json package), RSS (feedparser), and HTML (BeautifulSoup). Data analysis using signal processing and time series is covered in Chapter 7. We are introduced to the statsmodels library, which provides moving average techniques, window functions, and datasets for experiments. Autoregressive models, ARMA, Fourier analysis, and spectral analysis are provided by NumPy and SciPy. All these methods are theoretically complex but can be implemented simply in Python.
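As a rough illustration of the moving average and window function ideas mentioned above, here is a minimal sketch using only NumPy. It is not taken from the book, and it does not use statsmodels; the helper name `smooth` and the sample data are made up for this example.

```python
import numpy as np

def smooth(values, weights):
    """Weighted moving average of a 1-D sequence (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # mode="valid" keeps only positions where the window fully overlaps the data
    return np.convolve(values, weights, mode="valid")

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # illustrative data only

# Simple moving average: every point in the window has equal weight
boxcar = smooth(data, np.ones(3))

# Hann window: points near the center of the window count more
hann = smooth(data, np.hanning(5))

print(boxcar)
print(hann)
```

The only difference between the two calls is the shape of the weights, which is the essence of choosing a window function.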
Data is usually stored in databases. The book explains how to work with databases using supported packages, i.e. sqlite3, SQLAlchemy, Pony ORM, PyMongo, and Redis. Every library is explained and followed by example source code. Programmers always love working code, not only pseudocode :). Emerging trends such as social media are also discussed in this book. Data generated from social media can be analysed for opinion mining or sentiment analysis, which automates the evaluation of opinions or expressions in social media. The library used is the Natural Language Toolkit (NLTK). In Chapter 10, we get a detailed description of the scikit-learn library, with examples of machine learning for data analysis.
In conclusion, this book comprehensively covers data analysis techniques using Python. I can only say "two thumbs up!" for the book!
Mr Idris is straightforward about the prerequisites, though these are lighter than I expected. If I may quote directly from the book:
"This book is for people with basic knowledge of Python and Mathematics who want to learn how to use Python software to analyze data. We try to keep things simple, but it's not possible to cover all the topics in great detail. It may be useful for you to refresh your knowledge of Mathematics via Khan Academy, Coursera, or Wikipedia."
With this in your back pocket, the author delivers a rigorous treatment of the components of data analysis, spending the first hundred pages on just matrix algebra and statistics, for instance.
The extent to which Idris refers to and recommends other resources throughout the text makes it easier for the reader to extend their capabilities in any direction. In my mind, this increases the value of this book as an entry point to the topic.
Excellent book on data analytics with Python. If you want to learn to use NumPy, SciPy, and matplotlib, this book is a very good introduction.
This book is shoddy in purpose, just like many of the Packt Publishing books that I have happened to read.
I jumped to Chapters 6 and 7 and found that they serve no purpose at all. For instance, see if you can derive any lesson or understanding about what autocorrelation means in time series data.
You can skim through the entire book in half an hour at best.
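For context on the term raised in this review: autocorrelation measures how strongly a time series correlates with a copy of itself shifted by some lag. The following is a minimal sketch using NumPy, not the book's own example; the function name `autocorrelation` and the synthetic sine series are assumptions made for illustration.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()  # work with deviations from the mean
    # Covariance between the series and its lagged copy, over the total variance
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Synthetic series that repeats every 12 steps (for example, monthly seasonality)
t = np.arange(48)
seasonal = np.sin(2 * np.pi * t / 12)

print(autocorrelation(seasonal, 12))  # positive: one full period apart
print(autocorrelation(seasonal, 6))   # negative: half a period apart
```

A value near 1 at some lag suggests the series tends to repeat itself after that many steps, while a value near -1 suggests it tends to do the opposite.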
One of the Packt books that should never have been published. A wide range of subjects is described selectively and incoherently, without deep understanding of the topics. The book lacks practical examples. A lot of packages and their relations are mentioned, but no proper structure or hierarchy is presented.
I think this book introduces a lot of commonly used skills for data analysis. Of course, it is not very thorough, but it is still a good starting point for a newbie Python data analyst.
A rough guide. It doesn't break away from the rest of the "data analysis" herd. But I suppose handbooks are all alike: load example data, demo some NumPy and matplotlib. The formatting in EPUB is not optimal, and is especially hard on the Python code.