How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on guide shows non-programmers like you how to process information that's initially too messy or difficult to access. You don't need to know a thing about the Python programming language to get started. Through various step-by-step exercises, you'll learn how to acquire, clean, analyze, and present data efficiently. You'll also discover how to automate your data process, schedule file-editing and clean-up tasks, process larger datasets, and create compelling stories with data you obtain.
From the introduction, the target audience for this book is not obvious beyond non-Python experts. The description of data wrangling seems to broadly relate to data analytics: “taking a messy or unrefined source of data and turning it into something useful”.
The book perhaps does not sufficiently highlight the first and most difficult stage of analytics: establishing the business case, or problem identification, which the book refers to as “formulating a question”. This reinforces a common oversimplification of data analytics, since the fundamental obstacle to obtaining value is determining the business case (if any).
Python is excellent for encoding algorithms for cleansing, analysis and so on, but not all of data analytics (such as establishing the business case) can be achieved through Python alone. Implementation comes later and is, by comparison, much easier.
Setting aside this important point and the acknowledged, if slightly peculiar, relationship to journalism, the book is well written and comprehensive. Not every topic is covered, although most are touched upon. Beginning with advice on topics often neglected but necessary, like installation, the book has helpful chapters on data and file types (as expected), and the chapter on PDFs is particularly useful and insightful. Advanced topics include some details on parallel processing.
The book also provides examples and online support/forums. It attempts to explain very difficult concepts and is, as stated, aimed at non-Python experts; however, a good background in computer science is essential if the reader is to get the most from it. It’s excellent overall.
Recommended as a good supportive text for data wrangling (analytics) for computer scientists who are not experts on Python.
As someone who’s been working with Python and data wrangling professionally for over a decade, I can confidently say this book is useless. It rehashes the pandas documentation with toy CSV files that teach nothing about real-world problems. If you actually want to learn data wrangling, spend an afternoon with free tutorials. You’ll get far more value than from this mess.
The title is misleading. There are no real “tools” here, just toy CSV examples and outdated methods. The book avoids every hard problem that actual data wrangling requires, like dealing with dirty text, missing data, or large files. It’s shallow and frustrating from start to finish.
This feels like a first draft that should never have been published. The content is scattered, the explanations are lazy, and the examples are insultingly simple. Nothing in this book will help you in real projects—it’s just a poorly written rehash of free material you can find online.