Data Preparation for Machine Learning: Data Cleaning, Feature Selection, and Data Transforms in Python

Data preparation involves transforming raw data into a form that can be modeled using machine learning algorithms. Cut through the equations, Greek letters, and confusion, and discover the specialized data preparation techniques that you need to know to get the most out of your data on your next project. Using clear explanations, standard Python libraries, and step-by-step tutorial lessons, you will discover how to confidently and effectively prepare your data for predictive modeling with machine learning.

398 pages, ebook

Published January 1, 2020

7 people are currently reading
60 people want to read

About the author

Jason Brownlee

47 books, 77 followers
Jason Brownlee, Ph.D., trained and worked as a research scientist and software engineer for many years (e.g. enterprise, R&D, and scientific computing), and is known online for his work on Computational Intelligence (e.g. Clever Algorithms), Machine Learning and Deep Learning (e.g. Machine Learning Mastery, sold in 2021), and Python Concurrency (e.g. Super Fast Python).

Jason writes fiction under the pseudonym J.D. Brownlee: https://www.goodreads.com/jdbrownlee

Ratings & Reviews

Community Reviews

5 stars: 8 (47%)
4 stars: 6 (35%)
3 stars: 2 (11%)
2 stars: 1 (5%)
1 star: 0 (0%)
Displaying 1 of 1 review
Ahmed
108 reviews, 18 followers
February 12, 2025
Jason Brownlee’s simple writing style made the concepts easy to understand, even if you don’t have a strong mathematical background.

What did I like about the book?
Comprehensive coverage of data preparation – The book explains everything from data cleaning and feature selection to statistical transformations and dimensionality reduction, all of which are essential for building a strong machine learning model.
Simplifying complex concepts in a practical way – It provides clear explanations on handling missing values, detecting outliers, and identifying the most important features for a model.
Avoiding data leakage – This is one of the most common mistakes in machine learning, and the book does a great job of showing how to prepare data in a way that prevents leakage between training and testing sets.
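
To make the leakage point concrete, here is a minimal sketch. It is not taken from the book. It uses only the Python standard library, and the dataset and helper names (`train_test_split`, `fit_standardizer`, `standardize`) are hypothetical, made up for illustration. The idea is that any statistic used by a transform, here the mean and standard deviation for standardization, should be learned from the training split alone and then reused unchanged on the test split.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(values):
    """Learn the mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std


def standardize(values, mean, std):
    """Apply previously learned statistics to any list of values."""
    return [(v - mean) / std for v in values]


# Hypothetical single-feature dataset, invented for this example.
data = [float(x) for x in range(1, 21)]

train, test = train_test_split(data)

# Leak-free: statistics come from the training split only ...
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
# ... and are reused as-is on the held-out test split.
test_scaled = standardize(test, mean, std)

# Leaky variant, shown for contrast: the statistics are computed on
# all rows, so information from the test split influences the transform.
leaky_mean, leaky_std = fit_standardizer(data)
```

With scikit-learn, the same discipline is usually obtained by placing the transforms and the model in a `Pipeline` and evaluating the pipeline as a whole, so that each transform is fitted only on the training portion of each split.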

What could be improved?
I wish there were a deeper explanation of some theoretical aspects, especially why certain transformations are chosen over others.
Some examples felt a bit repetitive, and a few code snippets need updates to align with the latest library versions.