
Data Preparation for Data Mining

"Data Preparation for Data Mining" addresses an issue unfortunately ignored by most authorities on data mining: data preparation. Thanks largely to its perceived difficulty, data preparation has traditionally taken a backseat to the more alluring question of how best to extract meaningful knowledge. But without adequate preparation of your data, the return on the resources invested in mining is certain to be disappointing.
Dorian Pyle corrects this imbalance. A twenty-five-year veteran of what has become the data mining industry, Pyle shares his own successful data preparation methodology, offering both a conceptual overview for managers and complete technical details for IT professionals. Apply his techniques and watch your mining efforts pay off in the form of improved performance, reduced distortion, and more valuable results.
On the enclosed CD-ROM, you'll find a suite of programs as C source code and compiled into a command-line-driven toolkit. This code illustrates how the author's techniques can be applied to arrive at an automated preparation solution that works for you. Also included are demonstration versions of three commercial products that help with data preparation, along with sample data with which you can practice and experiment.
* Offers in-depth coverage of an essential but largely ignored subject.
* Goes far beyond theory, leading you, step by step, through the author's own data preparation techniques.
* Provides practical illustrations of the author's methodology using realistic sample data sets.
* Includes algorithms you can apply directly to your own project, along with instructions for understanding when automation is possible and when greater intervention is required.
* Explains how to identify and correct data problems that may be present in your application.
* Prepares miners, helping them head into preparation with a better understanding of data sets and their limitations.

560 pages, Paperback

First published March 15, 1999

2 people are currently reading
41 people want to read

About the author

Dorian Pyle

4 books, 8 followers

Ratings & Reviews

Community Reviews

5 stars: 7 (23%)
4 stars: 9 (30%)
3 stars: 12 (40%)
2 stars: 1 (3%)
1 star: 1 (3%)
Displaying 1 - 4 of 4 reviews
Terran M.
78 reviews, 106 followers
July 1, 2018
A surprisingly relevant book from 20 years ago. Pyle has interesting things to say about assessing the adequacy of data, mapping categorical variables to ordinal ones, and information-theoretic data surveys, all of which are absent in modern introductions.

Sadly, the book suffers from describing the recommended algorithms in a way that is insufficiently detailed to allow one to actually implement them - the author assumes that the reader will be using commercial software packages, but the commercial software from the book's era is long gone and several of his core algorithms are not implemented in any modern library that I am aware of.

Just as in his other book, Business Modeling and Data Mining, the author also has a style which might be described as an overly verbose style, or in any case certainly not a terse style, which the reader may find requires a certain level, but not an overly high level, of patience in order to fully read, digest, and appreciate.

Although this book was originally positioned as an introduction for novices, it should not be used that way today - it is more suitable for experts looking for a fresh perspective and ideas outside the current mainstream, who have enough experience to know which insights are forgotten gems and which are just obsolete. Overall worth reading in spite of its limitations.
120 reviews, 18 followers
reference-only
March 20, 2021
This book focuses on the issues of processing data prior to data mining, which often go unmentioned. This includes handling missing values, outliers, and transforming variables to improve the results. I wish I had read this prior to beginning my research.
14 reviews
October 1, 2015
This book is amazing. I learned several new techniques to apply in my own data mining and machine learning projects and experiments. It provides a very good set of theories and new techniques to improve the data that you have at hand. I recommend this book.
Joao Carlos
2 reviews
December 23, 2010
Excellent. I found chapter 5, where Confidence Level is explained, a must-read for those who are not statisticians.