Dataset Shift in Machine Learning

An overview of recent efforts in the machine learning community to deal with dataset and covariate shift, which occurs when test and training inputs and outputs have different distributions.

Dataset shift is a common problem in predictive modeling that occurs when the joint distribution of inputs and outputs differs between training and test stages. Covariate shift, a particular case of dataset shift, occurs when only the input distribution changes. Dataset shift is present in most practical applications, for reasons ranging from the bias introduced by experimental design to the irreproducibility of the testing conditions at training time. (An example is email spam filtering, which may fail to recognize spam that differs in form from the spam the automatic filter has been built on.) Despite this, and despite the attention given to the apparently similar problems of semi-supervised learning and active learning, dataset shift has received relatively little attention in the machine learning community until recently.

This volume offers an overview of current efforts to deal with dataset and covariate shift. The chapters offer a mathematical and philosophical introduction to the problem; place dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning; provide theoretical views of dataset and covariate shift (including decision-theoretic and Bayesian perspectives); and present algorithms for covariate shift.
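
For readers who prefer symbols, the two definitions in the description can be written out as follows. This is only a sketch in conventional notation, not taken from the book: the symbols x (inputs), y (outputs), and the subscripted distributions are assumptions introduced here for illustration.

```latex
% Notation (assumed here, not from the book): x = inputs, y = outputs,
% P_train and P_test = distributions at the training and test stages.
\begin{align*}
  % Dataset shift: the joint distribution of inputs and outputs differs.
  \text{Dataset shift:}   \quad & P_{\mathrm{train}}(x, y) \neq P_{\mathrm{test}}(x, y) \\
  % Covariate shift: only the input distribution changes, so the
  % conditional distribution of outputs given inputs stays the same.
  \text{Covariate shift:} \quad & P_{\mathrm{train}}(x) \neq P_{\mathrm{test}}(x),
      \qquad P_{\mathrm{train}}(y \mid x) = P_{\mathrm{test}}(y \mid x)
\end{align*}
```

Read this way, covariate shift is the special case in which the whole difference between the two joint distributions comes from the inputs.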

Contributors
Shai Ben-David, Steffen Bickel, Karsten Borgwardt, Michael Brückner, David Corfield, Amir Globerson, Arthur Gretton, Lars Kai Hansen, Matthias Hein, Jiayuan Huang, Choon Hui Teo, Takafumi Kanamori, Klaus-Robert Müller, Sam Roweis, Neil Rubens, Tobias Scheffer, Marcel Schmittfull, Bernhard Schölkopf, Hidetoshi Shimodaira, Alex Smola, Amos Storkey, Masashi Sugiyama

229 pages, Hardcover

First published February 13, 2009

2 people are currently reading
32 people want to read

Community Reviews

5 stars: 3 (30%)
4 stars: 2 (20%)
3 stars: 4 (40%)
2 stars: 1 (10%)
1 star: 0 (0%)
Displaying 1 - 2 of 2 reviews
Alice Holloway
3 reviews
December 27, 2024
This book is a solid introduction to data engineering, especially for those using Python to build and manage data pipelines. It covers essential concepts like data transformation, big data handling, and pipeline deployment with practical, real-world examples. The content is well-structured for beginners and professionals transitioning to data engineering. For those looking to complement their learning, platforms like Unidata (https://unidata.pro/) provide valuable resources, including datasets and tools that align with the book’s practical focus. Highly recommended for aspiring data engineers and IT professionals preparing for a career shift.
Xianshun Chen
88 reviews, 2 followers
January 22, 2021
The book is not for me. I read it to get ideas on how to detect and handle data shift in practice. A number of chapters contain rather unnecessary jargon that could be simplified, and the book contains quite a number of typos. The contents are theoretical but fail to relate to practical, data-driven examples such as dataset shift detection and measurement. The chapters are also loosely coupled and do not give a good sequential flow, and some chapters appear to be purely "philosophical".