The Machine Learning Paradox
Mike Loukides describes a fundamental weirdness in creating predictive algorithms: in order to make them flexible enough to deal with real-world data, you also have to make them imperfect.
Building a system that’s 100% accurate on training data is a problem that’s well known to data scientists: it’s called overfitting. It’s an easy and tempting mistake to make, regardless of the technology you’re using. Give me any set of points (stock market prices, daily rainfall, whatever; I don’t care what they represent), and I can find an equation that will pass through them all. Does that equation say anything at all about the next point you give me? Does it tell me how to invest or what raingear to buy? No: all my equation has done is "memorize" the sample data. Data only has predictive value if the match between the predictor and the data isn’t perfect. You’ll be much better off getting out a ruler and eyeballing the straight line that comes closest to fitting.

If a usable machine learning system can’t identify the training data perfectly, what does that say about its performance on real-world data? It’s also going to be imperfect. How imperfect? That depends on the application. 90–95% accuracy is achievable in many applications, maybe even 99%, but never 100%. …
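Loukides’ ruler-and-straight-line point is easy to see in a few lines of code. The sketch below is illustrative only, assuming NumPy and a made-up noisy-linear dataset; the points, the random seed, and the polynomial degrees are my own choices, not anything from his article. A degree-9 polynomial "memorizes" ten training points exactly, while a plain straight-line fit does not, and the two are then asked about the next point.

import numpy as np

rng = np.random.default_rng(0)

# Ten observations with a roughly linear trend plus noise (invented data).
x_train = np.arange(10, dtype=float)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=3.0, size=x_train.size)

# "Memorize" the sample: a degree-9 polynomial passes through all ten points.
memorizer = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

# "Get out a ruler": a straight line that only comes close.
ruler = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

def rmse(model):
    # Root-mean-square error of a model's predictions on the training points.
    return np.sqrt(np.mean((model(x_train) - y_train) ** 2))

print("train RMSE, degree 9:", rmse(memorizer))  # essentially zero
print("train RMSE, degree 1:", rmse(ruler))      # visibly nonzero

# "The next point you give me": extrapolate one step past the sample.
x_next = 10.0
y_next = 2.0 * x_next + 1.0 + rng.normal(scale=3.0)
print("degree-9 prediction:", memorizer(x_next), "vs. actual:", y_next)
print("degree-1 prediction:", ruler(x_next), "vs. actual:", y_next)

The curve that scores perfectly on the sample tends to miss badly on the point just beyond it; the humble straight line gives up its perfect training score in exchange for a usable prediction. That trade is the paradox in miniature.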
The right question to ask isn’t how to make an error-free system; it’s how much error you’re willing to tolerate, and how much you’re willing to pay to reduce errors to that level.
If errors are inevitable, then the job of design is to present the data in ways that set appropriate expectations. The more I ponder the future of UX in a machine-learning world, the more I’m convinced of this: large swaths of the UX discipline will revolve around presenting data in ways that anticipate the machines’ occasionally strange and just-plain-wrong pronouncements.