Kindle Notes & Highlights
Read between December 5 - December 17, 2018
You can’t control what you don’t understand, and that’s why you need to understand machine learning—as a citizen, a professional, and a human being engaged in the pursuit of happiness.
Only engineers and mechanics need to know how a car’s engine works, but every driver needs to know that turning the steering wheel changes the car’s direction and stepping on the brake brings it to a stop.
Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
Sounds like a perfect job for machine learning: in effect, it’s a more complicated and challenging version of the searches that Amazon and Netflix do every day, except it’s looking for the right treatment for you instead of the right book or movie.
To me nothing could have more impact than teaching computers to learn: if we could do that, we would get a leg up on every other problem.
Believe it or not, every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT.
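To make that claim concrete, here is a minimal sketch (mine, not the book's) that builds XOR and a one-bit adder out of nothing but AND, OR, and NOT:

```python
# A sketch composing richer operations from AND, OR, and NOT alone.

def AND(a, b): return a and b
def OR(a, b):  return a or b
def NOT(a):    return not a

def XOR(a, b):
    # "exactly one of a, b", expressed with only the three primitives
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    # one-bit addition: (sum bit, carry bit)
    return XOR(a, b), AND(a, b)

print(half_adder(True, True))   # (False, True): 1 + 1 = 10 in binary
```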
Algorithms are an exacting standard. It’s often said that you don’t really understand something until you can express it as an algorithm. (As Richard Feynman said, “What I cannot create, I do not understand.”)
Every algorithm has an input and an output: the data goes into the computer, the algorithm does what it will with it, and out comes the result. Machine learning turns this around: in goes the data and the desired result and out comes the algorithm that turns one into the other.
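A minimal illustration of that inversion, with made-up numbers rather than anything from the book: we supply inputs and the desired outputs, and what comes out is the rule (here a straight line) that maps one to the other.

```python
# Hypothetical data: inputs and the results we want, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# The learner produces the "program": a least-squares fit of y = w*x + b.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

predict = lambda x: w * x + b           # the learned rule
print(round(predict(6.0), 2))           # applies to an input it never saw
```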
And pity a new entrant into the search business, starting with zero data against engines with over a decade of learning behind them.
You might think that after a while more data is just more of the same, but that saturation point is nowhere in sight.
Unfortunately, most phenomena in the world are nonlinear. (Or fortunately, since otherwise life would be very boring—in fact, there would be no life.)
Machine learning is like having a radar that sees into the future. Don’t just react to your adversary’s moves; predict them and preempt them.
When the algorithms now in the lab make it to the front lines, Bill Gates’s remark that a breakthrough in machine learning would be worth ten Microsofts will seem conservative.
if you give the learner enough of the appropriate data, it can approximate any function arbitrarily closely—which is math-speak for learning anything. The catch is that “enough data” could be infinite.
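One way to see the "enough data" caveat is a toy experiment (my sketch, not the book's): a one-nearest-neighbour learner approximating sin(x) gets steadily closer as the training set grows, but never exactly reaches it with finite data.

```python
# 1-nearest-neighbour approximation of sin(x): average error shrinks with more data.
import math, random

def one_nn(train, x):
    # predict with the training point closest to x
    return min(train, key=lambda p: abs(p[0] - x))[1]

def avg_error(n_points):
    train = [(x, math.sin(x)) for x in (random.uniform(0, 6.28) for _ in range(n_points))]
    test = [random.uniform(0, 6.28) for _ in range(1000)]
    return sum(abs(one_nn(train, x) - math.sin(x)) for x in test) / len(test)

for n in (10, 100, 1000):
    print(n, round(avg_error(n), 4))    # error keeps falling as n grows
```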
All knowledge—past, present, and future—can be derived from data by a single, universal learning algorithm.
All of this is evidence that the brain uses the same learning algorithm throughout, with the areas dedicated to the different senses distinguished only by the different inputs they are connected to (e.g., eyes, ears, nose).
If something exists but the brain can’t learn it, we don’t know it exists.
Evolution is the ultimate example of how much a simple learning algorithm can achieve given enough data.
Its input is the experience and fate of all living creatures that ever existed. (Now that’s big data.) On the other hand, it’s been running for over three billion years on the most powerful computer on Earth: Earth itself. A computer version of it had better be faster and less data intensive than the original. Which one is the better model for the Master Algorithm?
In optimization, simple functions often give rise to surprisingly complex solutions.
This tendency to oversimplify stems from the limitations of the human mind, however, not from the limitations of mathematics.
Most of the brain’s hardware (or rather, wetware) is devoted to sensing and moving, and to do math we have to borrow parts of it that evolved for language. Computers have no such limitations and can easily turn big data into very complex models.
After seeing enough data, a single hypothesis dominates, or a few do.
Bayes’ theorem is a machine that turns data into knowledge.
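A small sketch of both of these highlights at once, with hypothetical numbers: three hypotheses about a coin's bias start out equally plausible, and repeated Bayesian updates pile the posterior onto one of them.

```python
# Bayes' rule as a data-to-knowledge machine: posterior ∝ prior × likelihood.
import random

hypotheses = {0.3: 1/3, 0.5: 1/3, 0.8: 1/3}   # candidate P(heads), uniform prior
true_bias = 0.8
flips = [random.random() < true_bias for _ in range(50)]

posterior = dict(hypotheses)
for heads in flips:
    for h in posterior:
        posterior[h] *= h if heads else (1 - h)   # multiply in this flip's likelihood
    z = sum(posterior.values())
    posterior = {h: p / z for h, p in posterior.items()}  # renormalise

print({h: round(p, 3) for h, p in posterior.items()})     # mass concentrates on 0.8
```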
Every conceivable problem that can be solved by logical deduction can be solved by a Turing machine.
The most determined resistance comes from machine learning’s perennial foe: knowledge engineering.
A related, frequently heard objection is “Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data.
“Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.” If you want to be tomorrow’s authority, ride the data, don’t fight it.
If—let’s hope so—there are more Newton moments to be had, they are as likely to come from tomorrow’s learning algorithms as from tomorrow’s even more overwhelmed scientists, or at least from a combination of the two.
Wouldn’t it be nice if, instead of trying hundreds of variations of many algorithms, we just had to try hundreds of variations of a single one?
Cancers mutate as they spread through your body, and by natural selection, the mutations most resistant to the drugs you’re taking are the most likely to grow. The right drug for you may be one that works for only 5 percent of patients, or you may need a combination of drugs that has never been tried before.
Computers will do more with less help from us. They will not repeat the same mistakes over and over again, but learn with practice, like people do.
A model of you will negotiate the world on your behalf, playing elaborate games with other people’s and entities’ models. And as a result of all this, our lives will be longer, happier, and more productive.
Being aware of this is the first step to a happy life in the twenty-first century. Teach the learners, and they will serve you; but first you need to understand them.
Learners learn to achieve the goals we set them; they don’t get to change the goals.
no matter how good the learning algorithm is, it’s only as good as the data it gets.
Evolutionaries believe that the mother of all learning is natural selection. If it made us, it can make anything, and all we need to do is simulate it on the computer.
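A toy version of what the evolutionaries mean by simulating natural selection (my sketch, not from the book): bit strings evolve toward an arbitrary target purely through selection, crossover, and mutation.

```python
# Minimal genetic algorithm: selection + crossover + mutation on bit strings.
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

def fitness(ind):
    return sum(a == b for a, b in zip(ind, TARGET))

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[:10]                      # selection: keep the fittest
    children = []
    while len(children) < 30:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(TARGET))
        child = a[:cut] + b[cut:]                  # crossover
        if random.random() < 0.1:
            i = random.randrange(len(TARGET))
            child[i] = 1 - child[i]                # mutation
        children.append(child)
    population = children

best = max(population, key=fitness)
print(generation, best, fitness(best))
```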
Rationalists believe that the senses deceive and that logical reasoning is the only sure path to knowledge. Empiricists believe that all reasoning is fallible and that knowledge must come from observation and experimentation.
This is the machine-learning problem: generalizing to cases that we haven’t seen before.
In ordinary life, bias is a pejorative word: preconceived notions are bad. But in machine learning, preconceived notions are indispensable; you can’t learn without them. In fact, preconceived notions are also indispensable to human cognition, but they’re hardwired into the brain, and we take them for granted. It’s biases over and beyond those that are questionable.
On the other hand, a computer is a blank slate until you program it; the active process itself has to be written into memory before anything can happen.
“All happy families are alike; each unhappy family is unhappy in its own way.”
The same is true of individuals. To be happy, you need health, love, friends, money, a job you like, and so on. Take any of these away, and misery ensues.
The “beer and diapers” rule has acquired legendary status among data miners (although some claim the legend is of the urban variety).
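For concreteness, the kind of association-rule arithmetic behind the "beer and diapers" story looks like this (made-up shopping baskets, not the legend's data): support is how often the two items appear together, confidence is how often beer follows given diapers.

```python
# Support and confidence for the rule {diapers} -> {beer} on toy baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"diapers", "milk"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"milk", "bread"},
]

has_diapers = [b for b in baskets if "diapers" in b]
has_both    = [b for b in has_diapers if "beer" in b]

support    = len(has_both) / len(baskets)       # fraction of all baskets with both
confidence = len(has_both) / len(has_diapers)   # P(beer | diapers)
print(support, confidence)                      # 0.4 and ~0.67 here
```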
Every powerful learner, whether symbolist, connectionist, or any other, has to worry about hallucinating patterns. The only safe way to avoid it is to severely restrict what the learner can learn, for example by requiring that it be a short conjunctive concept.
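A sketch of what that restriction looks like in practice, on toy attribute vectors of my own invention: a learner that can only output a conjunction of attribute values generalizes exactly as far as the positive examples force it and no further.

```python
# Find-S-style conjunctive learner: the hypothesis is a conjunction of attribute values,
# with "?" meaning "any value". It can only generalize where the data forces it to.
positives = [
    ("sunny", "warm", "high"),
    ("sunny", "warm", "low"),
]

hypothesis = list(positives[0])
for example in positives[1:]:
    for i, value in enumerate(example):
        if hypothesis[i] != value:
            hypothesis[i] = "?"
print(hypothesis)                 # ['sunny', 'warm', '?']
```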
Our beliefs are based on our experience, which gives us a very incomplete picture of the world, and it’s easy to jump to false conclusions.
Overfitting is seriously exacerbated by noise. Noise in machine learning just means errors in the data, or random events that you can’t predict.
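A quick demonstration of noise feeding overfitting (synthetic data, my sketch): the same noisy sample fit with a straight line and with a degree-9 polynomial; the flexible model chases the noise and does far worse on fresh data.

```python
# Requires numpy. True relation is linear; the noise is random measurement error.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
y_train = 2 * x_train + rng.normal(0, 0.3, size=12)   # linear signal + noise
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test                                    # noise-free test targets

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(float(mse), 4))                # degree 9 is typically far worse
```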
the number of concepts is an exponential function of an exponential function of the number of attributes! In other words, machine learning is a combinatorial explosion of combinatorial explosions. Perhaps we should just give up and not waste our time on such a hopeless problem?
The more definitions it tries, the more likely one of them will match all the examples just by chance. If you do more and more runs of a thousand coin flips, eventually it becomes practically certain that at least one run will come up all heads.
Since the number of definitions the algorithm considers can be as large as a doubly exponential function of the number of attributes, at some point it’s practically guaranteed that the algorithm will find a bad hypothesis that looks good.
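The arithmetic behind those two highlights, worked through with numbers of my own choosing: with d Boolean attributes there are 2^d possible examples and 2^(2^d) possible concepts, a hypothesis unrelated to the truth still matches m random examples with probability 2^-m, and trying enough hypotheses makes a spurious perfect match practically certain.

```python
d = 5
examples = 2 ** d
concepts = 2 ** examples              # 2**(2**d): the double exponential
print(examples, concepts)             # 32 examples, 4,294,967,296 concepts

m = 20                                # labelled training examples
n_tried = 10_000_000                  # hypotheses the learner effectively considers
p_one = 0.5 ** m                      # one unrelated hypothesis fits all m by luck
p_any = 1 - (1 - p_one) ** n_tried    # at least one of them fits by luck
print(p_one, round(p_any, 4))         # tiny per hypothesis, near-certain overall
```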