The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
Rate it:
1%
Flag icon
Civilization advances by extending the number of important operations we can perform without thinking about them. —Alfred North Whitehead
3%
Flag icon
The main ones are five in number, and we’ll devote a chapter to each. Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
3%
Flag icon
Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine.
4%
Flag icon
People often think computers are all about numbers, but they’re not. Computers are all about logic. Numbers and arithmetic are made of logic, and so is everything else in a computer. Want to add two numbers? There’s a combination of transistors that does that. Want to beat the human Jeopardy! champion? There’s a combination of transistors for that too (much bigger, naturally).
6%
Flag icon
Every algorithm has an input and an output: the data goes into the computer, the algorithm does what it will with it, and out comes the result. Machine learning turns this around: in goes the data and the desired result and out comes the algorithm that turns one into the other. Learning algorithms—also known as learners—are algorithms that make other algorithms. With machine learning, computers write their own programs, so we don’t have to.
6%
Flag icon
Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants. The machine-learning expert is like a farmer, sowing the seeds, irrigating and fertilizing the soil, and keeping an eye on the health of the crop but otherwise staying out of the way.
6%
Flag icon
Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. Each of these is used by different communities and has different associations. Some have a long half-life, some less so. In this book I use the term machine learning to refer broadly to all of them.
6%
Flag icon
Technically, machine learning is a subfield of AI, but it’s grown so large and successful that it now eclipses its proud parent.
7%
Flag icon
The Industrial Revolution automated manual work and the Information Revolution did the same for mental work, but machine learning automates automation itself. Without it, programmers become the bottleneck holding up progress.
7%
Flag icon
when you can download any book to your e-reader any time, the problem becomes the overwhelming number of choices.
7%
Flag icon
This is the defining problem of the Information Age, and machine learning is a big part of the solution.
8%
Flag icon
The best way for a company to ensure that learners like its products is to run them itself. Whoever has the best algorithms and the most data wins. A new type of network effect takes hold: whoever has the most customers accumulates the most data, learns the best models, wins the most new customers, and so on in a virtuous circle (or a vicious one, if you’re the competition). Switching from Google to Bing may be easier than switching from Windows to Mac, but in practice you don’t because Google, with its head start and larger market share, knows better what you want, even if Bing’s technology ...more
8%
Flag icon
and Google’s search results still leave a lot to be desired. Every feature of a product, every corner of a website can potentially be improved using machine learning. Should the link at the bottom of a page be red or blue? Try them both and see which one gets the most clicks. Better still, keep the learners running and continuously adjust all aspects of the website.
8%
Flag icon
The same dynamic happens in any market where there’s lots of choice and lots of data. The race is on, and whoever learns fastest wins. It doesn’t stop with understanding customers better: companies can apply machine learning to every aspect of their operations, provided data is available, and data is pouring in f...
This highlight has been truncated due to consecutive passage length restrictions.
8%
Flag icon
Businesses look at data as a strategic asset: What data do I have that my competitors don’t? How can I take advantage of it? What data do my competitors have that I don’t?
8%
Flag icon
The good news today is that sciences that were once data-poor are now data-rich. Instead of paying fifty bleary-eyed undergraduates to perform some task in the lab, psychologists can get as many subjects as they want by posting the task on Amazon’s Mechanical Turk. (It makes for a more diverse sample too.)
9%
Flag icon
Unfortunately, most phenomena in the world are nonlinear. (Or fortunately, since otherwise life would be very boring—in fact, there would be no life.) Machine learning opens up a vast new world of nonlinear models. It’s like turning on the lights in a room where only a sliver of moonlight filtered before.
10%
Flag icon
There’s a further twist: once a learned program is deployed, the bad guys change their behavior to defeat it. This contrasts with the natural world, which always works the same way. The solution is to marry machine learning with game theory, something I’ve worked on in the past: don’t just learn to defeat what your opponent does now; learn to parry what he might do against your learner.
10%
Flag icon
Machine learning is like having a radar that sees into the future. Don’t just react to your adversary’s moves; predict them and preempt them.
11%
Flag icon
Naïve Bayes can learn to diagnose the condition in a fraction of a second, often better than doctors who spent many years in medical school. It can also beat medical expert systems that took thousands of person-hours to build. The same algorithm is widely used to learn spam filters, a problem that at first sight has nothing to do with medical diagnosis. Another simple learner, called the nearest-neighbor algorithm, has been used for everything from handwriting recognition to controlling robot hands to recommending books and movies you might like. And decision tree learners are equally apt at ...more
12%
Flag icon
Here, then, is the central hypothesis of this book: All knowledge—past, present, and future—can be derived from data by a single, universal learning algorithm. I call this learner the Master Algorithm. If such an algorithm is possible, inventing it would be one of the greatest scientific achievements of all time.
12%
Flag icon
All information in the brain is represented in the same way, via the electrical firing patterns of neurons. The learning mechanism is also the same: memories are formed by strengthening the connections between neurons that fire together, using a biochemical process known as long-term potentiation.
14%
Flag icon
A
14%
Flag icon
problem is in P if we can solve it efficiently, and it’s in NP if we can efficiently check its solution.
15%
Flag icon
Minsky was an ardent supporter of the Cyc project, the most notorious failure in the history of AI. The goal of Cyc was to solve AI by entering into a computer all the necessary knowledge. When the project began in the 1980s, its leader, Doug Lenat, confidently predicted success within a decade. Thirty years later, Cyc continues to grow without end in sight, and commonsense reasoning still eludes it.
16%
Flag icon
Statistical parsers analyze language with accuracy close to that of humans, where hand-coded ones lagged far behind.
17%
Flag icon
In a famous passage of his book The Sciences of the Artificial, AI pioneer and Nobel laureate Herbert Simon asked us to consider an ant laboriously making its way home across a beach. The ant’s path is complex, not because the ant itself is complex but because the environment is full of dunelets to climb and pebbles to get around. If we tried to model the ant by programming in every possible path, we’d be doomed. Similarly, in machine learning the complexity is in the data; all the Master Algorithm has to do is assimilate it, so we shouldn’t be surprised if it turns out to be simple.
18%
Flag icon
The computer is your tool, not your adversary. Armed with machine learning, a manager becomes a supermanager, a scientist a superscientist, an engineer a superengineer. The future belongs to those who understand at a very deep level how to combine their unique expertise with what algorithms do best.
18%
Flag icon
Control of data and ownership of the models learned from it is what many of the twenty-first century’s battles will be about—between governments, corporations, unions, and individuals. But you also have an ethical duty to share data for the common good. Machine learning alone will not cure cancer; cancer patients will, by sharing their data for the benefit of future patients.
20%
Flag icon
The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.
20%
Flag icon
For connectionists, learning is what the brain does, and so what we need to do is reverse engineer it. The brain learns by adjusting the strengths of connections between neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionists’ master algorithm is backpropagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be.
21%
Flag icon
The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions.
21%
Flag icon
But the true Master Algorithm must solve all five problems, not just one.
21%
Flag icon
one. In computer science, theorists and knowledge engineers are rationalists; hackers and machine learners are empiricists.
22%
Flag icon
David Hume was the greatest of the empiricists and the greatest English-speaking philosopher of all time. Thinkers like Adam Smith and Charles Darwin count him among their key influences. You could also say he’s the patron saint of the symbolists.
22%
Flag icon
This leads us to the most important problem in machine learning: overfitting, or hallucinating patterns that aren’t really there. We’ll see how the symbolists solve it, and how machine learning is at heart a kind of alchemy, transmuting data into knowledge with the aid of a philosopher’s stone.
23%
Flag icon
You may have a database of a trillion past e-mails, each already labeled as spam or not, but that won’t save you, since the chances that every new e-mail will be an exact copy of a previous one are just about zero. You have no choice but to try to figure out at a more general level what distinguishes spam from nonspam. And, according to Hume, there’s no way to do that.
24%
Flag icon
Machine learning is a kind of knowledge pump: we can use it to extract a lot of knowledge from data, but first we have to prime the pump.
24%
Flag icon
Machine learning is what mathematicians call an ill-posed problem: it doesn’t have a unique solution.
24%
Flag icon
Our goal is to figure out the simplest program we can write such that it will continue to write itself by reading data, without limit, until it knows everything there is to know.
24%
Flag icon
That’s the question machine learners have to ask themselves every day when they go to work: Do I feel lucky today? Just like evolution, machine learning doesn’t get it right every time; in fact, errors are the rule, not the exception. But it’s OK, because we discard the misses and build on the hits, and the cumulative result is what matters. Once we acquire a new piece of knowledge, it becomes a basis for inducing yet more knowledge. The only question is where to begin.
24%
Flag icon
Newton’s principle is the first unwritten rule of machine learning. We induce the most widely applicable rules we can and reduce their scope only when the data forces us to.
25%
Flag icon
ones. For example: “A couple is a good match only if they’re both outgoing, he’s a dog person, and she’s not a cat person.” You can now throw away the data and keep only this definition, since it encapsulates all that’s relevant for your purposes. This algorithm is guaranteed to finish in a reasonable amount of time, and it’s also the first actual learner we meet in this book!
25%
Flag icon
scientist. Michalski’s hometown of Kalusz was successively part of Poland, Russia, Germany, and Ukraine, which may have left him more attuned than most to disjunctive concepts. After immigrating to the United States in 1970, he went on to found the symbolist school of machine learning, along with Tom Mitchell and Jaime Carbonell.
26%
Flag icon
The “beer and diapers” rule has acquired legendary status among data miners (although some claim the legend is of the urban variety). Either way, it’s a long way from the digital circuit design problems Michalski had in mind when he first started thinking about rule induction in the 1960s.
26%
Flag icon
Learning is forgetting the details as much as it is remembering the important parts. Computers are the ultimate idiot savants: they can remember everything with no trouble at all, but that’s not what we want them to do.
26%
Flag icon
Whenever a learner finds a pattern in the data that is not actually true in the real world, we say that it has overfit the data. Overfitting is the central problem in machine learning.
26%
Flag icon
Consider the little white girl who, upon seeing a Latina baby at the mall, blurted out “Look, Mom, a baby maid!” (True event.) It’s not that she’s a natural-born bigot. Rather, she overgeneralized from the few Latina maids she has seen in her short life. The world is full of Latinas with other occupations, but she hasn’t met them yet. Our beliefs are based on our experience, which gives us a very incomplete picture of the world, and it’s easy to jump to false conclusions.
26%
Flag icon
In machine learning, the computer’s greatest strength—its ability to process vast amounts of data and endlessly repeat the same steps without tiring—is also its Achilles’ heel.
27%
Flag icon
Overfitting is seriously exacerbated by noise. Noise in machine learning just means errors in the data, or random events that you can’t predict.
« Prev 1 3