The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
2%
The psychologist Don Norman coined the term conceptual model to refer to the rough knowledge of a technology we need to have in order to use it effectively.
3%
Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine.
6%
Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more.
8%
A new type of network effect takes hold: whoever has the most customers accumulates the most data, learns the best models, wins the most new customers, and so on in a virtuous circle (or a vicious one, if you’re the competition).
16%
A related, frequently heard objection is “Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data.
16%
“Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.” If you want to be tomorrow’s authority, ride the data, don’t fight it.
20%
For symbolists, all intelligence can be reduced to manipulating symbols, in the same way that a mathematician solves equations by replacing expressions by other expressions. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data. They’ve figured out how to incorporate preexisting knowledge into learning, and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it …
26%
Whenever a learner finds a pattern in the data that is not actually true in the real world, we say that it has overfit the data. Overfitting is the central problem in machine learning. More papers have been written about it than about any other topic.
27%
Overfitting happens when you have too many hypotheses and not enough data to tell them apart.
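The point is easy to demonstrate. Below is a minimal sketch (my own illustration, not the book's) in which a straight line and a nine-degree polynomial are both fit to ten noisy points drawn from a genuinely linear process: the polynomial has one parameter per point, memorizes the noise, and does far worse on fresh data.

```python
# Minimal overfitting demo: the data are truly linear plus noise.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, 10)

line = np.polyfit(x_train, y_train, deg=1)    # 2 parameters
wiggle = np.polyfit(x_train, y_train, deg=9)  # 10 parameters, one per point

x_test = rng.uniform(0, 1, 100)               # fresh data from the same process
y_test = 2 * x_test + rng.normal(0, 0.2, 100)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

print("train:", mse(line, x_train, y_train), mse(wiggle, x_train, y_train))
print("test: ", mse(line, x_test, y_test), mse(wiggle, x_test, y_test))
# The degree-9 fit's training error is near zero -- it has "found" a
# pattern in the noise -- while its test error is typically far worse
# than the straight line's.
```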
38%
Nonlinear models are important far beyond the stock market. Scientists everywhere use linear regression because that’s what they know, but more often than not the phenomena they study are nonlinear, and a multilayer perceptron can model them.
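As a concrete illustration (mine, not the book's), here is a one-hidden-layer perceptron trained by backpropagation to fit a curve no straight line can follow; the layer sizes and learning rate are arbitrary choices for the sketch.

```python
# One-hidden-layer perceptron trained by backpropagation on y = sin(pi*x).
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(np.pi * X)                     # nonlinear target

H, lr = 32, 0.1                           # hidden units, learning rate
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)

for step in range(5000):
    h = np.tanh(X @ W1 + b1)              # hidden layer: the nonlinearity
    pred = h @ W2 + b2                    # linear output layer
    err = pred - y

    # Backpropagation: push the error back through each layer.
    dW2 = h.T @ err / len(X); db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dh / len(X); db1 = dh.mean(axis=0)

    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

final = np.tanh(X @ W1 + b1) @ W2 + b2
print("MLP mean squared error:", float(np.mean((final - y) ** 2)))
```

Delete the tanh and the two layers collapse into a single linear map: exactly the linear regression that cannot track the curve.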
41%
Another big issue is that humans—and symbolic models like sets of rules and decision trees—can explain their reasoning, while neural networks are big piles of numbers that no one can understand.
45%
No one is sure why sex is pervasive in nature, either. Several theories have been proposed, but none is widely accepted. The leader of the pack is the Red Queen hypothesis, popularized by Matt Ridley in the eponymous book. As the Red Queen said to Alice in Through the Looking Glass, “It takes all the running you can do, to keep in the same place.” In this view, organisms are in a perpetual arms race with parasites, and sex helps keep the population varied, so that no single germ can infect all of it.
46%
As connectionists like Geoff Hinton like to point out, there’s no advantage to carrying around in the genome information that we can readily acquire from the senses. When a newborn opens his eyes, the visual world comes flooding in; the brain just has to organize it. What does need to be specified in the genome, however, is the architecture of the machine that does the organizing.
48%
At heart, Bayes’ theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down.
49%
Bayes’ theorem is useful because what we usually know is the probability of the effects given the causes, but what we want to know is the probability of the causes given the effects.
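The classic worked example is diagnostic testing; the numbers below are hypothetical, not the book's.

```python
# Hypothetical numbers: a disease with 1% prevalence and a test that is
# 99% sensitive with a 5% false-positive rate.
p_disease = 0.01                # prior: P(cause)
p_pos_if_disease = 0.99         # P(effect | cause) -- what we usually know
p_pos_if_healthy = 0.05         # P(effect | no cause)

# Total probability of the effect, summed over both possible causes.
p_pos = p_pos_if_disease * p_disease + p_pos_if_healthy * (1 - p_disease)

# Bayes' theorem: P(cause | effect) = P(effect | cause) * P(cause) / P(effect)
p_disease_if_pos = p_pos_if_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_if_pos:.3f}")   # about 0.167
```

Despite the accurate-sounding test, a positive result leaves only about a one-in-six chance of disease, because the prior is so low: exactly the inversion the theorem exists to perform.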
50%
A learner that uses Bayes’ theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier.
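A minimal sketch of the idea (toy data, my own): count how often each word appears in each class, then score a new text by combining the class prior with the per-word probabilities, as if the words were independent given the class.

```python
# Naive Bayes for spam filtering, with Laplace (add-one) smoothing.
from collections import Counter
import math

train = [("spam", "buy cheap pills now"),
         ("spam", "cheap pills cheap deals"),
         ("ham",  "meeting notes attached"),
         ("ham",  "lunch meeting tomorrow")]

class_counts = Counter(label for label, _ in train)
word_counts = {c: Counter() for c in class_counts}
for label, text in train:
    word_counts[label].update(text.split())
vocab = {w for _, text in train for w in text.split()}

def predict(text):
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # log P(c) + sum of log P(word | c), assuming independence given c
        score = math.log(class_counts[c] / len(train))
        for w in text.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("cheap pills"))        # -> spam
print(predict("meeting tomorrow"))   # -> ham
```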
52%
Bayesian networks give the lie to the common misconception that machine learning can’t predict very rare events, or “black swans,” as Nassim Taleb calls them.
53%
Markov chains encode the assumption that the future is conditionally independent of the past given the present. HMMs assume in addition that each observation depends only on the corresponding state. Bayesian networks are for Bayesians what logic is for symbolists: a lingua franca that allows us to elegantly encode a dizzying variety of situations and devise algorithms that work uniformly in all of them.
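A small sketch makes both assumptions concrete (the weather-and-umbrella numbers are invented, in the spirit of the standard textbook example): each next state is sampled from the current state alone, and each observation from its state alone.

```python
# A Markov chain over hidden weather states, plus HMM-style emissions.
import numpy as np

rng = np.random.default_rng(2)
states = ["Rainy", "Sunny"]
T = np.array([[0.7, 0.3],    # P(next state | Rainy)
              [0.4, 0.6]])   # P(next state | Sunny)
E = np.array([[0.9, 0.1],    # P(umbrella, no-umbrella | Rainy)
              [0.2, 0.8]])   # P(umbrella, no-umbrella | Sunny)

s = 0                        # start Rainy
for t in range(5):
    obs = rng.choice(["umbrella", "no umbrella"], p=E[s])  # depends on s only
    print(states[s], "->", obs)
    s = rng.choice([0, 1], p=T[s])   # the future depends only on the present
```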
56%
A Markov network is a set of features and corresponding weights, which together define a probability distribution.
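In the standard log-linear notation (my notation, not a quotation from the book): each feature f_i fires on certain configurations x, its weight w_i says how strongly, and the normalizer Z makes the probabilities sum to one.

```latex
P(x) = \frac{1}{Z}\,\exp\Big(\sum_i w_i f_i(x)\Big),
\qquad
Z = \sum_{x'} \exp\Big(\sum_i w_i f_i(x')\Big)
```

Learning adjusts the weights; the sum over all configurations in Z is what makes exact inference hard in general.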
60%
Recommender systems, as they’re also called, are big business: a third of Amazon’s business comes from its recommendations, as does three-quarters of Netflix’s.
62%
To learn an SVM, we need to choose the support vectors and their weights. The similarity measure, which in SVM-land is called the kernel, is usually chosen a priori.
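Once training has picked the support vectors and weights, classifying a new point is just a weighted vote of kernel similarities. A sketch with invented numbers:

```python
# SVM decision function: a weighted vote of kernel similarities to the
# support vectors. These vectors and weights are invented; in practice
# they come out of training, while the kernel is chosen beforehand.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Similarity measure: nearby points score near 1, distant ones near 0."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
weights = np.array([0.8, -0.5, 0.7])     # alpha_i * y_i from training
bias = -0.1

def decision(x):
    scores = [w * rbf_kernel(sv, x) for sv, w in zip(support_vectors, weights)]
    return np.sign(sum(scores) + bias)

print(decision(np.array([0.1, 0.9])))    # +1: lands near the first support vector
```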
72%
Chris Watkins at Cambridge, initially motivated by his experimental observations of children’s learning, arrived at the modern formulation of reinforcement learning as optimal control in an unknown environment.
72%
Gaming aside, researchers have used reinforcement learning to balance poles, control stick-figure gymnasts, park cars backward, fly helicopters upside down, manage automated telephone dialogues, assign channels in cell phone networks, dispatch elevators, schedule space-shuttle cargo loading, and much else.
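Watkins's formulation is Q-learning, and its core update fits in one line: nudge the value of the action you took toward the reward you got plus the discounted value of the best action available next. The corridor task below is my own toy example, not the book's.

```python
# Toy corridor: five cells, reward for reaching the rightmost one.
import random

N, ACTIONS = 5, [+1, -1]                 # move right or left
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # step size, discount, exploration

for episode in range(500):
    s = 0
    while s != N - 1:
        # Epsilon-greedy: mostly take the best-known action, sometimes explore.
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # The Q-learning update: no model of the environment is ever built.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)})
# Learned policy: +1 (move right) in every state.
```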
73%
the best way to understand an entity—whether it’s a person, an animal, a web page, or a molecule—is to understand how it relates to other entities. This requires a new kind of learning that doesn’t treat the data as a random sample of unrelated objects but as a glimpse into a complex network.
80%
Hume’s dictum that learning is only possible with prior knowledge.
84%
Your digital future begins with a realization: every time you interact with a computer—whether it’s your smart phone or a server thousands of miles away—you do so on two levels. The first one is getting what you want there and then: an answer to a question, a product you want to buy, a new credit card. The second level, and in the long run the most important one, is teaching the computer about you. The more you teach it, the better it can serve you—or manipulate you. Life is a game between you and the learners that surround you.
87%
Sergey Brin says that “we want Google to be the third half of your brain.”
88%
The best way to not lose your job is to automate it yourself.
89%
The long-term prospects of scientists are not the brightest, sadly. In the future, the only scientists may well be computer scientists, meaning computers doing science. The people formerly known as scientists (like me) will devote their lives to understanding the scientific advances made by computers.