The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
Kindle Notes & Highlights
1%
The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms. —Albert Einstein
1%
Civilization advances by extending the number of important operations we can perform without thinking about them. —Alfred North Whitehead
2%
Computers aren’t supposed to be creative; they’re supposed to do what you tell them to. If what you tell them to do is be creative, you get machine learning. A learning algorithm is like a master craftsman: every one of its productions is different and exquisitely tailored to the customer’s needs.
2%
the world senses what you want and changes accordingly, without you having to lift a finger.
Note (Tim Moore): in a virtual reality
2%
they are limited to what we can systematically observe and tractably model. Big data and machine learning greatly expand that scope. Some everyday things can be predicted by the unaided mind, from catching a ball to carrying on a conversation. Some things, try as we might, are just unpredictable. For the vast middle ground between the two, there’s machine learning.
2%
The psychologist Don Norman coined the term conceptual model to refer to the rough knowledge of a technology we need to have in order to use it effectively. This book provides you with a conceptual model of machine learning.
3%
Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine. In practice, however, each of these algorithms is good for some things but not others. What we really want is a single algorithm combining the key features of all of them:
3%
If it exists, the Master Algorithm can derive all knowledge in the world—past, present, and future—from data. Inventing it would be one of the greatest advances in the history of science. It would speed up the progress of knowledge across the board, and change the world in ways that we can barely begin to imagine. The Master Algorithm is to machine learning what the Standard Model is to particle physics or the Central Dogma to molecular biology: a unified theory that makes sense of everything we know to date, and lays the foundation for decades or centuries of future progress. The Master …
4%
If you’re a machine-learning expert, you’re already familiar with much of what the book covers, but you’ll also find in it many fresh ideas, historical nuggets, and useful examples and analogies.
4%
every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT.
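To make that reduction concrete, here is a minimal sketch (my own illustration, not from the book): a one-bit half adder built from nothing but AND, OR, and NOT.

```python
# A minimal sketch: composing AND, OR, and NOT into something more complex,
# here a one-bit half adder (sum bit and carry bit of a + b).

def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

def XOR(a, b):
    # XOR expressed purely in terms of AND, OR, and NOT
    return OR(AND(a, NOT(b)), AND(NOT(a), b))

def half_adder(a, b):
    return XOR(a, b), AND(a, b)   # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        print(a, "+", b, "->", half_adder(a, b))
```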
6%
all humans are mortal, but only 4 percent are Americans.
6%
Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more.
6%
All of the important ideas in machine learning can be expressed math-free.
9%
Naïve Bayes, a learning algorithm that can be expressed as a single short equation.
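As an illustration of that "single short equation", here is a hedged sketch (my own, with made-up toy data): choose the class that maximizes P(class) times the product of P(feature | class) over the features, with simple add-one smoothing so an unseen value does not zero out the product.

```python
# A minimal Naïve Bayes sketch (illustrative only):
#   P(class | features) ∝ P(class) * Π_i P(feature_i | class)
from collections import Counter, defaultdict

def train(examples):
    # examples: list of (features_dict, label)
    priors = Counter(label for _, label in examples)
    likelihoods = defaultdict(Counter)        # (label, feature) -> value counts
    for features, label in examples:
        for f, v in features.items():
            likelihoods[(label, f)][v] += 1
    return priors, likelihoods, len(examples)

def predict(features, priors, likelihoods, n):
    best, best_score = None, 0.0
    for label, count in priors.items():
        score = count / n                     # the prior P(class)
        for f, v in features.items():
            # add-one (Laplace) smoothing for each per-feature likelihood
            score *= (likelihoods[(label, f)][v] + 1) / (count + 2)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical toy data: whether to play tennis
data = [({"outlook": "sunny", "windy": False}, "yes"),
        ({"outlook": "rainy", "windy": True}, "no"),
        ({"outlook": "sunny", "windy": True}, "yes"),
        ({"outlook": "rainy", "windy": False}, "no")]
p, l, n = train(data)
print(predict({"outlook": "sunny", "windy": False}, p, l, n))  # -> "yes"
```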
10%
nearest-neighbor algorithm,
10%
decision tree learners
13%
engineering successes are not proof of scientific validity.
13%
even if they didn’t succeed, they learned many valuable lessons.
17%
Our search for the Master Algorithm is complicated, but also enlivened, by the rival schools of thought that exist within machine learning. The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers.
17%
For symbolists, all intelligence can be reduced to manipulating symbols, in the same way that a mathematician solves equations by replacing expressions by other expressions. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data.
17%
Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible. For connectionists, learning is what the brain does, and so what we need to do is reverse engineer it. The brain learns by adjusting the strengths of connections between neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly.
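A toy illustration of the inverse-deduction idea (my own sketch, not the book's algorithm): given the fact "Socrates is human" and the observation "Socrates is mortal", propose the missing rule that would make the deduction go through, stated as generally as the data allows.

```python
# Toy sketch of inverse deduction: induce the missing general rule that,
# together with the known fact, would let the observed conclusion be deduced.

fact = ("Socrates", "is_human")
observation = ("Socrates", "is_mortal")

def induce_rule(fact, observation):
    entity_f, property_f = fact
    entity_o, property_o = observation
    if entity_f == entity_o:
        # Generalize from the specific entity to a universally quantified rule.
        return f"For all X: {property_f}(X) -> {property_o}(X)"
    return None

print(induce_rule(fact, observation))
# -> "For all X: is_human(X) -> is_mortal(X)"
```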
18%
Our aim is to touch each part without jumping to conclusions; and once we’ve touched all of them, we will try to picture the whole elephant.
18%
Insight and persistence are what counts.
18%
Are you a rationalist or an empiricist? Rationalists believe that the senses deceive and that logical reasoning is the only sure path to knowledge. Empiricists believe that all reasoning is fallible and that knowledge must come from observation and experimentation.
18%
In computer science, theorists and knowledge engineers are rationalists; hackers and machine learners are empiricists.
18%
The rationalist likes to plan everything in advance before making the first move. The empiricist prefers to try things and see how they turn out.
18%
Descartes, Spinoza, and Leibniz were the leading rationalists;
18%
Locke, Berkeley, and Hume were their empiricist counterparts.
18%
How can we ever be justified in generalizing from what we’ve seen to what we haven’t? Every learning algorithm is, in a sense, an attempt to answer this question.
19%
Is there any way to learn something from the past that we can be confident will apply in the future? And if there isn’t, isn’t machine learning a hopeless enterprise? For that matter, isn’t all of science, even all of human knowledge, on rather shaky ground?
20%
And therefore, on average over all possible worlds, pairing each world with its antiworld, your learner is equivalent to flipping coins.
20%
We don’t care about all possible worlds, only the one we live in. If we know something about the world and incorporate it into our learner, it now has an advantage over random guessing. To this Hume would reply that that knowledge must itself have come from induction and is therefore fallible. That’s true, even if the knowledge was encoded into our brains by evolution, but it’s a risk we’ll have to take. We can also ask whether there’s a nugget of knowledge so incontestable, so fundamental, that we can build all induction on top of it.
20%
In the meantime, the practical consequence of the “no free lunch” theorem is that there’s no such thing as learning without knowledge. Data alone is not enough. Starting from scratch will only get you to scratch. Machine learning is a kind of knowledge pump: we can use it to extract a lot of knowledge from data, but first we have to prime the pump. Machine learning is what mathematicians call an ill-posed problem: it doesn’t have a unique solution. Here’s a simple ill-posed problem: Which two numbers add up to 1,000? Assuming the numbers are positive, there are five hundred possible answers: 1 …
Note (Tim Moore): 500 possible answers assumes that the numbers are positive integers
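A quick check of that count, under the note's reading (positive integers, counting a pair and its reverse as one answer):

```python
# Enumerate the answers to "which two positive integers add up to 1,000?",
# treating (a, b) and (b, a) as the same answer.
pairs = [(a, 1000 - a) for a in range(1, 501)]   # (1, 999) ... (500, 500)
print(len(pairs))  # 500
```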
20%
Newton’s principle is the first unwritten rule of machine learning. We induce the most widely applicable rules we can and reduce their scope only when the data forces us to.
20%
Newton’s principle is only the first step, however. We still need to figure out what is true of everything we’ve seen—how to extract the regularities from the raw data.
22%
Our beliefs are based on our experience, which gives us a very incomplete picture of the world, and it’s easy to jump to false conclusions.
22%
Overfitting happens when you have too many hypotheses and not enough data to tell them apart.
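A small sketch of what that means (my own example): over three binary inputs there are 256 possible Boolean concepts, and after seeing only two labeled examples, 64 of them still fit the data perfectly, so the data cannot tell them apart.

```python
# Too many hypotheses, not enough data: count the Boolean concepts over
# 3 binary inputs that remain consistent with just two training examples.
from itertools import product

inputs = list(product([0, 1], repeat=3))        # all 8 possible inputs
hypotheses = list(product([0, 1], repeat=8))    # all 256 labelings of them

# Hypothetical training data: only two labeled examples.
train = {(0, 0, 0): 0, (1, 1, 1): 1}

consistent = [h for h in hypotheses
              if all(h[inputs.index(x)] == y for x, y in train.items())]
print(len(hypotheses), len(consistent))  # 256 64 -- 64 hypotheses still fit
```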
23%
Harvard’s Leslie Valiant received the Turing Award, the Nobel Prize of computer science, for inventing this type of analysis, which he describes in his book entitled, appropriately enough, Probably Approximately Correct.
26%
Decision trees are used in many different fields. In machine learning, they grew out of work in psychology. Earl Hunt and colleagues used them in the 1960s to model how humans acquire new concepts, and one of Hunt’s graduate students, J. Ross Quinlan, later tried using them for chess. His original goal was to predict the outcome of king-rook versus king-knight endgames from the board positions. From those humble beginnings, decision trees have grown to be, according to surveys, the most widely used machine-learning algorithm. It’s not hard to see why: they’re easy to understand, fast to learn, …
26%
The symbolists’ core belief is that all intelligence can be reduced to manipulating symbols. A mathematician solves equations by moving symbols around and replacing symbols by other symbols according to predefined rules.
26%
The psychologist David Marr argued that every information processing system should be studied at three distinct levels: the fundamental properties of the problem it’s solving; the algorithms and representations used to solve it; and how they are physically implemented.
27%
Despite the popularity of decision trees, inverse deduction is the better starting point for the Master Algorithm. It has the crucial property that incorporating knowledge into it is easy—and we know Hume’s problem makes that essential. Also, sets of rules are an exponentially more compact way to represent most concepts than decision trees. Converting a decision tree to a set of rules is easy: each path from the root to a leaf becomes a rule, and there’s no blowup. On the other hand, in the worst case converting a set of rules into a decision tree requires converting each rule into a …
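A sketch of the easy direction described above (tree to rules), using a made-up tree representation of nested (test, if-true, if-false) tuples with string leaves:

```python
# Read off one rule per root-to-leaf path of a decision tree.
tree = ("outlook=sunny",
        ("windy", "don't play", "play"),   # subtree taken when the test holds
        "play")                            # leaf taken when it doesn't

def tree_to_rules(node, conditions=()):
    if isinstance(node, str):                      # leaf: emit one rule
        lhs = " AND ".join(conditions) or "TRUE"
        return [f"IF {lhs} THEN {node}"]
    test, if_true, if_false = node
    return (tree_to_rules(if_true, conditions + (test,)) +
            tree_to_rules(if_false, conditions + (f"NOT {test}",)))

for rule in tree_to_rules(tree):
    print(rule)
# IF outlook=sunny AND windy THEN don't play
# IF outlook=sunny AND NOT windy THEN play
# IF NOT outlook=sunny THEN play
```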
27%
Inverse deduction is easily confused by noise:
27%
Most seriously, real concepts can seldom be concisely defined by a set of rules.
27%
They require weighing and accumulating weak evidence until a clear picture emerges. Diagnosing an illness involves giving more weight to some symptoms than others, and being OK with incomplete evidence.
27%
Donald Hebb, a Canadian psychologist, stated it this way in his 1949 book The Organization of Behavior:
28%
Perceptrons were invented in the late 1950s by Frank Rosenblatt, a Cornell psychologist.
28%
In a perceptron, a positive weight represents an excitatory connection, and a negative weight an inhibitory one. The perceptron outputs 1 if the weighted sum of its inputs is above threshold, and 0 if it’s below.
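A minimal sketch of that description (the weights and threshold below are made-up illustrative values):

```python
# A perceptron fires (outputs 1) when the weighted sum of its inputs
# clears the threshold, and stays off (outputs 0) otherwise.

def perceptron(inputs, weights, threshold):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Positive weights act as excitatory connections, negative as inhibitory.
weights = [0.6, 0.4, -0.5]
threshold = 0.5
print(perceptron([1, 1, 0], weights, threshold))  # 1.0 > 0.5 -> fires (1)
print(perceptron([1, 1, 1], weights, threshold))  # 0.5 not above 0.5 -> 0
```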
32%
We could do away with the problem of local optima by taking out the S curves and just letting each neuron output the weighted sum of its inputs. That would make the error surface very smooth, leaving only one minimum—the global one. The problem, though, is that a linear function of linear functions is still just a linear function, so a network of linear neurons is no better than a single neuron. A linear brain, no matter how large, is dumber than a roundworm. S curves are a nice halfway house between the dumbness of linear functions and the hardness of step functions.
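A small numeric check of the point about linearity (my own sketch): stacking two linear layers collapses into a single linear layer, while inserting the S curve (a sigmoid) between them does not.

```python
# Composing linear layers is still linear; adding a sigmoid breaks that.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_linear_layers = W2 @ (W1 @ x)      # a "network" of linear neurons
one_linear_layer = (W2 @ W1) @ x       # a single equivalent linear layer
print(np.allclose(two_linear_layers, one_linear_layer))  # True

def sigmoid(z):                         # the S curve: smooth but nonlinear
    return 1 / (1 + np.exp(-z))

with_s_curve = W2 @ sigmoid(W1 @ x)     # no single linear layer reproduces this
print(np.allclose(with_s_curve, one_linear_layer))  # False for generic weights
```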
32%
Backprop was invented in 1986 by David Rumelhart, a psychologist at the University of California, San Diego, with the help of Geoff Hinton and Ronald Williams. Among other things, they showed that backprop can learn XOR,
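A compact sketch of that result (my own toy implementation, not the 1986 one): a small network of sigmoid units trained by backpropagation to fit XOR. With an unlucky initialization it can still land in a local optimum, as discussed above.

```python
# Backpropagation learning XOR with a tiny 2-4-1 sigmoid network.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)           # XOR targets

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)              # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)              # output layer

sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 0.5

for step in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: assign blame for the error to each connection
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates
    W2 -= lr * (h.T @ d_out);  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should end up close to [0, 1, 1, 0]
```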