The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
Bottom line: learning is a race between the amount of data you have and the number of hypotheses you consider.
You don’t believe anything until you’ve verified it on data that the learner didn’t see.
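A minimal sketch of that discipline in pure Python. The data, the one-parameter threshold "learner," and the 80/20 split are all invented for illustration: the point is simply that the accuracy you report comes only from examples the learner never saw.

```python
import random

random.seed(0)
# Toy data: label is 1 when x > 5.0.
data = [(x, int(x > 5.0)) for x in (random.uniform(0, 10) for _ in range(100))]
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Learner": pick the threshold that best separates the training labels.
best_t = max((x for x, _ in train),
             key=lambda t: sum((x > t) == bool(y) for x, y in train))

# Only believe the number measured on data the learner didn't see.
test_acc = sum((x > best_t) == bool(y) for x, y in test) / len(test)
print(f"held-out accuracy: {test_acc:.2f}")
```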
It’s not enough for a new theory to explain past evidence, because it’s easy to concoct a theory that does that; the theory must also make new predictions, and you only accept it after they’ve been experimentally verified.
These days we have larger data sets, but the quality of data collection isn’t necessarily better, so caveat emptor.
Our best hope of creating a universal learner lies in synthesizing ideas from different paradigms.
Of course, it’s not enough to be able to tell when you’re overfitting; we need to avoid it in the first place.
A better method is to realize that some false hypotheses will inevitably get through, but keep their number under control by rejecting enough low-significance ones, and then test the surviving hypotheses on further data.
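A toy simulation of that two-stage screen, with made-up numbers (200 candidate rules, all actually useless, and a 0.60 accuracy cutoff): a few null rules look significant by chance on the first pass, but false discoveries rarely pass a retest on fresh data.

```python
import random

random.seed(1)

def accuracy_on_fresh_data(n=100):
    # Accuracy of a rule that is actually useless (coin-flip predictions).
    return sum(random.random() < 0.5 for _ in range(n)) / n

# Screen 200 useless candidate rules; a handful will clear the bar by luck.
candidates = [accuracy_on_fresh_data() for _ in range(200)]
survivors = [i for i, acc in enumerate(candidates) if acc > 0.60]

# Retest the survivors on further data: chance rarely strikes twice.
confirmed = [i for i in survivors if accuracy_on_fresh_data() > 0.60]
print(len(survivors), len(confirmed))
```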
Simple theories are preferable because they incur a lower cognitive cost (for us) and a lower computational cost (for our algorithms), not because we necessarily expect them to be more accurate.
If it keeps making the same mistakes, the problem is bias, and you need a more flexible learner (or just a different one). If there’s no pattern to the mistakes, the problem is variance, and you want to either try a less flexible learner or get more data.
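A sketch of that diagnosis under invented conditions: fit a deliberately too-rigid straight line to quadratic data on several resamples. The fits all miss the true value at the same point in the same direction, which is the signature of bias rather than variance.

```python
import random

random.seed(2)

def true_f(x):
    return x * x  # quadratic target the linear model cannot represent

def fit_line(sample):
    # Ordinary least-squares line: a deliberately too-rigid learner.
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    slope = (sum((x - mx) * (y - my) for x, y in sample)
             / sum((x - mx) ** 2 for x, _ in sample))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

xs = [i / 10 for i in range(-20, 21)]
preds = []
for _ in range(5):
    sample = [(x, true_f(x) + random.gauss(0, 0.1))
              for x in random.sample(xs, 20)]
    preds.append(fit_line(sample)(2.0))

# True value at x = 2.0 is 4.0; every resampled fit misses it low in the
# same way. Same mistake every time -> bias, not variance.
print([round(p, 2) for p in preds])
```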
Decision trees can be viewed as an answer to the question of what to do if rules of more than one concept match an instance. How do we then decide which concept the instance belongs to?
The psychologist David Marr argued that every information processing system should be studied at three distinct levels: the fundamental properties of the problem it’s solving; the algorithms and representations used to solve it; and how they are physically implemented.
If you ask a symbolist system where the concept “New York” is represented, it can point to the precise location in memory where it’s stored. In a connectionist system, the answer is “it’s stored a little bit everywhere.”
Every motion of your muscles follows an S curve: slow, then fast, then slow again.
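The logistic function is the canonical S curve; a short sketch of the slow-fast-slow shape (the rate parameter is an arbitrary choice for illustration):

```python
import math

def s_curve(t, rate=1.5, midpoint=0.0):
    # Logistic function: the canonical S curve.
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))

# Slow at first, fast through the middle, slow again near the end.
steps = [round(s_curve(t), 3) for t in range(-4, 5)]
print(steps)
```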
Your eyes move in S curves, fixating on one thing and then another, along with your consciousness. Mood swings are phase transitions. So are birth, adolescence, falling in love, getting married, getting pregnant, getting a job, losing it, moving to a new town, getting promoted, retiring, and dying.
As the network sees more and more images of your grandmother and other people, the weights gradually converge to values that let it discriminate between the two.
Look under the hood, and… surprise: it’s the trusty old backprop engine, still humming. What changed? Nothing much, say the critics: just faster computers and bigger data. To which Hinton and others reply: exactly, we were right all along!
The Google Brain network of New York Times fame is a nine-layer sandwich of autoencoders and other ingredients that learns to recognize cats from YouTube videos.
Andrew Ng, one of the project’s principals, is also one of the leading proponents of the idea that human intelligence boils down to a single algorithm, and all we need to do is figure it out.
One of the most important problems in machine learning—and life—is the exploration-exploitation dilemma. If you’ve found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution?
A midlife crisis is the yearning to explore after many years spent exploiting.
That’s the exploration-exploitation dilemma. Each time you play, you have to choose between repeating the best move you’ve found so far, which gives you the best payoff, or trying other moves, which gather information that may lead to even better payoffs.
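One standard way to manage that trade-off (not necessarily the book's own example) is the epsilon-greedy strategy on a multi-armed bandit: mostly repeat the best move found so far, but spend a small fraction of plays on other moves to gather information. The payoff probabilities below are invented.

```python
import random

random.seed(3)
true_payoffs = [0.3, 0.5, 0.8]   # hidden from the player
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]         # running average payoff per move
epsilon = 0.1                    # fraction of plays spent exploring

for _ in range(5000):
    if random.random() < epsilon:
        move = random.randrange(3)          # explore: gather information
    else:
        move = values.index(max(values))    # exploit: best move so far
    reward = 1.0 if random.random() < true_payoffs[move] else 0.0
    counts[move] += 1
    values[move] += (reward - values[move]) / counts[move]

print(counts)  # play should concentrate on the highest-payoff move
```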
And equally surely, a key driver of increasing realism in humanlike robots will be the sexbot industry. Sex just seems to be the end, rather than the means, of technological evolution.
In the conventional view, nature does its part first—evolving a brain—and then nurture takes it from there, filling the brain with information.
In Baldwinian evolution, behaviors that are first learned later become genetically hardwired.
The conditional probability of fever given flu is therefore eleven out of fourteen, or 11/14. Conditioning reduces the size of the universe that we’re considering, in this case from all patients to only patients with the flu. In the universe of all patients, the probability of fever is 20/100; in the universe of flu-stricken patients, it’s 11/14.
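The arithmetic from the quote, spelled out with its own numbers (100 patients, 20 with fever, 14 with flu, 11 with both):

```python
# The quote's universe: 100 patients, 20 with fever, 14 with flu, 11 with both.
patients, fever, flu, both = 100, 20, 14, 11

p_fever = fever / patients       # 20/100 in the universe of all patients
p_fever_given_flu = both / flu   # 11/14 once we condition on having flu

print(p_fever, round(p_fever_given_flu, 3))
```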
Probability is not a frequency but a subjective degree of belief.
A very simple and popular assumption is that all the effects are independent given the cause.
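That assumption is the heart of the Naive Bayes classifier: with effects independent given the cause, the likelihood of the evidence is just a product. A minimal sketch with invented priors and symptom probabilities:

```python
# Made-up numbers for illustration only.
priors = {"flu": 0.1, "cold": 0.2}
# P(symptom | disease), assumed independent given the disease.
likelihood = {
    "flu":  {"fever": 0.8, "cough": 0.6},
    "cold": {"fever": 0.2, "cough": 0.7},
}

def posterior(symptoms):
    scores = {}
    for disease, prior in priors.items():
        p = prior
        for s in symptoms:
            p *= likelihood[disease][s]   # independence given the cause
        scores[disease] = p
    z = sum(scores.values())              # normalize over diseases
    return {d: p / z for d, p in scores.items()}

print(posterior(["fever", "cough"]))
```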
As the statistician George Box famously put it: “All models are wrong, but some are useful.”
Clearly, we need both logic and probability.
The easiest way out is to look in the files for the patient whose symptoms most closely resemble your current one’s and make the same diagnosis.
The Joint Chiefs of Staff urged an attack on Cuba, but Kennedy, having just read The Guns of August, a best-selling account of the outbreak of World War I, was keenly aware of how easily that could escalate into all-out war. So he opted for a naval blockade instead, perhaps saving the world from nuclear war.
Aristotle expressed it in his law of similarity: if two things are similar, the thought of one will tend to trigger the thought of the other.
Nearest-neighbor is the simplest and fastest learning algorithm ever invented. In fact, you could even say it’s the fastest algorithm of any kind that could ever be invented. It consists of doing exactly nothing, and therefore takes zero time to run.
Can’t beat that. If you want to learn to recognize faces and have a vast database of images labeled face/not face, just let it sit there. Don’t worry, be happy. Without knowing it, those images already implicitly form a model of what a face is.
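A sketch of that do-nothing learner, with toy points standing in for image features (the coordinates and labels are invented): training is just storing examples, and all the work happens at query time.

```python
# Training is literally nothing: store the labeled examples and wait.
examples = [((1.0, 1.0), "face"), ((1.2, 0.9), "face"),
            ((5.0, 5.0), "not face"), ((5.5, 4.8), "not face")]

def classify(query):
    # All the work happens now: find the closest stored example.
    def dist2(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))
    return min(examples, key=lambda ex: dist2(ex[0]))[1]

print(classify((1.1, 1.1)))   # nearest stored example is a "face"
```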
Ken’s predicted rating is then the weighted average of his neighbors’, with each neighbor’s weight being his coefficient of correlation with Ken.
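The weighted average in that sentence, computed directly (the neighbors' ratings and correlations are made up for illustration):

```python
# Each neighbor's rating is weighted by his correlation with Ken.
neighbors = [
    {"rating": 4.0, "correlation": 0.9},
    {"rating": 5.0, "correlation": 0.6},
    {"rating": 2.0, "correlation": 0.1},
]

weighted = sum(n["rating"] * n["correlation"] for n in neighbors)
total = sum(n["correlation"] for n in neighbors)
prediction = weighted / total   # (3.6 + 3.0 + 0.2) / 1.6 = 4.25
print(round(prediction, 2))
```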
Nearest-neighbor was the first algorithm in history that could take advantage of unlimited amounts of data to learn arbitrarily complex concepts.
If Cope is right, creativity—the ultimate unfathomable—boils down to analogy and recombination. Judge for yourself by googling “david cope mp3.”
The truth is more subtle, of course, and we often need to refine analogies after we make them. But being able to learn from a single example like this is surely a key attribute of a universal learner.
Everything we perceive is a cluster, from friends’ faces to speech sounds. Without them, we’d be lost: children can’t learn a language before they learn to identify the characteristic sounds it’s made of, which they do in their first year of life, and all the words they then learn mean nothing without the clusters of real things they refer to.
cluster A with a probability of 0.7 and cluster B with a probability of 0.3, and we can’t just decide that it belongs to cluster A without losing information. EM takes this into account by fractionally assigning the entity to the two clusters and updating their descriptions accordingly.
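One EM-style update using the quote's 0.7/0.3 split (the data points are invented): the ambiguous point contributes fractionally to both cluster means instead of being forced into one.

```python
# Five 1-D points; the last one is ambiguous between the two clusters.
points = [1.0, 1.2, 4.0, 4.2, 2.5]
resp_a = [1.0, 1.0, 0.0, 0.0, 0.7]   # responsibility of cluster A per point
resp_b = [1 - r for r in resp_a]     # cluster B gets the remainder

# Responsibility-weighted means: the ambiguous point pulls on both clusters,
# in proportion to how strongly it belongs to each.
mean_a = sum(r * x for r, x in zip(resp_a, points)) / sum(resp_a)
mean_b = sum(r * x for r, x in zip(resp_b, points)) / sum(resp_b)
print(round(mean_a, 3), round(mean_b, 3))
```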
Little by little, all those clicks should add up to a picture of your taste, in the same way that all those pixels add up to a picture of your face. The question is how to do the adding.
Psychologists have found that personality boils down to five dimensions—extroversion, agreeableness, conscientiousness, neuroticism, and openness to experience—which they can infer from your tweets and blog posts.
Take the video stream from Robby’s eyes, treat each frame as a point in the space of images, and reduce that set of images to a single dimension. What will you discover? Time.
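An illustrative sketch of that idea in pure Python: a fake two-dimensional "video" whose frames drift steadily, reduced to one dimension by power iteration on the covariance matrix. The projection onto the first principal component orders the frames by time.

```python
import random

random.seed(4)
# Synthetic "frames": steady drift over time plus a little noise.
frames = [(t * 1.0 + random.gauss(0, 0.3),
           t * 0.5 + random.gauss(0, 0.3)) for t in range(50)]

# Center the data.
mx = sum(f[0] for f in frames) / len(frames)
my = sum(f[1] for f in frames) / len(frames)
centered = [(x - mx, y - my) for x, y in frames]

# Power iteration on the 2x2 covariance matrix finds the first PC.
cxx = sum(x * x for x, _ in centered)
cxy = sum(x * y for x, y in centered)
cyy = sum(y * y for _, y in centered)
v = (1.0, 0.0)
for _ in range(50):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = (w[0] / norm, w[1] / norm)

# Projecting each frame onto the first PC recovers the frame order: time.
proj = [x * v[0] + y * v[1] for x, y in centered]
print(proj[0] < proj[25] < proj[49])
```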
Time, in other words, is the principal component of memory.
The law of effect: actions that lead to pleasure are more likely to be repeated in the future; actions that lead to pain, less so.
They found that humans solve problems by decomposing them into subproblems, subsubproblems, and so on and systematically reducing the differences between the initial state (the first formula, say) and the goal state (the second formula).
Similarly, if you want to predict whether a teenager is at risk of starting to smoke, by far the best thing you can do is check whether her close friends smoke.
Mencken’s quip that a man is wealthy if he makes more than his wife’s sister’s husband involves four people.
We found, among other things, that marketing a product to the single most influential member—trusted by many followers who were in turn trusted by many others, and so on—was as good as marketing to a third of all the members in isolation.
If we can measure how strongly people influence each other, we can estimate how long it will be before a swing occurs, even if it’s the first one—another way in which black swans are not necessarily unpredictable.