Kindle Notes & Highlights
Read between December 5 and December 17, 2018
Bottom line: learning is a race between the amount of data you have and the number of hypotheses you consider.
You don’t believe anything until you’ve verified it on data that the learner didn’t see.
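For readers who want to see that discipline in practice, here is a minimal holdout sketch, assuming scikit-learn and a synthetic dataset (none of this code is from the book):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for whatever the learner is trained on.
X, y = make_classification(n_samples=1000, random_state=0)

# Hold back data the learner never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Only this number counts as evidence; training-set accuracy proves nothing.
print("held-out accuracy:", model.score(X_test, y_test))
```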
it’s not enough for a new theory to explain past evidence because it’s easy to concoct a theory that does that; the theory must also make new predictions, and you only accept it after they’ve been experimentally verified.
These days we have larger data sets, but the quality of data collection isn’t necessarily better, so caveat emptor.
Our best hope of creating a universal learner lies in synthesizing ideas from different paradigms.
Of course, it’s not enough to be able to tell when you’re overfitting; we need to avoid it in the first place.
A better method is to realize that some false hypotheses will inevitably get through, but to keep their number under control by rejecting enough low-significance ones, and then to test the surviving hypotheses on further data.
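A sketch of that two-stage procedure, with synthetic hypotheses; the SciPy tooling, the data, and the 0.01 threshold are all illustrative assumptions, not the book's:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_hypotheses = 1000

def p_value(effect, n=50):
    """p-value for 'this group differs from control' on a freshly drawn sample."""
    control = rng.normal(0.0, 1.0, n)
    group = rng.normal(effect, 1.0, n)
    return ttest_ind(group, control).pvalue

# A few real effects hidden among many null hypotheses.
effects = np.where(rng.random(n_hypotheses) < 0.05, 0.8, 0.0)

# Screen: reject enough low-significance hypotheses to keep false positives in check...
survivors = [e for e in effects if p_value(e) < 0.01]
# ...then test the survivors again, on data they haven't seen.
confirmed = [e for e in survivors if p_value(e) < 0.01]
print(len(survivors), "passed screening;", len(confirmed), "confirmed on fresh data")
```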
Simple theories are preferable because they incur a lower cognitive cost (for us) and a lower computational cost (for our algorithms), not because we necessarily expect them to be more accurate.
If it keeps making the same mistakes, the problem is bias, and you need a more flexible learner (or just a different one). If there’s no pattern to the mistakes, the problem is variance, and you want to either try a less flexible learner or get more data.
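A closely related diagnostic, not the book's exact prescription: compare a learner's error on its own training data with its error on held-out data. High training error suggests bias; a large gap between the two suggests variance. A rough sketch, assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, None):  # a very rigid tree vs. a fully grown one
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={model.score(X_tr, y_tr):.2f}, "
          f"held-out={model.score(X_te, y_te):.2f}")
# Typically: shallow tree scores low on both (bias); deep tree scores ~1.0 on
# training but noticeably lower held-out (variance).
```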
Decision trees can be viewed as an answer to the question of what to do if rules of more than one concept match an instance. How do we then decide which concept the instance belongs to?
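A decision tree resolves that question structurally: every instance follows exactly one path from root to leaf, so exactly one rule can fire. A small sketch, assuming scikit-learn's tree learner and the classic Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each instance is routed down exactly one branch, so no two 'rules' can
# both claim it; the printed tree makes the mutually exclusive paths visible.
print(export_text(tree, feature_names=iris.feature_names))
```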
The psychologist David Marr argued that every information processing system should be studied at three distinct levels: the fundamental properties of the problem it’s solving; the algorithms and representations used to solve it; and how they are physically implemented.
If you ask a symbolist system where the concept “New York” is represented, it can point to the precise location in memory where it’s stored. In a connectionist system, the answer is “it’s stored a little bit everywhere.”
Every motion of your muscles follows an S curve: slow, then fast, then slow again.
Your eyes move in S curves, fixating on one thing and then another, along with your consciousness. Mood swings are phase transitions. So are birth, adolescence, falling in love, getting married, getting pregnant, getting a job, losing it, moving to a new town, getting promoted, retiring, and dying.
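The S curve here is, in its standard mathematical form, the logistic function; the formula is common background, not quoted from the book:

```latex
% The logistic (S) curve: slow, then fast, then slow again.
f(x) = \frac{1}{1 + e^{-x}}
```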
As the network sees more and more images of your grandmother and other people, the weights gradually converge to values that let it discriminate between the two.
Look under the hood, and… surprise: it’s the trusty old backprop engine, still humming. What changed? Nothing much, say the critics: just faster computers and bigger data. To which Hinton and others reply: exactly, we were right all along!
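For the curious, here is roughly what that trusty engine does, in a deliberately tiny NumPy sketch that learns XOR. The network size, learning rate, and data are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)        # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)        # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: push the error back through each layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates.
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # should be close to [[0], [1], [1], [0]]
```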
The Google Brain network of New York Times fame is a nine-layer sandwich of autoencoders and other ingredients that learns to recognize cats from YouTube videos.
Andrew Ng, one of the project’s principals, is also one of the leading proponents of the idea that human intelligence boils down to a single algorithm, and all we need to do is figure it out.
One of the most important problems in machine learning, and in life, is the exploration-exploitation dilemma. If you’ve found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution?
A midlife crisis is the yearning to explore after many years spent exploiting.
That’s the exploration-exploitation dilemma. Each time you play, you have to choose between repeating the best move you’ve found so far, which gives you the best payoff, and trying other moves, which gather information that may lead to even better payoffs.
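The epsilon-greedy strategy is one standard, if simple, answer to this dilemma: exploit most of the time, explore a fixed fraction of the time. A sketch with invented slot-machine payoffs:

```python
import random

random.seed(0)
true_payoffs = [0.3, 0.5, 0.7]           # hidden expected reward per machine
counts = [0, 0, 0]
estimates = [0.0, 0.0, 0.0]
epsilon = 0.1                             # fraction of plays spent exploring

for _ in range(10000):
    if random.random() < epsilon:
        arm = random.randrange(3)                  # explore: try a random machine
    else:
        arm = estimates.index(max(estimates))      # exploit: best machine so far
    reward = 1 if random.random() < true_payoffs[arm] else 0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean

print(counts)  # most plays should end up on the 0.7 machine
```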
And equally surely, a key driver of increasing realism in humanlike robots will be the sexbot industry. Sex just seems to be the end, rather than the means, of technological evolution.
In the conventional view, nature does its part first—evolving a brain—and then nurture takes it from there, filling the brain with information.
In Baldwinian evolution, behaviors that are first learned later become genetically hardwired.
The conditional probability of fever given flu is therefore eleven out of fourteen, or 11/14. Conditioning reduces the size of the universe that we’re considering, in this case from all patients to only patients with the flu. In the universe of all patients, the probability of fever is 20/100; in the universe of flu-stricken patients, it’s 11/14.
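Written out from the numbers in the highlight (100 patients, 14 with flu, 11 of whom have fever):

```latex
P(\text{fever} \mid \text{flu})
  = \frac{P(\text{fever} \wedge \text{flu})}{P(\text{flu})}
  = \frac{11/100}{14/100}
  = \frac{11}{14}
```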
Probability is not a frequency but a subjective degree of belief.
A very simple and popular assumption is that all the effects are independent given the cause.
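That assumption is the Naive Bayes learner. A toy calculation, reusing the flu numbers above and inventing the cough probabilities purely for illustration:

```python
# P(flu) and P(fever | flu) come from the 100-patient example above;
# the cough probabilities are made-up illustration values.
p_flu = 0.14
p_fever_given_flu, p_fever_given_not = 11 / 14, 9 / 86
p_cough_given_flu, p_cough_given_not = 0.8, 0.1

# Under 'effects independent given the cause', the likelihoods just multiply.
num = p_flu * p_fever_given_flu * p_cough_given_flu
den = num + (1 - p_flu) * p_fever_given_not * p_cough_given_not
print("P(flu | fever and cough) =", round(num / den, 3))
```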
As the statistician George Box famously put it: “All models are wrong, but some are useful.”
Clearly, we need both logic and probability.
The easiest way out is to look in the files for the patient whose symptoms most closely resemble your current one’s and make the same diagnosis.
The Joint Chiefs of Staff urged an attack on Cuba, but Kennedy, having just read The Guns of August, a best-selling account of the outbreak of World War I, was keenly aware of how easily that could escalate into all-out war. So he opted for a naval blockade instead, perhaps saving the world from nuclear war.
Aristotle expressed it in his law of similarity: if two things are similar, the thought of one will tend to trigger the thought of the other.
Nearest-neighbor is the simplest and fastest learning algorithm ever invented. In fact, you could even say it’s the fastest algorithm of any kind that could ever be invented. It consists of doing exactly nothing, and therefore takes zero time to run.
Can’t beat that. If you want to learn to recognize faces and have a vast database of images labeled face/not face, just let it sit there. Don’t worry, be happy. Without knowing it, those images already implicitly form a model of what a face is.
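A minimal sketch of that "do nothing until asked" idea: all the work happens at query time, when the nearest stored example is looked up. Data and labels are invented:

```python
import numpy as np

def predict(query, X_stored, y_stored):
    """1-nearest-neighbor: return the label of the closest stored example."""
    distances = np.linalg.norm(X_stored - query, axis=1)
    return y_stored[np.argmin(distances)]

# 'Training' is just keeping the labeled examples around.
X_stored = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
y_stored = np.array(["not face", "not face", "face", "face"])

print(predict(np.array([4.9, 5.1]), X_stored, y_stored))  # -> "face"
```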
Ken’s predicted rating is then the weighted average of his neighbors’, with each neighbor’s weight being his coefficient of correlation with Ken.
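A sketch of that prediction rule, assuming Pearson correlation as the "coefficient of correlation" and inventing all the names and ratings:

```python
import numpy as np

# Past ratings, by Ken and two neighbors, on four movies all three have rated.
ken = np.array([5.0, 3.0, 4.0, 1.0])
neighbors = {"Lee": np.array([4.0, 3.0, 5.0, 2.0]),
             "Meg": np.array([4.0, 2.0, 5.0, 1.0])}
new_ratings = {"Lee": 4.5, "Meg": 2.0}   # their ratings of a movie Ken hasn't seen

# Each neighbor's weight is his or her correlation with Ken.
weights = {name: np.corrcoef(ken, r)[0, 1] for name, r in neighbors.items()}
prediction = (sum(weights[n] * new_ratings[n] for n in neighbors)
              / sum(weights[n] for n in neighbors))
print(round(prediction, 2))  # a correlation-weighted average of 4.5 and 2.0
```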
Nearest-neighbor was the first algorithm in history that could take advantage of unlimited amounts of data to learn arbitrarily complex concepts.
If Cope is right, creativity, the ultimate unfathomable, boils down to analogy and recombination. Judge for yourself by googling “david cope mp3.”
The truth is more subtle, of course, and we often need to refine analogies after we make them. But being able to learn from a single example like this is surely a key attribute of a universal learner.
Everything we perceive is a cluster, from friends’ faces to speech sounds. Without them, we’d be lost: children can’t learn a language before they learn to identify the characteristic sounds it’s made of, which they do in their first year of life, and all the words they then learn mean nothing without the clusters of real things they refer to.
An entity may belong to cluster A with a probability of 0.7 and to cluster B with a probability of 0.3, and we can’t just decide that it belongs to cluster A without losing information. EM takes this into account by fractionally assigning the entity to the two clusters and updating their descriptions accordingly.
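A sketch of that fractional assignment for a two-cluster, one-dimensional Gaussian mixture; the data, the unit variances, and the starting means are all invented:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 3.0])   # note the ambiguous 3.0
mu = np.array([0.0, 6.0])                             # initial cluster means

for _ in range(20):
    # E-step: each cluster's fractional responsibility for each point.
    dens = np.vstack([norm.pdf(x, mu[0], 1.0), norm.pdf(x, mu[1], 1.0)])
    resp = dens / dens.sum(axis=0)
    # M-step: update each cluster's mean with responsibility-weighted averages.
    mu = (resp * x).sum(axis=1) / resp.sum(axis=1)

# The 3.0 point stays fractionally split between the two clusters.
print(mu.round(2), resp[:, -1].round(2))
```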
Little by little, all those clicks should add up to a picture of your taste, in the same way that all those pixels add up to a picture of your face. The question is how to do the adding.
Psychologists have found that personality boils down to five dimensions—extroversion, agreeableness, conscientiousness, neuroticism, and openness to experience—which they can infer from your tweets and blog posts.
Take the video stream from Robby’s eyes, treat each frame as a point in the space of images, and reduce that set of images to a single dimension. What will you discover? Time.
Time, in other words, is the principal component of memory.
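The thought experiment can be simulated: build fake "frames" whose pixels drift with time, reduce them to one dimension with PCA, and check that the recovered dimension tracks the frame index. A sketch, assuming scikit-learn:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = np.arange(200)                          # frame index = time
# Fake video: each pixel drifts slowly with time, plus noise.
frames = np.outer(t, rng.normal(size=50)) + rng.normal(scale=5.0, size=(200, 50))

one_d = PCA(n_components=1).fit_transform(frames).ravel()
print(abs(np.corrcoef(one_d, t)[0, 1]).round(3))  # near 1: the dimension is time
```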
The law of effect: actions that lead to pleasure are more likely to be repeated in the future; actions that lead to pain, less so.
They found that humans solve problems by decomposing them into subproblems, subsubproblems, and so on and systematically reducing the differences between the initial state (the first formula, say) and the goal state (the second formula).
Similarly, if you want to predict whether a teenager is at risk of starting to smoke, by far the best thing you can do is check whether her close friends smoke.
Mencken’s quip that a man is wealthy if he makes more than his wife’s sister’s husband involves four people.
We found, among other things, that marketing a product to the single most influential member—trusted by many followers who were in turn trusted by many others, and so on—was as good as marketing to a third of all the members in isolation.
If we can measure how strongly people influence each other, we can estimate how long it will be before a swing occurs, even if it’s the first one—another way in which black swans are not necessarily unpredictable.