More on this book
Community
Kindle Notes & Highlights
Read between
December 30, 2015 - January 22, 2016
Whoever has the best algorithms and the most data wins. A new type of network effect takes hold: whoever has the most customers accumulates the most data, learns the best models, wins the most new customers, and so on in a virtuous circle (or a vicious one, if you’re the competition).
the SKICAT (sky image cataloging and analysis tool) project used a learning algorithm. Starting from plates where objects were labeled with the correct categories, it figured out what characterizes each one and applied the result to all the unlabeled plates. Even better, it could classify objects that were too faint for humans to label, and these comprise the majority of the survey.
And if the ideas that really put a glimmer in researchers’ eyes bear fruit, machine learning will bring about not just a new era of civilization, but a new stage in the evolution of life on Earth.
so often happens in computer science, we’re willing to sacrifice efficiency for generality.
All of this is evidence that the brain uses the same learning algorithm throughout, with the areas dedicated to the different senses distinguished only by the different inputs they are connected to (e.g., eyes, ears, nose).
The number of connections in your brain is over a million times the number of letters in your genome, so it’s not physically possible for the genome to specify in detail how the brain is wired.
But if there’s something we know but the brain can’t learn, it must have been learned by evolution.
Even more remarkable, this mechanism is of a type very familiar to computer scientists: iterative search, where we solve a problem by trying many candidate solutions, selecting and modifying the best ones, and repeating these steps as many times as necessary. Evolution is an algorithm. Paraphrasing Charles Babbage, the Victorian-era computer pioneer, God created not species but the algorithm for creating species.
On the other hand, it’s been running for over three billion years on the most powerful computer on Earth: Earth itself.
When it sees a new piece of data, the hypotheses that are compatible with it become more likely, and the hypotheses that aren’t become less likely (or even impossible).
Modern learning algorithms can learn rich internal representations, not just pairwise associations between stimuli.
Learners like it are now used in just about every speech recognizer, including Siri. Fred Jelinek, head of the speech group at IBM, famously quipped that “every time I fire a linguist, the recognizer’s performance goes up.”
machine learning speaks probability, and knowledge engineering speaks logic.
The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks.
Every transaction works on two levels: what it accomplishes for you and what it teaches the system you just interacted with.
The first line of defense is to make sure the good guys get it first—or, if it’s not clear who the good guys are, to make sure it’s open-sourced.
Control of data and ownership of the models learned from it is what many of the twenty-first century’s battles will be about—between governments, corporations, unions, and individuals.
because the laws of physics are chaotic, uncertainty compounds over time, and pretty soon they determine very little indeed. To accurately describe the world, we need a fresh batch of data at regular intervals. In effect, the laws of physics only tell us what happens locally. This drastically reduces their power.
The chances that the new case you need to make a decision on is already in the database are so vanishingly small that, without generalization, you won’t even get off the ground.
We induce the most widely applicable rules we can and reduce their scope only when the data forces us to. At first sight this may seem ridiculously overconfident, but it’s been working for science for over three hundred years. It’s certainly possible to imagine a universe so varied and capricious that Newton’s principle would systematically fail, but that’s not our universe.
data mining means “torturing the data until it confesses.”
The Clairvoyant Fund just beat the market ten years in a row. Wow, the manager must be a genius. Or not? If you have a thousand funds to choose from, the odds are better than even that one will beat the market ten years in a row, even if they’re all secretly run by dart-throwing monkeys.
The key is to realize that induction is just the inverse of deduction, in the same way that subtraction is the inverse of addition, or integration the inverse of differentiation.
Classifiers are the most widespread form of machine learning.
So to learn a good decision tree, we pick at each node the attribute that on average yields the lowest class entropy across all its branches, weighted by how many examples go into each branch.
No one has ever succeeded in learning a set of rules that will recognize a cat by looking at the pixels in an image, and probably no one ever will.
In contrast, connectionist representations are distributed: each concept is represented by many neurons, and each neuron participates in representing many different concepts. Neurons that excite one another form what Hebb called a cell assembly. Concepts and memories are represented in the brain by cell assemblies. Each of these can include neurons from different brain regions and overlap with other assemblies. The cell assembly for “leg” includes the one for “foot,” which includes assemblies for the image of a foot and the sound of the word foot.
The brain is a forest of billions of these trees, but there’s something unusual about them. Each tree’s branches make connections—synapses—to the roots of thousands of others, forming a massive tangle like nothing you’ve ever seen.
Placed end to end, the axons in your brain would stretch from Earth to the moon.
a friend recommends a movie and you go see it and like it, next time around you’ll probably follow her advice again. On the other hand, if she keeps gushing about movies you didn’t enjoy, you will start to ignore her opinions (and perhaps your friendship even wanes a bit).
It’s the errors that drive the learning.
Machine learning at the time was associated mainly with neural networks, and most researchers (not to mention funders) concluded that the only way to build an intelligent system was to explicitly program it. For the next fifteen years, knowledge engineering would hold center stage, and machine learning seemed to have been consigned to the ash heap of history.
Inspired by this, he defined a type of neural network that evolves over time in the same way that a spin glass does and postulated that the network’s minimum energy states are its memories.
In fact, the probability of finding the network in a particular state was given by the well-known Boltzmann distribution from thermodynamics, so they called their network a Boltzmann machine.
Rather than a logic gate, a neuron is more like a voltage-to-frequency converter.
When someone talks about exponential growth, ask yourself: How soon will it turn into an S curve?
exploration-exploitation dilemma. If you’ve found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution? Would you rather be a cowboy or a farmer? Start a company or run an existing one? Go steady or play the field? A midlife crisis is the yearning to explore after many years spent exploiting.
Each time you play, you have to choose between repeating the best move you’ve found so far, which gives you the best payoff, or trying other moves, which gather information that may lead to even better payoffs.
He sees genetic programming as an invention machine, a silicon Edison for the twenty-first century.
Sex just seems to be the end, rather than the means, of technological evolution.
evolution is excruciatingly slow. The entire life of an organism yields only one piece of information about its genome: its fitness, reflected in the organism’s number of offspring.
The molecular biology of living cells is such a mess that molecular biologists often quip that only people who don’t know any of it could believe in intelligent design.
Christianity as we know it was invented by Saint Paul, while Jesus saw himself as the pinnacle of the Jewish faith.
Bayes’ theorem is useful because what we usually know is the probability of the effects given the causes, but what we want to know is the probability of the causes given the effects.
All models are wrong, but some are useful
A learner that uses Bayes’ theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier.
Bayesian networks give the lie to the common misconception that machine learning can’t predict very rare events, or “black swans,” as Nassim Taleb calls them.
Google uses a giant Bayesian network of this type in its AdSense system for automatically choosing ads to place on web pages. The network relates a million content variables to each other and to twelve million words and phrases via over three hundred million arrows, all learned from a hundred billion text snippets and search queries.
A cleverer solution is to just ask the first soldier behind you: “How many soldiers are behind you?” Each soldier asks the next the same question, until the last one says “None.” The next-to-last soldier can now say “One,” and so on all the way back to the first soldier, with each soldier adding one to the number of soldiers behind him. Now you know how many soldiers are still with you, and you didn’t even have to stop.
“How many Bayesians does it take to change a lightbulb? They’re not sure. Come to think of it, they’re not sure the lightbulb is burned out.”