More on this book
Community
Kindle Notes & Highlights
Another prominent machine-learning skeptic is the linguist Noam Chomsky. Chomsky believes that language must be innate, because the examples of grammatical sentences children hear are not enough to learn a grammar. This only puts the burden of learning language on evolution, however; it does not argue against the Master Algorithm but only against it being something like the brain. Moreover, if a universal grammar exists (as Chomsky believes), elucidating it is a step toward elucidating the Master Algorithm. The only way this is not the case is if language has nothing in common with other
...more
More generally, Chomsky is critical of all statistical learning. He has a list of things statistical learners can’t do, but the list is fifty years out of date. Chomsky seems to equate machine learning with behaviorism, where animal behavior is reduced to associating responses with rewards. But machine learning is not behaviorism. Modern learning algorithms can learn rich internal representations, not just pairwise associations between stimuli.
To this Chomsky might reply that engineering successes are not proof of scientific validity. On the other hand, if your buildings collapse and your engines don’t run, perhaps something is wrong with your theory of physics. Chomsky thinks linguists should focus on “ideal” speaker-listeners, as defined by him, and this gives him license to ignore things like the need for statistics in language learning. Perhaps it’s not surprising, then, that few experimentalists take his theories seriously any more.
Critics like Minsky, Chomsky, and Fodor once had the upper hand, but thankfully their influence has waned. Nevertheless, we should keep their criticisms in mind as we set out on the road to the Master Algorithm for two reasons. The first is that knowledge engineers faced many of the same problems machine learners do, and even if they didn’t succeed, they learned many valuable lessons. The second is that learning and knowledge are intertwined in surprisingly subtle ways, as we’ll soon find out. Unfortunately, the two camps often talk past each other. They speak different languages: machine
...more
“No matter how smart your algorithm, there are some things it just can’t learn.” Outside of AI and cognitive science, the most common objections to machine learning are variants of this claim. Nassim Taleb hammered on it forcefully in his book The Black Swan. Some events are simply not predictable. If you’ve only ever seen white swans, you think the probability of ever seeing a black one is zero. The financial meltdown of 2008 was a “black swan.”
Learning algorithms are quite capable of accurately predicting rare, never-before-seen events; you could even say that that’s what machine learning is all about. What’s the probability of a black swan if you’ve never seen one? How about it’s the fraction of known species that belatedly turned out to have black specimens? This is only a crude example; we’ll see many deeper ones in this book.
A related, frequently heard objection is “Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data. Intuition is what you use when you don’t know the facts, and since you often don’t, intuition is precious. But when the evidence is before you, why would you deny it?
OK, some say, machine learning can find statistical regularities in data, but it will never discover anything deep, like Newton’s laws. It arguably hasn’t yet, but I bet it will. Stories of falling apples notwithstanding, deep scientific truths are not low-hanging fruit. Science goes through three phases, which we can call the Brahe, Kepler, and Newton phases. In the Brahe phase, we gather lots of data, like Tycho Brahe patiently recording the positions of the planets night after night, year after year. In the Kepler phase, we fit empirical laws to the data, like Kepler did to the planets’
...more
In a famous passage of his book The Sciences of the Artificial, AI pioneer and Nobel laureate Herbert Simon asked us to consider an ant laboriously making its way home across a beach. The ant’s path is complex, not because the ant itself is complex but because the environment is full of dunelets to climb and pebbles to get around. If we tried to model the ant by programming in every possible path, we’d be doomed. Similarly, in machine learning the complexity is in the data; all the Master Algorithm has to do is assimilate it, so we shouldn’t be surprised if it turns out to be simple.
The computer is your tool, not your adversary. Armed with machine learning, a manager becomes a supermanager, a scientist a superscientist, an engineer a superengineer. The future belongs to those who understand at a very deep level how to combine their unique expertise with what algorithms do best.
Knowledge is general, and has structure.
How can we ever be justified in generalizing from what we’ve seen to what we haven’t? Every learning algorithm is, in a sense, an attempt to answer this question.
Discovering rules in this way was the brainchild of Ryszard Michalski, a Polish computer scientist. Michalski’s hometown of Kalusz was successively part of Poland, Russia, Germany, and Ukraine, which may have left him more attuned than most to disjunctive concepts. After immigrating to the United States in 1970, he went on to found the symbolist school of machine learning, along with Tom Mitchell and Jaime Carbonell. He had an imperious personality. If you gave a talk at a machine-learning conference, the odds were good that at the end he’d raise his hand to point out that you had just
...more
Learning is forgetting the details as much as it is remembering the important parts.
Overfitting is the central problem in machine learning.
Thus a good learner is forever walking the narrow path between blindness and hallucination.
Our beliefs are based on our experience, which gives us a very incomplete picture of the world, and it’s easy to jump to false conclusions.
It’s even been said that data mining means “torturing the data until it confesses.”
An E. coli bacterium can divide into two roughly every fifteen minutes; given enough nutrients it can grow into a mass of bacteria the size of Earth in about a day.
A clock that’s always an hour late has high bias but low variance.
The key is to realize that induction is just the inverse of deduction, in the same way that subtraction is the inverse of addition, or integration the inverse of differentiation. This idea was first proposed by William Stanley Jevons in the late 1800s. Steve Muggleton and Wray Buntine, an English Australian team, designed the first practical algorithm based on it in 1988.
The psychologist David Marr argued that every information processing system should be studied at three distinct levels: the fundamental properties of the problem it’s solving; the algorithms and representations used to solve it; and how they are physically implemented. For example, addition can be defined by a set of axioms irrespective of how it’s carried out; numbers can be expressed in different ways (e.g., Roman and Arabic) and added using different algorithms; and these can be implemented using an abacus, a pocket calculator, or even, very inefficiently, in your head. Learning is a prime
...more
Hebb’s rule, as it has come to be known, is the cornerstone of connectionism. Indeed, the field derives its name from the belief that knowledge is stored in the connections between neurons. Donald Hebb, a Canadian psychologist, stated it this way in his 1949 book The Organization of Behavior: “When an axon of cell A is near enough cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
And Prince Charming would be a Caltech physicist by the name of John Hopfield. In 1982, Hopfield noticed a striking analogy between the brain and spin glasses, an exotic material much beloved of statistical physicists. This set off a connectionist renaissance that culminated a few years later in the invention of the first algorithms capable of solving the credit-assignment problem, ushering in a new era where machine learning replaced knowledge engineering as the dominant paradigm in AI.
In 1985, David Ackley, Geoff Hinton, and Terry Sejnowski replaced the deterministic neurons in Hopfield networks with probabilistic ones. A neural network now had a probability distribution over its states, with higher-energy states being exponentially less likely than lower-energy ones. In fact, the probability of finding the network in a particular state was given by the well-known Boltzmann distribution from thermodynamics, so they called their network a Boltzmann machine.
Geoff Hinton went on to try many variations on Boltzmann machines over the following decades. Hinton, a psychologist turned computer scientist and great-great-grandson of George Boole, the inventor of the logical calculus used in all digital computers, is the world’s leading connectionist.
The European Union is investing a billion euros to build a soup-to-nuts model of it. America’s BRAIN initiative, with $100 million in funding in 2014 alone, has similar aims.
Its prophet was a ruddy-faced, perpetually grinning midwesterner by the name of John Holland. Darwin’s algorithm Like many other early machine-learning researchers, Holland started out working on neural networks, but his interests took a different turn when, while a graduate student at the University of Michigan, he read Ronald Fisher’s classic treatise The Genetical Theory of Natural Selection. In it, Fisher, who was also the founder of modern statistics, formulated the first mathematical theory of evolution. Brilliant as it was, Holland felt that Fisher’s theory left out the essence of
...more
Although it is less well known, many of the most important technologies in the world are the result of inventing a unifier, a single mechanism that does what previously required many.