Ian Pitchford’s Kindle Notes & Highlights for The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization. Driven by the goal of building learning machines, we’ll tour a good chunk of the intellectual history of the last hundred years and see it in a new light.

3%

Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain.

3%

The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, an...

This highlight has been truncated due to consecutive passage length restrictions.

3%

The second goal of this book is thus to enable you to invent the Master Algorithm. You’d think this would require heavy-duty mathematics and severe theoretical work. On the contrary, what it requires is stepping back from the mathematical arcana to see the overarching pattern of learning phenomena; and for this the layman, approaching the forest from a distance, is in some ways better placed than the specialist, already deeply immersed in the study of particular trees.

4%

My doctoral dissertation unified symbolic and analogical learning. I’ve spent much of the last ten years unifying symbolism and Bayesianism, and more recently those two with connectionism. It’s time to go the next step and attempt a synthesis of all five paradigms.

4%

If you’re curious what all the hubbub surrounding big data and machine learning is about and suspect that there’s something deeper going on than what you see in the papers, you’re right! This book is your guide to the revolution. If your main interest is in the business uses of machine learning, this book can help you in at least six ways: to become a savvier consumer of analytics; to make the most of your data scientists; to avoid the pitfalls that kill so many data-mining projects; to discover what you can automate without the expense of hand-coded software; to reduce the rigidity of your ...more

4%

Claude Shannon, better known as the father of information theory, was the first to realize that what transistors are doing, as they switch on and off in response to other transistors, is reasoning. (That was his master’s thesis at MIT—the most important master’s thesis of all time.)

4%

Believe it or not, every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT.

5%

Algorithms are an exacting standard. It’s often said that you don’t really understand something until you can express it as an algorithm. (As Richard Feynman said, “What I cannot create, I do not understand.”)

6%

The second thing is that machine learning is a sword with which to slay the complexity monster. Given enough data, a learning program that’s only a few hundred lines long can easily generate a program with millions of lines, and it can do this again and again for different problems. The reduction in complexity for the programmer is phenomenal. Of course, like the Hydra, the complexity monster sprouts new heads as soon as we cut off the old ones, but they start off smaller and take a while to grow, so we still get a big leg up.

6%

Machine learning is sometimes confused with artificial intelligence (or AI for short). Technically, machine learning is a subfield of AI, but it’s grown so large and successful that it now eclipses its proud parent. The goal of AI is to teach computers to do what humans currently do better, and learning is arguably the most important of those things: without it, no computer can keep up with a human for long; with it, the rest follows.

7%

Many computer scientists, particularly those of an older generation, don’t understand machine learning as well as they’d like to. This is because computer science has traditionally been all about thinking deterministically, but machine learning requires thinking statistically.

7%

This difference in thinking is a large part of why Microsoft has had a lot more trouble catching up with Google than it did with Netscape.

7%

According to tech guru Tim O’Reilly, “data scientist” is the hottest job title in Silicon Valley. The McKinsey Global Institute estimates that by 2018 the United States alone will need 140,000 to 190,000 more machine-learning experts than will be available, and 1.5 million more data-savvy managers.

7%

All of the important ideas in machine learning can be expressed math-free.

7%

With Google’s annual revenue of $50 billion, every 1 percent improvement in click prediction potentially means another half billion dollars in the bank, every year, for the company. No wonder Google is a big fan of machine learning, and Yahoo and others are trying hard to catch up.

8%

Switching from Google to Bing may be easier than switching from Windows to Mac, but in practice you don’t because Google, with its head start and larger market share, knows better what you want, even if Bing’s technology is just as good. And pity a new entrant into the search business, starting with zero data against engines with over a decade of learning behind them.

8%

Better still, keep the learners running and continuously adjust all aspects of the website.

8%

Businesses look at data as a strategic asset: What data do I have that my competitors don’t? How can I take advantage of it? What data do my competitors have that I don’t?

8%

Machine learning is the scientific method on steroids.

8%

The good news today is that sciences that were once data-poor are now data-rich. Instead of paying fifty bleary-eyed undergraduates to perform some task in the lab, psychologists can get as many subjects as they want by posting the task on Amazon’s Mechanical Turk. (It makes for a more diverse sample too.)

8%

It’s getting hard to remember, but little more than a decade ago sociologists studying social networks lamented that they couldn’t get their hands on a network with more than a few hundred members. Now there’s Facebook, with over a billion. A good chunk of those members post almost blow-by-blow accounts of their lives too; it’s like having a live feed of social life on planet Earth. In neuroscience, connectomics and functional magnetic resonance imaging have opened an extraordinarily detailed window into the brain. In molecular biology, databases of genes and proteins grow exponentially. Even ...more

9%

If computers hadn’t been invented, science would have ground to a halt in the second half of the twentieth century. This might not have been immediately apparent to the scientists because they would have been focused on whatever limited progress they could still make, but the ceiling for that progress would have been much, much lower. Similarly, without machine learning, many sciences would face diminishing returns in the decades to come.

9%

To see the future of science, take a peek inside a lab at the Manchester Institute of Biotechnology, where a robot by the name of Adam is hard at work figuring out which genes encode which enzymes in yeast. Adam has a model of yeast metabolism and general knowledge of genes and proteins. It makes hypotheses, designs experiments to test them, physically carries them out, analyzes the results, and comes up with new hypotheses until it’s satisfied. Today, human scientists still independently check Adam’s conclusions before they believe them, but tomorrow they’ll leave it to robot scientists to ...more

9%

They consolidated all voter information into a single database; combined it with what they could get from social networking, marketing, and other sources; and set about predicting four things for each individual voter: how likely he or she was to support Obama, show up at the polls, respond to the campaign’s reminders to do so, and change his or her mind about the election based on a conversation about a specific issue.

10%

Factoring in the costs and benefits of different actions, as game theory does, can also help strike the right balance between privacy and security.

10%

Machine learning is like having a radar that sees into the future. Don’t just react to your adversary’s moves; predict them and preempt them.

11%

Web 2.0 brought a swath of new applications, from mining social networks to figuring out what bloggers are saying about your products. In parallel, scientists of all stripes were increasingly turning to large-scale modeling, with molecular biologists and astronomers leading the charge. The housing bust barely registered; its main effect was a welcome transfer of talent from Wall Street to Silicon Valley. In 2011, the “big data” meme hit, putting machine learning squarely in the center of the global economy’s future. Today, there seems to be hardly an area of human endeavor untouched by machine ...more

11%

In fact, just a few algorithms are responsible for the great majority of machine-learning applications, and we’ll take a look at them in the next few chapters.

11%

If so few learners can do so much, the logical question is: Could one learner do everything? In other words, could a single algorithm learn all that can be learned from data? This is a very tall order, since it would ultimately include everything in an adult’s brain, everything evolution has created, and the sum total of all scientific knowledge. But in fact all the major learners—including nearest-neighbor, decision trees, and Bayesian networks, a generalization of Naïve Bayes—are universal in the following sense: if you give the learner enough of the appropriate data, it can approximate any ...more

12%

The question then becomes: How weak can the assumptions be and still allow all relevant knowledge to be derived from finite data? Notice the word relevant: we’re only interested in knowledge about our world, not about worlds that don’t exist.

12%

The cortex is organized into columns with six distinct layers, feedback loops running to another brain structure called the thalamus, and a recurring pattern of short-range inhibitory connections and longer-range excitatory ones.

12%

The cerebellum, the evolutionarily older part of the brain responsible for low-level motor control, has a clearly different and very regular architecture, built out of much smaller neurons, so it would seem that at least motor learning uses a different algorithm. If someone’s cerebellum is injured, however, the cortex takes over its function. Thus it seems that evolution kept the cerebellum around not because it does something the cortex can’t, but just because it’s more efficient.

12%

The most important argument for the brain being the Master Algorithm, however, is that it’s responsible for everything we can perceive and imagine. If something exists but the brain can’t learn it, we don’t know it exists. We may just not see it or think it’s random. Either way, if we implement the brain in a computer, that algorithm can learn everything we can. Thus one route—arguably the most popular one—to inventing the Master Algorithm is to reverse engineer the brain. Jeff Hawkins took a stab at this in his book On Intelligence. Ray Kurzweil pins his hopes for the Singularity—the rise of ...more

13%

Not all neuroscientists believe in the unity of the cortex; we need to learn more before we can be sure. The question of just what the brain can and can’t learn is also hotly debated. But if there’s something we know but the brain can’t learn, it must have been learned by evolution.

13%

Evolution is the ultimate example of how much a simple learning algorithm can achieve given enough data. Its input is the experience and fate of all living creatures that ever existed. (Now that’s big data.) On the other hand, it’s been running for over three billion years on the most powerful computer on Earth: Earth itself. A computer version of it had better be faster and less data intensive than the original. Which one is the better model for the Master Algorithm: evolution or the brain? This is machine learning’s version of the nature versus nurture debate. And, just as nature and nurture ...more

13%

How much of the character of physical law percolates up to higher domains like biology and sociology remains to be seen, but the study of chaos provides many tantalizing examples of very different systems with similar behavior, and the theory of universality explains them.

13%

In physics, the same equations applied to different quantities often describe phenomena in completely different fields, like quantum mechanics, electromagnetism, and fluid dynamics. The wave equation, the diffusion equation, Poisson’s equation: once we discover it in one field, we can more readily discover it in others; and once we’ve learned how to solve it in one field, we know how to solve it in all.

13%

Moreover, all these equations are quite simple and involve the same few derivatives of quantities with respect to space and time. Quite conceivably, they are all instances of a master equation, and all the Master Algorithm needs to do is figure out how to instantiate it for different data sets.

13%

In optimization, simple functions often give rise to surprisingly complex solutions.

13%

Optimization plays a prominent role in almost every field of science, technology, and business, including machine learning. Each field optimizes within the constraints defined by optimizations in other fields. We try to maximize our happiness within economic constraints, which are firms’ best solutions within the constraints of the available technology—which in turn consists of the best solutions we could find within the constraints of biology and physics. Biology, in turn, is the result of optimization by evolution within the constraints of physics and chemistry, and the laws of physics ...more

14%

Machine learning is what you get when the unreasonable effectiveness of mathematics meets the unreasonable effectiveness of data. Biology and sociology will never be as simple as physics, but the method by which we discover their truths can be.

14%

According to one school of statisticians, a single simple formula underlies all learning. Bayes’ theorem, as the formula is known, tells you how to update your beliefs whenever you see new evidence. A Bayesian learner starts with a set of hypotheses about the world.

14%

Bayes’ theorem is a machine that turns data into knowledge. According to Bayesian statisticians, it’s the only correct way to turn data into knowledge. If they’re right, either Bayes’ theorem is the Master Algorithm or it’s the engine that drives it.

14%

With big data and big computing to go with it, however, Bayes can find its way in vast hypothesis spaces and has spread to every conceivable field of knowledge. If there’s a limit to what Bayes can learn, we haven’t found it yet.

14%

When I was a senior in college, I wasted a summer playing Tetris, a highly addictive video game where variously shaped pieces fall from above and which you try to pack as closely together as you can; the game is over when the pile of pieces reaches the top of the screen. Little did I know that this was my introduction to NP-completeness, the most important problem in theoretical computer science. Turns out that, far from an idle pursuit, mastering Tetris—really mastering it—is one of the most useful things you could ever do. If you can solve Tetris, you can solve thousands of the hardest and ...more

14%

NP-completeness aside, the sheer existence of computers is itself a powerful sign that there is a Master Algorithm.

15%

Then in 1936 Alan Turing imagined a curious contraption with a tape and a head that read and wrote symbols on it, now known as a Turing machine. Every conceivable problem that can be solved by logical deduction can be solved by a Turing machine. Furthermore, a so-called universal Turing machine can simulate any other by reading its specification from the tape—in other words, it can be programmed to do anything.

15%

The Master Algorithm is for induction, the process of learning, what the Turing machine is for deduction. It can learn to simulate any other algorithm by reading examples of its input-output behavior.

15%

Despite machine learning’s successes, the knowledge engineers remain unconvinced. They believe that its limitations will soon become apparent, and the pendulum will swing back. Marvin Minsky, an MIT professor and AI pioneer, is a prominent member of this camp. Minsky is not just skeptical of machine learning as an alternative to knowledge engineering, he’s skeptical of any unifying ideas in AI. Minsky’s theory of intelligence, as expressed in his book The Society of Mind, could be unkindly characterized as “the mind is just one damn thing after another.” The Society of Mind is a laundry list ...more

See a Problem?

Preview — The Master Algorithm by Pedro Domingos