The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
3%
Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
4%
Believe it or not, every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT.
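A minimal sketch of that reduction (the function names are just illustrative): XOR and a one-bit half-adder built from nothing but AND, OR, and NOT.

```python
def AND(a, b): return a and b
def OR(a, b):  return a or b
def NOT(a):    return not a

def XOR(a, b):
    # a XOR b == (a OR b) AND NOT (a AND b)
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    # sum bit and carry bit of a + b, using only the three primitives
    return XOR(a, b), AND(a, b)

for a in (False, True):
    for b in (False, True):
        print(a, b, half_adder(a, b))
```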
5%
Scientists make theories, and engineers make devices. Computer scientists make algorithms, which are both theories and devices.
5%
Every algorithm has an input and an output: the data goes into the computer, the algorithm does what it will with it, and out comes the result. Machine learning turns this around: in goes the data and the desired result and out comes the algorithm that turns one into the other.
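A toy sketch of that inversion, assuming numpy is available: the ordinary program applies a known rule to data, while the learner recovers the rule (here just a slope and an intercept) from input-output pairs.

```python
import numpy as np

# Ordinary program: the rule is given, data goes in, results come out.
def fahrenheit(celsius):
    return 1.8 * celsius + 32

# Machine learning: (input, desired output) pairs go in,
# and the rule -- a slope and an intercept -- comes out.
celsius = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
fahr    = np.array([32.0, 50.0, 68.0, 86.0, 104.0])

slope, intercept = np.polyfit(celsius, fahr, deg=1)  # fit a line to the pairs
print(slope, intercept)  # ~1.8 and ~32: the "algorithm" was learned from data
```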
6%
Machine-learning experts (aka machine learners) are an elite priesthood even among computer scientists. Many computer scientists, particularly those of an older generation, don’t understand machine learning as well as they’d like to. This is because computer science has traditionally been all about thinking deterministically, but machine learning requires thinking statistically. If a rule for, say, labeling e-mails as spam is 99 percent accurate, that does not mean it’s buggy; it may be the best you can do and good enough to be useful. This difference in thinking is a large part of why ...more
12%
The Master Algorithm is for induction, the process of learning, what the Turing machine is for deduction.
20%
If all conjunctions of two factors fail, you can try all conjunctions of any number of factors. Machine learners and psychologists call these “conjunctive concepts.”
20%
Dictionary definitions are conjunctive concepts: a chair has a seat and a back and some number of legs. Remove any of these and it’s no longer a chair.
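A minimal sketch of learning such a conjunctive concept: keep only the attribute tests that every positive example satisfies. The "chair" attributes below are made up for illustration.

```python
positives = [
    {"has_seat": True, "has_back": True, "has_legs": True, "color": "red"},
    {"has_seat": True, "has_back": True, "has_legs": True, "color": "blue"},
]

# Start with every test from the first positive example, then drop any
# test that a later positive example contradicts.
concept = dict(positives[0])
for example in positives[1:]:
    concept = {k: v for k, v in concept.items() if example.get(k) == v}

print(concept)  # {'has_seat': True, 'has_back': True, 'has_legs': True}
```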
22%
Overfitting happens when you have too many hypotheses and not enough data to tell them apart.
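A quick illustration, assuming numpy: on ten noisy points drawn from a straight line, a degree-6 polynomial has far more hypotheses at its disposal than the data can pin down, so it chases the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# The true rule is a straight line; the data adds a little noise.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 6):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.3f}, test error {test_err:.3f}")

# The degree-6 fit usually scores better on the 10 training points and
# worse on fresh points: too many hypotheses, too little data.
```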
24%
Was it blindness or hallucination? In machine learning, the technical terms for these are bias and variance.
24%
A clock that’s always an hour late has high bias but low variance. If instead the clock alternates erratically between fast and slow but on average tells the right time, it has high variance but low bias.
24%
If it keeps making the same mistakes, the problem is bias, and you need a more flexible learner (or just a different one). If there’s no pattern to the mistakes, the problem is variance, and you want to either try a less flexible learner or get more data.
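A numeric sketch of the clock analogy, with made-up readings: the slow clock's errors share a pattern (bias), while the erratic clock's errors have no pattern but a large spread (variance).

```python
import numpy as np

rng = np.random.default_rng(1)
true_time = 12.0  # noon, in hours

# Clock A: always about an hour late -- high bias, low variance.
clock_a = true_time - 1.0 + rng.normal(scale=0.01, size=1000)

# Clock B: erratic, but right on average -- low bias, high variance.
clock_b = true_time + rng.normal(scale=1.0, size=1000)

for name, readings in [("A (slow)", clock_a), ("B (erratic)", clock_b)]:
    bias = readings.mean() - true_time
    variance = readings.var()
    print(f"clock {name}: bias {bias:+.2f} h, variance {variance:.2f} h^2")
```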
39%
…powerful than you’d guess from its everyday uses. At heart, Bayes’ theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down.
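For reference, the rule itself in symbols (standard textbook form, not quoted from the book):

```latex
P(\text{hypothesis} \mid \text{evidence})
  \;=\;
  \frac{P(\text{evidence} \mid \text{hypothesis})\, P(\text{hypothesis})}
       {P(\text{evidence})}
```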
41%
A learner that uses Bayes’ theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier.
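A minimal Naive Bayes sketch for spam filtering; the tiny corpus and the add-one smoothing constant are illustrative choices, not from the book.

```python
import math
from collections import Counter

train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class) + sum over words of log P(word | class):
        # words are assumed independent given the class (the "naive" part),
        # with add-one smoothing so unseen words don't zero out the product.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) /
                              (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("cheap money meeting"))  # -> 'spam' on this toy corpus
```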
41%
But machine learning is the art of making false assumptions and getting away with it.
41%
As the statistician George Box famously put it: “All models are wrong, but some are useful.”
42%
The states form a Markov chain, as before, but we don’t get to see them; we have to infer them from the observations. This is called a hidden Markov model, or HMM for short.
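A minimal HMM sketch with made-up numbers: hidden weather states, observed umbrella sightings, and the forward algorithm summing over all the hidden paths.

```python
import numpy as np

states = ["rainy", "sunny"]
start = np.array([0.5, 0.5])                 # P(first hidden state)
trans = np.array([[0.7, 0.3],                # P(next state | current state)
                  [0.3, 0.7]])
emit = np.array([[0.9, 0.1],                 # P(observation | state);
                 [0.2, 0.8]])                # columns: umbrella, no umbrella

observations = [0, 0, 1]                     # umbrella, umbrella, no umbrella

# Forward algorithm: accumulate P(observations so far, current state).
alpha = start * emit[:, observations[0]]
for obs in observations[1:]:
    alpha = (alpha @ trans) * emit[:, obs]

print("P(observation sequence) =", alpha.sum())
print("P(last state | observations) =", alpha / alpha.sum())
```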
43%
Pearl realized that it’s OK to have a complex network of dependencies among random variables, provided each variable depends directly on only a few others. We can represent these dependencies with a graph like the ones we saw for Markov chains and HMMs, except now the graph can have any structure (as long as the arrows don’t form closed loops).
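A small sketch of such a network, using the classic rain / sprinkler / wet-grass graph with made-up numbers: because each variable depends directly only on its parents, the joint probability is just a product of small local tables.

```python
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}          # independent of rain in this toy graph
P_wet = {  # P(wet grass | rain, sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    # The joint factorizes along the graph: P(rain) * P(sprinkler) * P(wet | parents).
    p_wet = P_wet[(rain, sprinkler)] if wet else 1 - P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * p_wet

# P(rain | grass is wet), summing the joint over the unobserved sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print("P(rain | wet grass) =", num / den)
```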
46%
A Markov network is a set of features and corresponding weights, which together define a probability distribution.
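A minimal sketch over three binary variables, with illustrative features and weights: exponentiate the weighted sum of the features that fire, then normalize by Z to get a distribution.

```python
import itertools, math

features = [
    (lambda x: x[0] == x[1], 2.0),   # variables 0 and 1 tend to agree
    (lambda x: x[1] == x[2], 1.0),   # variables 1 and 2 tend to agree
]

def score(x):
    # Unnormalized probability: exp(weighted sum of the features that fire).
    return math.exp(sum(w for f, w in features if f(x)))

states = list(itertools.product([0, 1], repeat=3))
Z = sum(score(x) for x in states)    # the normalizer (partition function)

for x in states:
    print(x, round(score(x) / Z, 3))
```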
50%
In fact, no learner is immune to the curse of dimensionality. It’s the second worst problem in machine learning, after overfitting. The term curse of dimensionality was coined by Richard Bellman, a control theorist, in the fifties. He observed that control algorithms that worked fine in three dimensions became hopelessly inefficient in higher-dimensional spaces, such as when you want to control every joint in a robot arm or every knob in a chemical plant. But in machine learning the problem is more than just computational cost—it’s that learning itself becomes harder and harder as the ...more
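One quick way to see the statistical side of the curse, assuming numpy: for random points in a unit hypercube, the nearest and farthest neighbors of a query end up almost equally far away as dimensions are added, so similarity stops carrying information.

```python
import numpy as np

rng = np.random.default_rng(0)

for dim in (2, 10, 100, 1000):
    points = rng.random((500, dim))       # 500 random points in the unit cube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"{dim:>4} dims: nearest {dists.min():.2f}, "
          f"farthest {dists.max():.2f}, ratio {dists.max() / dists.min():.2f}")
# The nearest/farthest ratio shrinks toward 1 as the dimension grows.
```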
57%
This direction—known as the first principal component of the data—is also the direction along which the spread of the data is greatest.
57%
Principal-component analysis (PCA), as this process is known, is one of the key tools in the scientist’s toolkit.
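A minimal PCA sketch, assuming numpy: center the data, take the top right-singular vector of the centered matrix as the first principal component, and project onto it. The toy data (points scattered around a line) is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

t = rng.normal(size=200)
data = np.column_stack([t, 0.5 * t + rng.normal(scale=0.1, size=200)])

centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)

first_pc = vt[0]                   # direction of greatest spread
projection = centered @ first_pc   # each 2-D point reduced to one coordinate

print("first principal component:", first_pc)
print("fraction of variance captured:",
      projection.var() / centered.var(axis=0).sum())
```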
57%
Applying PCA to congressional votes and poll data shows that, contrary to popular belief, politics is not mainly about liberals versus conservatives. Rather, people differ along two main dimensions: one for economic issues and one for social ones. Collapsing these into a single axis mixes together populists and libertarians, who are polar opposites, and creates the illusion of lots of moderates in the middle. Trying to appeal to them is an unlikely winning strategy. On the other hand, if liberals and libertarians overcame their mutual aversion, they could ally themselves on social issues, ...more
57%
One of the most popular algorithms for nonlinear dimensionality reduction, called Isomap, does just this. It connects each data point in a high-dimensional space (a face, say) to all nearby points (very similar faces), computes the shortest distances between all pairs of points along the resulting network and finds the reduced coordinates that best approximate these distances.
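A short sketch of that procedure using scikit-learn's Isomap on the standard Swiss-roll dataset (assuming scikit-learn is installed); the neighbor count is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # a 3-D "rolled up" sheet

# Connect each point to its nearest neighbors, use shortest-path distances
# along that graph, and find 2-D coordinates that best preserve them.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)   # (1000, 2): the sheet unrolled into its two real dimensions
```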
58%
Edward Thorndike called this the law of effect: actions that lead to pleasure are more likely to be repeated in the future; actions that lead to pain, less so.
65%
P is a probability, w is a vector of weights (notice it’s in boldface), n is a vector of numbers, and their dot product • is exponentiated and divided by Z, the sum of all products. If we let the first component of n be one if the first feature of the image is true and zero otherwise, and so on, w•n is just a shorthand for the weighted sum of features we’ve been talking about all along.
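Written out, the formula the passage describes (my reconstruction from the prose, with the sum in Z running over all possible states):

```latex
P \;=\; \frac{e^{\,\mathbf{w}\cdot\mathbf{n}}}{Z},
\qquad
Z \;=\; \sum_{\text{states}} e^{\,\mathbf{w}\cdot\mathbf{n}}
```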
75%
People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.
77%
Of these, the closest in content to this book is, not coincidentally, the one I teach (www.coursera.org/course/machlearning). Two other options are Andrew Ng’s course (www.coursera.org/course/ml) and Yaser Abu-Mostafa’s (http://work.caltech.edu/telecourse.html). The next step is to read a textbook. The closest to this book, and one of the most accessible, is Tom Mitchell’s Machine Learning* (McGraw-Hill, 1997). More up-to-date, but also more mathematical, are Kevin Murphy’s Machine Learning: A Probabilistic Perspective* (MIT Press, 2012), Chris Bishop’s Pattern Recognition and Machine Learning* ...more
Note (Beau D Lyddon): resources
77%
A current one is Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig (3rd ed., Prentice Hall, 2010).