Kindle Notes & Highlights
Read between July 18 and November 21, 2017
Science goes through three phases, which we can call the Brahe, Kepler, and Newton phases. In the Brahe phase, we gather lots of data, like Tycho Brahe patiently recording the positions of the planets night after night, year after year. In the Kepler phase, we fit empirical laws to the data, like Kepler did to the planets’ motions. In the Newton phase, we discover the deeper truths.
Every transaction works on two levels: what it accomplishes for you and what it teaches the system you just interacted with.
In symbolist learning, there is a one-to-one correspondence between symbols and the concepts they represent. In contrast, connectionist representations are distributed: each concept is represented by many neurons, and each neuron participates in representing many different concepts.
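As an illustration (my own, not from the book), here is a minimal Python sketch of the two styles: a one-hot vector dedicates one slot per concept, while a made-up 8-dimensional embedding spreads each concept across many units, with every unit shared across concepts.

    # A minimal sketch contrasting the two representation styles described above.
    # The vocabulary, vector sizes, and embedding values are invented for illustration.
    import numpy as np

    concepts = ["cat", "dog", "car"]

    # Symbolist-style local representation: one symbol per concept,
    # so each concept maps to exactly one "active" slot.
    one_hot = {c: np.eye(len(concepts))[i] for i, c in enumerate(concepts)}

    # Connectionist-style distributed representation: each concept is a
    # pattern of activity across many units, and each unit (dimension)
    # helps represent many different concepts.
    rng = np.random.default_rng(0)
    distributed = {c: rng.normal(size=8) for c in concepts}

    print(one_hot["cat"])      # [1. 0. 0.] -- a single dedicated slot
    print(distributed["cat"])  # 8 numbers, none of which uniquely "means" cat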
In Hemingway’s The Sun Also Rises, when Mike Campbell is asked how he went bankrupt, he replies, “Two ways. Gradually and then suddenly.”
Children’s learning is not a steady improvement but an accumulation of S curves.
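A toy numeric sketch of that idea (the numbers are invented, not from the book): summing several logistic, S-shaped curves with staggered onsets yields long plateaus punctuated by sudden jumps, rather than steady linear progress.

    # A toy sketch of learning as stacked S curves: several logistic curves
    # with different onsets sum into a staircase-like trajectory --
    # long plateaus, then sudden jumps, rather than a straight line.
    import math

    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    onsets = [5, 15, 25]  # arbitrary times at which each new skill "takes off"

    for t in range(0, 31, 5):
        skill = sum(sigmoid(t - onset) for onset in onsets)
        print(f"t={t:2d}  skill={skill:.2f}")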
Another big issue is that humans—and symbolic models like sets of rules and decision trees—can explain their reasoning, while neural networks are big piles of numbers that no one can understand.
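For contrast, a small sketch (toy data of my own, not the book's) using scikit-learn's export_text, which prints a fitted decision tree as readable if-then rules; a neural network trained on the same data would expose only weight matrices.

    # A decision tree's reasoning can be printed as human-readable rules,
    # while a trained neural network is exposed only as arrays of weights.
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # two binary attributes
    y = [0, 1, 1, 1]                      # label: attribute a OR attribute b

    tree = DecisionTreeClassifier().fit(X, y)
    print(export_text(tree, feature_names=["a", "b"]))
    # Output is a readable rule list, e.g. "|--- a <= 0.50 ... class: 0".
    # A neural network fit to the same data would offer only weight
    # matrices -- the "big piles of numbers" mentioned above.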
One of the most important problems in machine learning—and life—is the exploration-exploitation dilemma. If you’ve found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution?
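A minimal epsilon-greedy sketch of the dilemma (the three options and their payout probabilities are invented for the demo): a small fraction of choices explore at random, and the rest exploit the best-looking option so far.

    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    import random

    random.seed(42)
    payout_prob = [0.3, 0.5, 0.7]   # hidden quality of each option
    counts = [0, 0, 0]
    values = [0.0, 0.0, 0.0]        # running average reward per option
    epsilon = 0.1                   # fraction of the time we explore

    for _ in range(10_000):
        if random.random() < epsilon:
            arm = random.randrange(3)         # explore: try something new
        else:
            arm = values.index(max(values))   # exploit: best option so far
        reward = 1 if random.random() < payout_prob[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    print(counts)  # most pulls should concentrate on the best option (index 2)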
Goslings follow their mother around (evolved behavior), but that requires recognizing her (learned ability). If you’re the first thing they see when they hatch, they’ll follow you instead, as Konrad Lorenz memorably showed.
A learner that uses Bayes’ theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier.
It’s astonishing how simultaneously very wrong and very useful some models can be.
Surprising accuracy aside, it scales remarkably well: learning a Naïve Bayes classifier is just a matter of counting how many times each attribute co-occurs with each class, and it takes barely longer than reading the data from disk.
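A bare-bones sketch of that counting (the toy "spam" data and the add-one smoothing are my additions, not the book's): tally how often each attribute value co-occurs with each class, then score a new example by summing log probabilities.

    # Naive Bayes by counting, with Laplace (add-one) smoothing for robustness.
    from collections import Counter, defaultdict
    import math

    # Each row: (attribute values, class label)
    data = [
        ({"free": 1, "meeting": 0}, "spam"),
        ({"free": 1, "meeting": 0}, "spam"),
        ({"free": 0, "meeting": 1}, "ham"),
        ({"free": 0, "meeting": 1}, "ham"),
    ]

    class_counts = Counter(label for _, label in data)
    # counts[label][attr][value] = how often attr=value co-occurs with label
    counts = defaultdict(lambda: defaultdict(Counter))
    for attrs, label in data:
        for attr, value in attrs.items():
            counts[label][attr][value] += 1

    def predict(attrs):
        scores = {}
        for label, n in class_counts.items():
            # log P(class) + sum of log P(attr=value | class), smoothed
            score = math.log(n / len(data))
            for attr, value in attrs.items():
                score += math.log((counts[label][attr][value] + 1) / (n + 2))
            scores[label] = score
        return max(scores, key=scores.get)

    print(predict({"free": 1, "meeting": 0}))  # -> "spam"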