Algorithms to Live By: The Computer Science of Human Decisions
Kindle Notes & Highlights
28%
Computers themselves do something like this: they wait until some fixed interval and check everything, instead of context-switching to handle separate, uncoordinated interrupts from their various subcomponents.
28%
In academia, holding office hours is a way of coalescing interruptions from students. And in the private sector, interrupt coalescing offers a redemptive view of one of the most maligned office rituals: the weekly meeting.
28%
“This is what computer scientists call batch processing—the alternative is swapping in and out. I don’t swap in and out.”
28%
“Email is a wonderful thing for people whose role in life is to be on top of things. But not for me; my role is to be on the bottom of things. What I do takes long hours of studying and uninterruptible concentration.” He reviews all his postal mail every three months, and all his faxes every six.
29%
Our days are full of “small data.” In fact, like Gott standing at the Berlin Wall, we often have to make an inference from the smallest amount of data we could possibly have: a single observation.
29%
If we be, therefore, engaged by arguments to put trust in past experience, and make it the standard of our future judgement, these arguments must be probable only.
29%
The question of making predictions from small data weighed heavily on the mind of the Reverend Thomas Bayes, a Presbyterian minister in the charming spa town of Tunbridge Wells, England.
29%
If we buy ten tickets for a new and unfamiliar raffle, Bayes imagined, and five of them win prizes, then it seems relatively easy to estimate the raffle’s chances of a win: 5/10, or 50%. But what if instead we buy a single ticket and it wins a prize? Do we really imagine the probability of winning to be 1/1, or 100%? That seems too optimistic. Is it? And if so, by how much? What should we actually guess?
Yuan
What a beautiful example!
29%
Bayes’s critical insight was that trying to use the winning and losing tickets we see to figure out the overall ticket pool that they came from is essentially reasoning backward. And to do that, he argued, we need to first reason forward from hypotheticals. In other words, we need to first determine how probable it is that we would have drawn the tickets we did if various scenarios were true. This probability—known to modern statisticians as the “likelihood”—gives us the information we need to solve the problem.
Yuan
Hindsight
29%
This is the crux of Bayes’s argument. Reasoning forward from hypothetical pasts lays the foundation for us to then work backward to the most probable one.
29%
As Laplace showed, after drawing a winning ticket on our first try we should expect that the proportion of winning tickets in the whole pool is exactly 2/3.
29%
In fact, for any possible drawing of w winning tickets in n attempts, the expectation is simply the number of wins plus one, divided by the number of attempts plus two: (w+1)/(n+2).
Yuan
Assume n = 2w; then the estimate is (w+1)/(2w+2) = 0.5.
29%
This incredibly simple scheme for estimating probabilities is known as Laplace’s Law.
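As a quick illustration, here is a minimal Python sketch of Laplace’s Law as stated above; the example counts are the raffle numbers from the earlier highlight.

```python
def laplace_estimate(wins: int, attempts: int) -> float:
    """Laplace's Law: after w wins in n attempts, estimate the
    probability of winning as (w + 1) / (n + 2)."""
    return (wins + 1) / (attempts + 2)

print(laplace_estimate(1, 1))    # one ticket, one win -> 2/3, not 100%
print(laplace_estimate(5, 10))   # five wins in ten attempts -> 0.5
print(laplace_estimate(0, 0))    # no data at all -> an even 50/50 guess
```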
30%
He also wrote the Philosophical Essay on Probabilities, arguably the first book about probability for a general audience and still one of the best, laying out his theory and considering its applications to law, the sciences, and everyday life.
30%
The result is known—somewhat unfairly, as the real heavy lifting was done by Laplace—as Bayes’s Rule. And it gives a remarkably straightforward solution to the problem of how to combine preexisting beliefs with observed evidence: multiply their probabilities together.
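A minimal sketch of that multiply-and-normalize step, assuming a made-up discrete set of hypotheses about a raffle’s win rate; the three rates and the uniform prior are illustrative, not numbers from the book.

```python
hypotheses = [0.1, 0.5, 0.9]   # candidate win rates (illustrative assumption)
prior      = [1/3, 1/3, 1/3]   # preexisting beliefs: a uniform guess

# Likelihood of the evidence -- one ticket bought, one prize won --
# under each hypothesis is just the hypothesized win rate itself.
likelihood = hypotheses

# Bayes's Rule: multiply prior by likelihood, then normalize.
joint = [p * l for p, l in zip(prior, likelihood)]
posterior = [j / sum(joint) for j in joint]

print(posterior)   # [0.067, 0.333, 0.6]: belief shifts toward high win rates
```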
30%
(You can’t multiply the two probabilities together when you don’t have one of them.)
30%
And Bayes’s Rule always needs some prior from you.
30%
The fact that Bayes’s Rule is dependent on the use of priors has at certain points in history been considered controversial, biased, even unscientific. But in reality, it is quite rare to go into a situation so totally unfamiliar that our mind is effectively a blank slate—a point we’ll return to momentarily.
30%
And it turns out that the Copernican Principle is exactly what results from applying Bayes’s Rule using what is known as an uninformative prior.
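A Monte Carlo sanity check of that claim (my sketch, not the book’s code): a log-uniform draw with an arbitrary wide upper bound stands in for the scale-free uninformative prior, and conditioning on a current age of 10 puts the median total life span near 20, double the current age.

```python
import random

random.seed(0)
B = 10_000          # arbitrary upper bound on total life spans (assumption)
target_age = 10.0   # the "current age" we condition on
totals = []

for _ in range(1_000_000):
    t_total = B ** random.random()      # scale-free prior: p(t) proportional to 1/t
    age = random.uniform(0, t_total)    # we arrive at a uniformly random moment
    if abs(age - target_age) < 0.5:     # keep samples whose age matches ours
        totals.append(t_total)

totals.sort()
print(totals[len(totals) // 2])   # median total span: ~20, double the current age
```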
31%
The richer the prior information we bring to Bayes’s Rule, the more useful the predictions we can get out of it.
31%
Real-World Priors …
Yuan
Base rate
31%
This kind of pattern typifies what are called “power-law distributions.”
31%
The power-law distribution characterizes a host of phenomena in everyday life that have the same basic quality as town populations: most things below the mean, and a few enormous ones above it.
31%
In fact, money in general is a domain full of power laws. Power-law distributions characterize both people’s wealth and people’s incomes.
31%
It’s often lamented that “the rich get richer,” and indeed the process of “preferential attachment” is one of the surest ways to produce a power-law distribution.
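A toy preferential-attachment simulation, as a sketch of the “rich get richer” mechanism (the arrival rate and step count are arbitrary choices of mine): each new unit of wealth either starts off a newcomer or, far more often, goes to someone with probability proportional to what they already hold.

```python
import random
from collections import Counter

random.seed(0)
pool = [0]     # person i appears in the pool once per unit of wealth held
n_people = 1

for _ in range(200_000):
    if random.random() < 0.1:             # occasionally a newcomer arrives
        pool.append(n_people)
        n_people += 1
    else:                                 # otherwise, the rich get richer:
        pool.append(random.choice(pool))  # chosen proportionally to wealth

wealth = sorted(Counter(pool).values(), reverse=True)
mean = sum(wealth) / len(wealth)
print(wealth[:5])                                   # a few enormous fortunes
print(sum(w < mean for w in wealth) / len(wealth))  # most people below the mean
```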
31%
Examining the Copernican Principle, we saw that when Bayes’s Rule is given an uninformative prior, it always predicts that the total life span of an object will be exactly double its current age.
31%
Bayes’s Rule indicates that the appropriate prediction strategy is a Multiplicative Rule: multiply the quantity observed so far by some constant factor.
31%
Instead of a multiplicative rule, we get an Average Rule: use the distribution’s “natural” average…
31%
Something normally distributed that’s gone on seemingly too long is bound to end shortly; but the longer something in a power-law distribution has gone on, the longer you can expect it to keep going.
Yuan
The prior distribution assumption gives you a hint about your prediction: normal or scale-free (power law)?
32%
The Danish mathematician Agner Krarup Erlang, who studied such phenomena, formalized the spread of intervals between independent events into the function that now carries his name: the Erlang distribution.
32%
Since then, the Erlang distribution has also been used by urban planners and architects to model car and pedestrian traffic, and by networking engineers designing infrastructure for the Internet.
32%
The intervals between such events thus fall on an Erlang curve. Radioactive decay is one example, which means that the Erlang distribution perfectly models when to expect the next ticks of a Geiger counter.
32%
The Erlang distribution gives us a third kind of prediction rule, the Additive Rule: always predict that things will go on just a constant amount longer.
32%
In fact, his prediction is entirely correct. Indeed, distributions that yield the same prediction, no matter their history or current state, are known to statisticians as “memoryless.”
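A quick numeric check of what “memoryless” means, using the exponential distribution (the Erlang distribution with shape 1, which is the memoryless case); the rate and times here are arbitrary.

```python
import math

rate = 0.5
survival = lambda t: math.exp(-rate * t)   # P(T > t) for an exponential

s, t = 3.0, 2.0
print(survival(s + t) / survival(s))   # P(T > s+t | T > s) = 0.3679...
print(survival(t))                     # P(T > t)           = 0.3679... identical
```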
32%
These three very different patterns of optimal prediction—the Multiplicative, Average, and Additive Rules—all result directly from applying Bayes’s Rule to the power-law, normal, and Erlang distributions, respectively.
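A consolidated sketch of the three rules; every constant below is illustrative rather than a fitted value from the book’s data, and the Average Rule is simplified here to a max().

```python
def multiplicative_rule(observed: float, factor: float = 2.0) -> float:
    """Power-law prior: scale up what you've seen so far.
    Factor 2.0 is the uninformative, Copernican special case."""
    return observed * factor

def average_rule(observed: float, natural_average: float) -> float:
    """Normal prior (simplified): predict the natural average,
    unless you're already past it -- then the end is near."""
    return max(observed, natural_average)

def additive_rule(observed: float, constant: float) -> float:
    """Erlang prior: always expect a constant amount more,
    however long things have already gone on."""
    return observed + constant

print(multiplicative_rule(6.0))   # 12.0: double what's been observed
print(average_rule(18, 76))       # 76: a young person is headed for the average
print(average_rule(90, 76))       # 90: past the average, so not much longer
print(additive_rule(10, 5))       # 15: "a constant amount longer," always
```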
32%
In a power-law distribution, the longer something has gone on, the longer we expect it to continue going on. So a power-law event is more surprising the longer we’ve been waiting for it—and maximally surprising right before it happens.
32%
In a normal distribution, events are surprising when they’re early—since we expected them to reach the average—but not when they’re late. Indeed, by that point they seem overdue to happen, so the longer we wait, the more we expect them.
32%
And in an Erlang distribution, events by definition are never any more or less surprising no matter when they occur.
32%
“Know when to walk away / Know when to run”—but for a memoryless distribution, there is no right time to quit. This may in part explain these games’ addictiveness.
32%
Knowing what distribution you’re up against can make all the difference.
32%
But that one statistic—eight months—didn’t tell him anything about the distribution of survivors. If it were a normal distribution, then the Average Rule would give a pretty clear forecast of how long he could expect to live: about eight months.
Yuan
Prior belief matters in this case.
32%
The three prediction rules—Multiplicative, Average, and Additive—are applicable in a wide range of everyday situations.
32%
The reason we can often make good predictions from a small number of observations—or just a single one—is that our priors are so rich.
32%
Over the past decade, approaches like these have enabled cognitive scientists to identify people’s prior distributions across a broad swath of domains, from vision to language.
32%
People simply didn’t have enough everyday exposure to have an intuitive feel for the range of those values, so their predictions, of course, faltered.
32%
Good predictions require good priors.
33%
If the amount of time it takes for adults to come back is governed by a power-law distribution—with long absences suggesting even longer waits lie ahead—then cutting one’s losses at some point can make perfect sense.