Algorithms to Live By: The Computer Science of Human Decisions
7%
By entering an almost purely exploit-focused phase, the film industry seems to be signaling a belief that it is near the end of its interval.
8%
Indeed, as Peter Whittle recounts, during World War II efforts to solve the question “so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage.”
8%
Robbins specifically considered the case where there are exactly two slot machines, and proposed a solution called the Win-Stay, Lose-Shift algorithm: choose an arm at random, and keep pulling it as long as it keeps paying off. If the arm doesn’t pay off after a particular pull, then switch to the other one.
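The rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Robbins's own formulation: the function name `win_stay_lose_shift`, the `payout_probs` list, and the simulated random payoffs are all assumptions made for the example.

```python
import random

def win_stay_lose_shift(payout_probs, pulls, rng):
    """Play a two-armed bandit with Win-Stay, Lose-Shift.

    payout_probs: hypothetical chance that each of the two arms pays off.
    Returns (number of wins, list of arms pulled).
    """
    arm = rng.randrange(2)              # choose the first arm at random
    wins, history = 0, []
    for _ in range(pulls):
        history.append(arm)
        if rng.random() < payout_probs[arm]:
            wins += 1                   # win: stay on the same arm
        else:
            arm = 1 - arm               # loss: shift to the other arm
    return wins, history

wins, history = win_stay_lose_shift([0.9, 0.1], pulls=1000, rng=random.Random(0))
print(wins, history[:10])
```

Note that the rule reacts only to the most recent pull, so a single unlucky loss on a good arm is enough to send it to the worse one.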
8%
More significantly, Win-Stay, Lose-Shift doesn’t have any notion of the interval over which you are optimizing.
8%
As with the full-information secretary problem, Bellman’s trick was essentially to work backward, starting by imagining the final pull and considering which slot machine to choose given all the possible outcomes of the previous decisions.
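Working backward from the final pull can be sketched as a small recursion. The sketch below assumes two arms that either pay 1 or 0, and it assumes we start with no knowledge of either arm (a uniform prior, so the estimated payoff chance after `w` wins and `l` losses is `(w + 1) / (w + l + 2)`). Those modeling choices and the name `best_expected_wins` are assumptions for illustration, not details from the text.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_expected_wins(w1, l1, w2, l2, pulls_left):
    """Expected number of future wins under optimal play.

    (w1, l1) and (w2, l2) are the wins and losses seen so far on each arm.
    The value with zero pulls left is known (0), and every earlier value
    is built from the values one pull closer to the end.
    """
    if pulls_left == 0:
        return 0.0

    def pull_value(w, l, value_after_win, value_after_loss):
        p = (w + 1) / (w + l + 2)       # assumed estimate of the payoff chance
        return p * (1 + value_after_win) + (1 - p) * value_after_loss

    v1 = pull_value(w1, l1,
                    best_expected_wins(w1 + 1, l1, w2, l2, pulls_left - 1),
                    best_expected_wins(w1, l1 + 1, w2, l2, pulls_left - 1))
    v2 = pull_value(w2, l2,
                    best_expected_wins(w1, l1, w2 + 1, l2, pulls_left - 1),
                    best_expected_wins(w1, l1, w2, l2 + 1, pulls_left - 1))
    return max(v1, v2)                  # pull whichever arm is worth more

print(best_expected_wins(0, 0, 0, 0, 10))
```

The number of states grows quickly with the number of pulls and arms, which is why this exact approach becomes impractical for long horizons.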
8%
As so often happens in mathematics, though, the particular is the gateway to the universal.
8%
Gittins tried to cast the problem in the most general form he could: multiple options to pursue, a different probability of reward for each option, and a certain amount of effort (or money, or time) to be allocated among them. It was, of course, another incarnation of the multi-armed bandit problem.
8%
Economists refer to this idea, of valuing the present more highly than the future, as “discounting.”
8%
Gittins, for his part, made the assumption that the value assigned to payoffs decreases geometrically: that is, each restaurant visit you make is worth a constant fraction of the last one.
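A geometrically decreasing value can be made concrete with a short calculation. In this sketch the function name `discounted_total` and the discount factor of 0.9 are illustrative assumptions: the payoff at step t is weighted by the discount raised to the power t.

```python
def discounted_total(payoffs, discount):
    """Sum payoffs where each step is worth `discount` times the previous one."""
    return sum(p * discount ** t for t, p in enumerate(payoffs))

# Three equally good visits, each valued at 90% of the one before.
print(discounted_total([1, 1, 1], 0.9))

# An endless run of such visits approaches 1 / (1 - 0.9) = 10.
print(discounted_total([1] * 1000, 0.9))
```

Because the weights shrink by a constant fraction, even an unbounded future has a finite total value, which is part of what makes the problem tractable.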
8%
For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again.
9%
The old adage tells us that “the grass is always greener on the other side of the fence,” but the math tells us why: the unknown has a chance of being better, even if we actually expect it to be no different, or if it’s just as likely to be worse.
9%
Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty.
9%
It’s based on geometric discounting of future reward, valuing each pull at a constant fraction of the previous one, which is something that a variety of experiments in behavioral economics and psychology suggest people don’t do.
9%
These strategies are easier for humans (and machines) to apply in a range of situations than crunching the optimal Gittins index, while still providing comparably good performance.
9%
In the memorable words of management theorist Chester Barnard, “To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been.”
9%
Before he decided to start Amazon.com, Jeff Bezos had a secure and well-paid position at the investment company D. E. Shaw & Co. in New York.
9%
Says Bezos: The framework I found, which made the decision incredibly easy, was what I called—which only a nerd would call—a “regret minimization framework.”
9%
I knew that when I was 80 I was not going to regret having tried this.
9%
Regret is the result of comparing what we actually did with what would have been best in hindsight. In a multi-armed bandit, Barnard’s “inestimable loss” can in fact be measured precisely, and regret assigned a number: it’s the difference between the total payoff obtained by following a particular strategy and the total payoff that theoretically could have been obtained by just pulling the best arm every single time (had we only known from the start which one it was).
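The definition above translates directly into a calculation. The sketch below measures regret in expected payoffs rather than realized ones, which is a simplifying assumption; the function name `regret` and the example payout rates are also hypothetical.

```python
def regret(payout_probs, arms_chosen):
    """Expected payoff of always pulling the best arm, minus the expected
    payoff of the arms actually chosen."""
    best = max(payout_probs)
    return sum(best - payout_probs[arm] for arm in arms_chosen)

# Two arms paying off 90% and 10% of the time; the strategy pulled
# the worse arm twice out of four pulls.
print(regret([0.9, 0.1], [0, 0, 1, 1]))
```

A strategy that happened to pull the best arm every time would score a regret of zero.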
9%
Logarithmically increasing regret means that we’ll make as many mistakes in our first ten pulls as in the following ninety, and as many in our first year as in the rest of the decade combined.
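The "first ten versus the following ninety" comparison follows from how logarithms grow. As a rough check, assume cumulative mistakes after n pulls are exactly proportional to log10(n); real algorithms only achieve this up to constants, so the function `mistakes_between` and its `scale` parameter are illustrative assumptions.

```python
import math

def mistakes_between(start, end, scale=1.0):
    """Mistakes accumulated between pull `start` and pull `end`,
    assuming cumulative mistakes equal scale * log10(pull count)."""
    return scale * (math.log10(end) - math.log10(start))

first_ten = mistakes_between(1, 10)        # pulls 1 through 10
next_ninety = mistakes_between(10, 100)    # pulls 11 through 100
print(first_ten, next_ninety)
```

Each tenfold increase in the number of pulls adds the same amount of regret, so mistakes become steadily rarer over time.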
9%
In a multi-armed bandit problem, an Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest.
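One widely used member of this family is UCB1, which is assumed here as a stand-in because the text does not name a specific variant. It scores each arm by its observed average payoff plus a bonus that shrinks as the arm is pulled more often. The function name `ucb1_choice` and the example counts are hypothetical.

```python
import math

def ucb1_choice(wins, pulls):
    """Return the index of the arm whose upper confidence bound is highest.

    wins[i] and pulls[i] are the payoffs and pull counts so far for arm i.
    """
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm                  # an untried arm is tried first
    total = sum(pulls)

    def upper_bound(arm):
        mean = wins[arm] / pulls[arm]
        bonus = math.sqrt(2 * math.log(total) / pulls[arm])
        return mean + bonus             # top of the (UCB1) confidence interval

    return max(range(len(pulls)), key=upper_bound)

# Both arms have paid off half the time, but the second has far less data,
# so its interval is wider and it is the one chosen.
print(ucb1_choice(wins=[50, 1], pulls=[100, 2]))
```

With equal amounts of data the bonus is identical for every arm, and the choice reduces to picking the best observed average.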
9%
The recommendations given by Upper Confidence Bound algorithms will be similar to those provided by the Gittins index, but they are significantly easier to compute, and they don’t require the assumption of geometric discounting. Upper Confidence Bound algorithms implement a principle that has been dubbed “optimism in the face of uncertainty.”
10%
As a consequence, they naturally inject a dose of exploration into the decision-making process, leaping at new options with enthusiasm because any one of them could be the next big thing.
10%
After a period of time, if statistically significant effects are observed, the “winning” version is typically locked into place—or becomes the control for another round of experiments.
10%
And in all cases, to the astonishment of the campaign team, a simple black-and-white photo of the Obama family outperformed any other photo or video the team could come up with.
10%
Big tech firms such as Amazon and Google began carrying out live A/B tests on their users starting in about 2000, and over the following years the Internet has become the world’s largest controlled experiment.
10%
Instead of “the” Google search algorithm and “the” Amazon checkout flow, there are now untold and unfathomably subtle permutations.
10%
Data scientist Jeff Hammerbacher, former manager of the Data group at Facebook, once told Bloomberg Businessweek that “the best minds of my generation are thinking about how to make people click ads.” Consider it the millennials’ Howl—what Allen Ginsberg’s immortal “I saw the best minds of my generation destroyed by madness” was to the Beat Generation.
10%
After the inauguration, Siroker returned west to California, and with fellow Googler Pete Koomen co-founded the website optimization firm Optimizely.
10%
The best algorithms to use remain hotly contested, with rival statisticians, engineers, and bloggers endlessly sparring about the optimal way to balance exploration and exploitation in every possible business scenario.
10%
However, even avoiding harm requires learning what is harmful; and, in the process of obtaining this information, persons may be exposed to risk of harm.”
10%
It also makes it clear that gathering knowledge can be so valuable that some aspects of normal medical ethics can be suspended.
10%
A difficult ethical problem remains, for example, about research [on childhood diseases] that presents more than minimal risk without immediate prospect of direct benefit to the children involved.
10%
But doctors, like tech companies, are gaining some information about which option is better while the trial proceeds—information that could be used to improve outcomes not only for future patients beyond the trial, but for the patients currently in it.
11%
One of the ideas he suggested was a randomized “play the winner” algorithm—a version of Win-Stay, Lose-Shift, in which the chance of using a given treatment is increased by each win and decreased by each loss.
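One common way to realize this idea is an urn of balls, one color per treatment, and that mechanism is assumed here since the text does not spell one out. A ball is drawn to assign each patient; a success adds a ball for the same treatment and a failure adds a ball for the other one. The function name `randomized_play_the_winner` and the success rates are hypothetical.

```python
import random

def randomized_play_the_winner(success_probs, patients, rng):
    """Assign patients to two treatments with an urn that favors winners.

    success_probs: hypothetical success chance of each treatment.
    Returns (list of assigned treatments, final ball counts).
    """
    balls = [1, 1]                      # start with one ball per treatment
    assignments = []
    for _ in range(patients):
        treatment = rng.choices([0, 1], weights=balls)[0]
        assignments.append(treatment)
        if rng.random() < success_probs[treatment]:
            balls[treatment] += 1       # win: this treatment becomes more likely
        else:
            balls[1 - treatment] += 1   # loss: the other treatment becomes more likely
    return assignments, balls

assignments, balls = randomized_play_the_winner([0.8, 0.3], 50, random.Random(0))
print(balls, assignments.count(0), assignments.count(1))
```

Unlike plain Win-Stay, Lose-Shift, a single loss only nudges the odds, so one unlucky outcome does not rule a treatment out.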
11%
But in its early days the ECMO technology and procedure were considered highly experimental, and early studies in adults showed no benefit compared to conventional treatments.
11%
But consider that part of what the advent of statistics did for medicine, at the start of the twentieth century, was to transform it from a field in which doctors had to persuade each other in ad hoc ways about every new treatment into one where they had clear guidelines about what sorts of evidence were and were not persuasive. Changes to accepted standard statistical practice have the potential to upset this balance, at least temporarily.
11%
In 2010 and 2015, the FDA released a pair of draft “guidance” documents on “Adaptive Design” clinical trials for drugs and medical devices, which suggests—despite a long history of sticking to an option they trust—that they might at last be willing to explore alternatives.
11%
In general, it seems that people tend to over-explore—to favor the new disproportionately over the best.
12%
The standard multi-armed bandit problem assumes that the probabilities with which the arms pay off are fixed over time. But that’s not necessarily true of airlines, restaurants, or other contexts in which people have to make repeated choices.
12%
In his celebrated essay “Walking,” Henry David Thoreau reflected on how he preferred to do his traveling close to home, how he never tired of his surroundings and always found something new or surprising in the Massachusetts landscape. “There is in fact a sort of harmony discoverable between the capabilities of the landscape within a circle of ten miles’ radius, or the limits of an afternoon walk, and the threescore years and ten of human life,” he wrote. “It will never become quite familiar to you.” To live in a restless world requires a certain restlessness in oneself. So long as things ...
12%
Having instincts tuned by evolution for a world in constant flux isn’t necessarily helpful in an era of industrial standardization.
12%
When we talk about decision-making, we usually focus just on the immediate payoff of a single decision—and if you treat every decision as if it were your last, then indeed only exploitation makes sense.
12%
And it’s actually rational to emphasize exploration—the new rather than the best, the exciting rather than the safe, the random rather than the considered—for many of those choices, particularly earlier in life.
12%
The traditional explanation for the elderly having smaller social networks is that it’s just one example of the decrease in quality of life that comes with aging—the result of diminished ability to contribute to social relationships, greater fragility, and general disengagement from society.
12%
What Carstensen and her colleagues found is that the shrinking of social networks with aging is due primarily to “pruning” peripheral relationships and focusing attention instead on a core of close friends and family members.
12%
Perhaps the deepest insight that comes from thinking about later life as a chance to exploit knowledge acquired over decades is this: life should get better over time. What an explorer trades off for knowledge is pleasure. The Gittins index and the Upper Confidence Bound, as we’ve seen, inflate the appeal of lesser-known options beyond what we actually expect, since pleasant surprises can pay off many times over.
13%
“Socks confound me!” confessed legendary cryptographer and Turing Award–winning computer scientist Ron Rivest to the two of us when we brought up the topic. He was wearing sandals at the time.
13%
The tabulation of the 1880 census took eight years—just barely finishing by the time the 1890 census began.
13%
Inspired by the punched railway tickets of the time, an inventor by the name of Herman Hollerith devised a system of punched manila cards to store information, and a machine, which he called the Hollerith Machine, to count and sort them.