Kindle Notes & Highlights
By entering an almost purely exploit-focused phase, the film industry seems to be signaling a belief that it is near the end of its interval.
Indeed, as Peter Whittle recounts, during World War II efforts to solve the question “so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage.”
Robbins specifically considered the case where there are exactly two slot machines, and proposed a solution called the Win-Stay, Lose-Shift algorithm: choose an arm at random, and keep pulling it as long as it keeps paying off. If the arm doesn’t pay off after a particular pull, then switch to the other one.
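Robbins's rule is simple enough to simulate directly. Here is a minimal sketch (the Bernoulli payoff model, function name, and seeding are our illustration, not Robbins's own formulation):

```python
import random

def win_stay_lose_shift(payout_probs, n_pulls, seed=0):
    """Robbins's Win-Stay, Lose-Shift on two slot machines: start on a
    random arm, stay after every win, switch after every loss."""
    rng = random.Random(seed)
    arm = rng.randrange(2)            # choose an arm at random to start
    total = 0
    for _ in range(n_pulls):
        if rng.random() < payout_probs[arm]:
            total += 1                # win: stay on this arm
        else:
            arm = 1 - arm             # loss: shift to the other arm
    return total
```

With one arm paying off 90% of the time and the other 10%, the strategy spends most of its pulls on the better arm, earning well above the 50% that blind alternation would yield.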
More significantly, Win-Stay, Lose-Shift doesn’t have any notion of the interval over which you are optimizing.
As with the full-information secretary problem, Bellman’s trick was essentially to work backward, starting by imagining the final pull and considering which slot machine to choose given all the possible outcomes of the previous decisions.
As so often happens in mathematics, though, the particular is the gateway to the universal.
Gittins tried to cast the problem in the most general form he could: multiple options to pursue, a different probability of reward for each option, and a certain amount of effort (or money, or time) to be allocated among them. It was, of course, another incarnation of the multi-armed bandit problem.
Economists refer to this idea, of valuing the present more highly than the future, as “discounting.”
Gittins, for his part, made the assumption that the value assigned to payoffs decreases geometrically: that is, each restaurant visit you make is worth a constant fraction of the last one.
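Geometric discounting has a convenient closed form: a constant reward stream, each step worth a fraction gamma of the last, sums to reward / (1 − gamma). A two-line check (our illustration):

```python
def discounted_value(reward, gamma, horizon):
    """Total value of a constant reward stream under geometric discounting:
    each step is worth gamma times the step before it."""
    return sum(reward * gamma**t for t in range(horizon))

# As the horizon grows, this approaches the closed form reward / (1 - gamma):
# e.g., reward 1.0 at gamma 0.9 is worth 10.0 in total.
```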
For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again.
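That guaranteed payout rate is the Gittins index. For a slot machine whose payoff probability is tracked by a Beta(a, b) posterior, the index can be approximated numerically: search for the sure rate at which we are exactly indifferent between retiring on it forever and pulling the unknown arm once more. The sketch below is a toy finite-depth dynamic program of our own construction, not the book's (or Gittins's) tabulation:

```python
from functools import lru_cache

def gittins_index(a, b, gamma=0.9, depth=40, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b)
    posterior: the guaranteed rate lam at which retiring on lam forever
    exactly ties with pulling the arm once more and continuing optimally.
    The dynamic program is truncated at a finite depth."""
    def advantage_of_pulling(lam):
        @lru_cache(maxsize=None)
        def value(a_, b_, d):
            safe = lam / (1 - gamma)          # retire: lam forever, discounted
            if d == 0:
                return safe
            p = a_ / (a_ + b_)                # posterior mean payout
            pull = (p * (1 + gamma * value(a_ + 1, b_, d - 1))
                    + (1 - p) * gamma * value(a_, b_ + 1, d - 1))
            return max(safe, pull)
        p = a / (a + b)
        pull = (p * (1 + gamma * value(a + 1, b, depth))
                + (1 - p) * gamma * value(a, b + 1, depth))
        return pull - lam / (1 - gamma)

    lo, hi = 0.0, 1.0                         # bisect on the retirement rate
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if advantage_of_pulling(lam) > 0:
            lo = lam                          # pulling still beats retiring
        else:
            hi = lam
    return (lo + hi) / 2
```

For a completely unknown arm (one imagined success, one imagined failure), the index comes out well above the 0.5 we actually expect the arm to pay: the bonus is the value of finding out.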
The old adage tells us that “the grass is always greener on the other side of the fence,” but the math tells us why: the unknown has a chance of being better, even if we actually expect it to be no different, or if it’s just as likely to be worse.
Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty.
It’s based on geometric discounting of future reward, valuing each pull at a constant fraction of the previous one, which is something that a variety of experiments in behavioral economics and psychology suggest people don’t do.
These strategies are easier for humans (and machines) to apply in a range of situations than crunching the optimal Gittins index, while still providing comparably good performance.
In the memorable words of management theorist Chester Barnard, “To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been.”
Before he decided to start Amazon.com, Jeff Bezos had a secure and well-paid position at the investment company D. E. Shaw & Co. in New York.
Says Bezos: The framework I found, which made the decision incredibly easy, was what I called—which only a nerd would call—a “regret minimization framework.”
I knew that when I was 80 I was not going to regret having tried this.
Regret is the result of comparing what we actually did with what would have been best in hindsight. In a multi-armed bandit, Barnard’s “inestimable loss” can in fact be measured precisely, and regret assigned a number: it’s the difference between the total payoff obtained by following a particular strategy and the total payoff that theoretically could have been obtained by just pulling the best arm every single time (had we only known from the start which one it was).
Logarithmically increasing regret means that we’ll make as many mistakes in our first ten pulls as in the following ninety, and as many in our first year as in the rest of the decade combined.
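Measured in expectation, that number is easy to compute from a pull sequence. A minimal sketch (function and variable names are our illustration):

```python
def regret(payout_probs, arms_pulled):
    """Expected regret of a sequence of pulls: what always pulling the
    best arm would have yielded in expectation, minus what the arms we
    actually chose yield in expectation."""
    best = max(payout_probs)
    return sum(best - payout_probs[arm] for arm in arms_pulled)
```

For example, with arms paying 0.6 and 0.4, the sequence best, worst, worst, best costs 0.2 twice, for an expected regret of 0.4.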
In a multi-armed bandit problem, an Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest.
The recommendations given by Upper Confidence Bound algorithms will be similar to those provided by the Gittins index, but they are significantly easier to compute, and they don’t require the assumption of geometric discounting. Upper Confidence Bound algorithms implement a principle that has been dubbed “optimism in the face of uncertainty.”
As a consequence, they naturally inject a dose of exploration into the decision-making process, leaping at new options with enthusiasm because any one of them could be the next big thing.
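One classic member of this family is UCB1 (Auer, Cesa-Bianchi, and Fischer), which scores each arm by its sample mean plus an uncertainty bonus that shrinks as the arm is pulled more. The book discusses the family in general terms; the particular bonus formula and simulation below are our illustration:

```python
import math
import random

def ucb1(payout_probs, n_pulls, seed=0):
    """UCB1: pull the arm whose upper confidence bound -- sample mean
    plus sqrt(2 ln t / n_i) -- is highest. Rarely tried arms get a big
    bonus, so the algorithm leaps at under-explored options."""
    rng = random.Random(seed)
    k = len(payout_probs)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, n_pulls + 1):
        if t <= k:
            arm = t - 1                       # pull each arm once to start
        else:
            arm = max(range(k), key=lambda i:
                      sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Over a few thousand pulls with arms paying 0.8 and 0.2, almost all the pulls end up on the better arm, while the worse arm still gets the occasional optimistic revisit.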
After a period of time, if statistically significant effects are observed, the “winning” version is typically locked into place—or becomes the control for another round of experiments.
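The significance check behind a basic A/B comparison is often a pooled two-proportion z-test, sketched here under our own naming (real testing platforms layer considerably more machinery on top):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates,
    using the pooled standard error -- a basic A/B significance check.
    |z| > 1.96 is roughly significant at the 5% level."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se
```

A version converting 120 of 1,000 visitors against one converting 80 of 1,000 gives z just under 3, comfortably past the usual threshold, so the first version would typically be locked in.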
And in all cases, to the astonishment of the campaign team, a simple black-and-white photo of the Obama family outperformed any other photo or video the team could come up with.
Big tech firms such as Amazon and Google began carrying out live A/B tests on their users around 2000, and in the years since, the Internet has become the world’s largest controlled experiment.
Instead of “the” Google search algorithm and “the” Amazon checkout flow, there are now untold and unfathomably subtle permutations.
Data scientist Jeff Hammerbacher, former manager of the Data group at Facebook, once told Bloomberg Businessweek that “the best minds of my generation are thinking about how to make people click ads.” Consider it the millennials’ Howl—what Allen Ginsberg’s immortal “I saw the best minds of my generation destroyed by madness” was to the Beat Generation.
After the inauguration, Siroker returned west to California, and with fellow Googler Pete Koomen co-founded the website optimization firm Optimizely.
The best algorithms to use remain hotly contested, with rival statisticians, engineers, and bloggers endlessly sparring about the optimal way to balance exploration and exploitation in every possible business scenario.
“However, even avoiding harm requires learning what is harmful; and, in the process of obtaining this information, persons may be exposed to risk of harm.”
It also makes it clear that gathering knowledge can be so valuable that some aspects of normal medical ethics can be suspended.
A difficult ethical problem remains, for example, about research [on childhood diseases] that presents more than minimal risk without immediate prospect of direct benefit to the children involved.
But doctors, like tech companies, are gaining some information about which option is better while the trial proceeds—information that could be used to improve outcomes not only for future patients beyond the trial, but for the patients currently in it.
One of the ideas he suggested was a randomized “play the winner” algorithm—a version of Win-Stay, Lose-Shift, in which the chance of using a given treatment is increased by each win and decreased by each loss.
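One well-known form of this rule is an urn model (due to Zelen, later randomized by Wei and Durham): draw a ball to pick the treatment, then add a ball of the same kind after a success and a ball of the other kind after a failure. The simulation below is our sketch of that design, not a clinical implementation:

```python
import random

def play_the_winner(success_probs, n_patients, seed=0):
    """Randomized play-the-winner urn: each success makes its own
    treatment more likely to be drawn next; each failure makes the
    other treatment more likely."""
    rng = random.Random(seed)
    urn = [0, 1]                      # one ball per treatment to start
    assignments = []
    for _ in range(n_patients):
        treatment = rng.choice(urn)
        success = rng.random() < success_probs[treatment]
        urn.append(treatment if success else 1 - treatment)
        assignments.append(treatment)
    return assignments
```

When one treatment succeeds far more often than the other, the urn fills up with its balls, so most patients in the trial end up receiving it, while the trial still gathers some evidence about the alternative.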
But in its early days the ECMO technology and procedure were considered highly experimental, and early studies in adults showed no benefit compared to conventional treatments.
But consider that part of what the advent of statistics did for medicine, at the start of the twentieth century, was to transform it from a field in which doctors had to persuade each other in ad hoc ways about every new treatment into one where they had clear guidelines about what sorts of evidence were and were not persuasive. Changes to accepted standard statistical practice have the potential to upset this balance, at least temporarily.
In 2010 and 2015, the FDA released a pair of draft “guidance” documents on “Adaptive Design” clinical trials for drugs and medical devices, suggesting that, despite a long history of sticking with options it trusts, the agency might at last be willing to explore alternatives.
In general, it seems that people tend to over-explore—to favor the new disproportionately over the best.
The standard multi-armed bandit problem assumes that the probabilities with which the arms pay off are fixed over time. But that’s not necessarily true of airlines, restaurants, or other contexts in which people have to make repeated choices.
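When payoff rates drift, a plain running average clings to stale evidence. One standard remedy (our illustration, not the book's) is exponential recency weighting: a fixed step size keeps the estimate tracking the arm's current behavior rather than its whole history.

```python
def recency_weighted_mean(rewards, alpha=0.1):
    """Exponential recency weighting: a fixed step size alpha lets the
    estimate follow a payoff rate that changes over time, where a plain
    running average would average the old regime in forever."""
    est = 0.0
    for r in rewards:
        est += alpha * (r - est)      # move a fixed fraction toward each reward
    return est
```

Fed a hundred losses followed by a hundred wins, the estimate ends near 1.0, whereas the plain average of the same data sits at 0.5.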
In his celebrated essay “Walking,” Henry David Thoreau reflected on how he preferred to do his traveling close to home, how he never tired of his surroundings and always found something new or surprising in the Massachusetts landscape. “There is in fact a sort of harmony discoverable between the capabilities of the landscape within a circle of ten miles’ radius, or the limits of an afternoon walk, and the threescore years and ten of human life,” he wrote. “It will never become quite familiar to you.” To live in a restless world requires a certain restlessness in oneself.
Having instincts tuned by evolution for a world in constant flux isn’t necessarily helpful in an era of industrial standardization.
When we talk about decision-making, we usually focus just on the immediate payoff of a single decision—and if you treat every decision as if it were your last, then indeed only exploitation makes sense.
And it’s actually rational to emphasize exploration—the new rather than the best, the exciting rather than the safe, the random rather than the considered—for many of those choices, particularly earlier in life.
The traditional explanation for the elderly having smaller social networks is that it’s just one example of the decrease in quality of life that comes with aging—the result of diminished ability to contribute to social relationships, greater fragility, and general disengagement from society.
What Carstensen and her colleagues found is that the shrinking of social networks with aging is due primarily to “pruning” peripheral relationships and focusing attention instead on a core of close friends and family members.
Perhaps the deepest insight that comes from thinking about later life as a chance to exploit knowledge acquired over decades is this: life should get better over time. What an explorer trades off for knowledge is pleasure. The Gittins index and the Upper Confidence Bound, as we’ve seen, inflate the appeal of lesser-known options beyond what we actually expect, since pleasant surprises can pay off many times over.
“Socks confound me!” confessed legendary cryptographer and Turing Award–winning computer scientist Ron Rivest to the two of us when we brought up the topic. He was wearing sandals at the time.
The tabulation of the 1880 census took eight years—just barely finishing by the time the 1890 census began.
Inspired by the punched railway tickets of the time, an inventor by the name of Herman Hollerith devised a system of punched manila cards to store information, and a machine, which he called the Hollerith Machine, to count and sort them.