Kindle Notes & Highlights
Read between November 19, 2023 - January 11, 2024
class of mathematical problems known as “optimal stopping” problems. The 37% rule defines a simple series of steps—what computer scientists call an “algorithm”—for solving these problems.
In this book, we explore the idea of human algorithm design—searching for better solutions to the challenges people encounter every day.
Optimal stopping tells us when to look and when to leap. The explore/exploit tradeoff tells us how to find the balance between trying new things and enjoying our favorites. Sorting theory tells us how (and whether) to arrange our offices. Caching theory tells us how to fill our closets. Scheduling theory tells us how to fill our time.
As Carl Sagan put it, “Science is a way of thinking much more than it is a body of knowledge.”
tackling real-world tasks requires being comfortable with chance, trading off time with accuracy, and using approximations.
Don’t always consider all your options. Don’t necessarily go for the outcome that seems best every time. Make a mess on occasion. Travel light. Let things wait. Trust your instincts and don’t think too long. Relax. Toss a coin. Forgive, but don’t forget. To thine own self be true.
If you prefer Mr. Martin to every other person; if you think him the most agreeable man you have ever been in company with, why should you hesitate? —JANE AUSTEN, EMMA
The nature of serial monogamy, writ large, is that its practitioners are confronted with a fundamental, unavoidable problem. When have you met enough people to know who your best match is? And what if acquiring the data costs you that very match?
The 37% Rule* derives from optimal stopping’s most famous puzzle, which has come to be known as the “secretary problem.”
the Look-Then-Leap Rule: You set a predetermined amount of time for “looking”—that is, exploring your options, gathering data—in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase.
As the applicant pool grows, the exact place to draw the line between looking and leaping settles to 37% of the pool, yielding the 37% Rule: look at the first 37% of the applicants,* choosing none, then be ready to leap for anyone better than all those you’ve seen so far.
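A quick sketch to make the rule concrete (my own illustration in Python, not the authors' code; the pool size and the Monte Carlo check are assumptions for the example):

```python
import random

def look_then_leap(scores):
    """Pass on the first ~37% of applicants, then commit to the first one who
    beats everyone seen during the look phase."""
    n = len(scores)
    look = int(round(0.37 * n))                       # length of the look phase
    best_seen = max(scores[:look], default=float("-inf"))
    for score in scores[look:]:
        if score > best_seen:
            return score                              # leap: commit on the spot
    return scores[-1]                                 # never leapt: settle for the last applicant

# Rough check: with 100 applicants arriving in random order, the rule lands
# the single best applicant about 37% of the time.
trials, hits = 10_000, 0
for _ in range(trials):
    scores = random.sample(range(1_000_000), 100)
    hits += look_then_leap(scores) == max(scores)
print(hits / trials)
```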
The passion between the sexes has appeared in every age to be so nearly the same that it may always be considered, in algebraic language, as a given quantity. —THOMAS MALTHUS
the Threshold Rule, where we immediately accept an applicant if they are above a certain percentile.
If you have all the facts, you can succeed more often than not, even as the applicant pool grows arbitrarily large.
Gold digging is more likely to succeed than a quest for love.
Any yardstick that provides full information on where an applicant stands relative to the population at large will change the solution from the Look-Then-Leap Rule to the Threshold Rule and will dramatically boost your chances of finding the single best applicant in the group.
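To see how much full information helps, here is a rough numerical sketch (my own construction, not the book's) that works the percentile cutoffs out backward from the final applicant, for an assumed pool of 100; the grid resolution and names are illustrative choices.

```python
import numpy as np

# Backward recursion for the full-information case: W holds the chance of
# winning by passing, as a function of the best percentile seen so far, with
# m applicants still to come.
N = 100                                   # assumed size of the applicant pool
x = np.linspace(0.0, 1.0, 10001)          # grid of percentile values
dx = x[1] - x[0]

W = np.zeros_like(x)                      # with nobody left to see, passing can't win
thresholds = []                           # percentile to demand of a best-so-far applicant
for m in range(1, N + 1):
    # A new record y can be accepted (wins iff all m-1 later applicants are
    # worse, probability y**(m-1)) or passed over (worth W from the last round).
    record_value = np.maximum(x ** (m - 1), W)
    seg = 0.5 * (record_value[1:] + record_value[:-1]) * dx
    tail = np.append(np.cumsum(seg[::-1])[::-1], 0.0)   # integral of record_value from x to 1
    W = x * W + tail                      # the next applicant is a record only if above x
    accept_now = x ** m                   # value of taking a record x with m still to come
    thresholds.append(x[np.argmax(accept_now >= W)])

print([round(float(t), 2) for t in thresholds[:3]])   # about [0.5, 0.69, 0.78]: the bar drops as applicants run out
print(round(float(W[0]), 3))                          # overall success probability, close to 0.58
```

Under these assumptions the cutoffs relax as the pool runs out, and the overall success rate lands near 58 percent, well above the roughly 37 percent available when only relative ranks are known.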
I expect to pass through this world but once. Any good therefore that I can do, or any kindness that I can show to any fellow creature, let me do it now. Let me not defer or neglect it, for I shall not pass this way again. —STEPHEN GRELLET
Intuitively, we think that rational decision-making means exhaustively enumerating our options, weighing each one carefully, and then selecting the best. But in practice, when the clock—or the ticker—is ticking, few aspects of decision-making (or of thinking more generally) are as important as one: when to stop.
exploration is gathering information, and exploitation is using the information you have to get a known good result.
In computer science, the tension between exploration and exploitation takes its most concrete form in a scenario called the “multi-armed bandit problem.”
the explore/exploit tradeoff isn’t just a way to improve decisions about where to eat or what to listen to. It also provides fundamental insights into how our goals should change as we age, and why the most rational course of action isn’t always trying to choose the best.
When balancing favorite experiences and new ones, nothing matters as much as the interval over which we plan to enjoy them.
A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle.
the value of exploitation can only go up over time.
Robbins specifically considered the case where there are exactly two slot machines, and proposed a solution called the Win-Stay, Lose-Shift algorithm: choose an arm at random, and keep pulling it as long as it keeps paying off. If the arm doesn’t pay off after a particular pull, then switch to the other one.
Robbins proved in 1952 that it performs reliably better than chance.
Win-Stay, Lose-Shift doesn’t have any notion of the interval over which you are optimizing.
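A minimal sketch of Win-Stay, Lose-Shift on a two-armed bandit (the payout rates and pull count below are made-up numbers for illustration, not from the book):

```python
import random

def win_stay_lose_shift(payout_probs, pulls=1_000):
    """Two-armed bandit played with Robbins's rule: start on a random arm,
    keep it after a win, switch to the other one after a loss."""
    arm = random.randrange(2)
    total_wins = 0
    for _ in range(pulls):
        win = random.random() < payout_probs[arm]
        total_wins += win
        if not win:
            arm = 1 - arm                 # lose, so shift to the other machine
    return total_wins

# With hidden payout rates of 30% and 70%, this averages roughly 580 wins per
# 1,000 pulls, reliably better than the ~500 expected from choosing at random.
print(win_stay_lose_shift([0.3, 0.7]))
```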
Economists refer to this idea, of valuing the present more highly than the future, as “discounting.”
there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number—which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index—suggests an obvious strategy on the casino floor: always play the arm with the highest index.*
once the Gittins index for a particular set of assumptions is known, it can be used for any problem of that form.
Gittins index values as a function of wins and losses, assuming that a payoff next time is worth 90% of a payoff now.
something you have no experience with whatsoever is more attractive than a machine that you know pays out 70% of the time!
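Numbers like these can be approximated with a "retirement" calculation: search for the guaranteed payout rate at which pulling the arm and walking away become equally attractive. The sketch below is my own rough finite-look-ahead version (uniform prior, illustrative names), not the book's code, but it reproduces the headline figure.

```python
from functools import lru_cache

GAMMA = 0.9        # a payoff next time is worth 90% of a payoff now
HORIZON = 120      # truncate the look-ahead; GAMMA**120 is negligible

@lru_cache(maxsize=None)
def value(wins, losses, retire_rate, depth):
    """Best discounted value when we may either keep pulling an arm with this
    win/loss record or retire onto a guaranteed payout per pull."""
    retire = retire_rate / (1.0 - GAMMA)              # retiring pays this much forever
    if depth >= HORIZON:
        return retire
    p = (wins + 1) / (wins + losses + 2)              # expected payoff under a uniform prior
    play = (p * (1.0 + GAMMA * value(wins + 1, losses, retire_rate, depth + 1))
            + (1 - p) * GAMMA * value(wins, losses + 1, retire_rate, depth + 1))
    return max(retire, play)

def gittins_index(wins, losses, tol=1e-4):
    """Binary-search for the guaranteed rate at which playing and retiring tie."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        value.cache_clear()
        if value(wins, losses, mid, 0) > mid / (1.0 - GAMMA) + 1e-9:
            lo = mid                                  # playing still wins, so the index is higher
        else:
            hi = mid
    return (lo + hi) / 2

# For an arm we have never tried, the index comes out near 0.70: the complete
# unknown rates as high as a machine known to pay off 70% of the time.
print(round(gittins_index(0, 0), 2))
```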
The Gittins index, then, provides a formal, rigorous justification for preferring the unknown, provided we have some opportunity to exploit the results of what we learn from exploring.
Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty.
focus on regret.
In the memorable words of management theorist Chester Barnard, “To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been.”
a “regret minimization framework.”
your total amount of regret will probably never stop increasing, even if you pick the best possible strategy—because even the best strategy isn’t perfect every time.
regret will increase at a slower rate if you pick the best strategy than if you pick others; what’s more, with a good strategy regret’s rate of growth will go down over time, as you learn more about the problem and are able to make better choices.
the minimum possible regret—again assuming non-omniscience—is regret that increases at a logarithmic rate...
Logarithmically increasing regret means that we’ll make as many mistakes in our first ten pulls as in the following ninety, and as many in our first y...
if we’re following a regret-minimizing algorithm, every year we can expect to have fewer new regrets than we did the year before.
algorithms that offer the guarantee of minimal regret. Of the ones they’ve discovered, the most popular are known as Upper Confidence Bound algorithms.
Visual displays of statistics often include so-called error bars that extend above and below any data point, indicating uncertainty in the measurement; the error bars show the range of plausible values that the quantity being measured could actually have. This range is known as the “confidence interval,” and as we gain more data about something the confidence interval will shrink, reflecting an increasingly accurate assessment.
In a multi-armed bandit problem, an Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest.
Upper Confidence Bound algorithms assign a single number to each arm of the multi-armed bandit. And that number is set to the highest value that the arm could reasonably have, based on the information available so far. So an Upper Confidence Bound algorithm doesn’t care which arm has performed best so far; instead, it chooses the arm that could reasonably perform best in the future.
the Upper Confidence Bound is always greater than the expected value, but by less and less as we gain more experience with a particular option.
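A sketch of one common member of this family, UCB1 (the bonus formula and the payout rates in the toy run are my illustrative choices, not taken from the book):

```python
import math, random

def ucb_choice(wins, pulls):
    """Pick the arm whose upper confidence bound (estimated payout plus an
    uncertainty bonus that shrinks with experience) is highest."""
    total = sum(pulls)
    best_arm, best_bound = 0, float("-inf")
    for arm in range(len(pulls)):
        if pulls[arm] == 0:
            return arm                                # untried arms go first
        bound = wins[arm] / pulls[arm] + math.sqrt(2 * math.log(total) / pulls[arm])
        if bound > best_bound:
            best_arm, best_bound = arm, bound
    return best_arm

# Toy run: three machines with hidden payout rates. The algorithm concentrates
# its pulls on the best machine while still revisiting the others now and then.
rates = [0.2, 0.4, 0.7]
wins, pulls = [0, 0, 0], [0, 0, 0]
for _ in range(5_000):
    arm = ucb_choice(wins, pulls)
    wins[arm] += random.random() < rates[arm]
    pulls[arm] += 1
print(pulls)          # the lion's share of pulls ends up on the 0.7 machine
```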
Upper Confidence Bound algorithms implement a principle that has been dubbed “optimism in the face of uncertainty.”
they naturally inject a dose of exploration into the decision-making process, leaping at new options with enthusiasm because any one of them could be the next big thing.
In the long run, optimism is the best prevention for regret.