Algorithms to Live By: The Computer Science of Human Decisions
That is, it’s an attempt to formulate a theory that will account for the experiences you’ve had to date and say something about the future ones you’re guessing at.
The lesson is this: it is indeed true that including more factors in a model will always, by definition, make it a better fit for the data we have already. But a better fit for the available data does not necessarily mean a better prediction.
Granted, a model that’s too simple—for instance, the straight line of the one-factor formula—can fail to capture the essential pattern in the data.
On the other hand, a model that’s too complicated, such as our nine-factor model here, becomes oversensitive to the particular data points that we happened to observe.
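The difference is easy to see in a few lines of code. Here is a minimal sketch (mine, not the book's), assuming only NumPy: nine noisy points drawn from a straight line, fit once with the one-factor line and once with a nine-coefficient polynomial. The polynomial passes through every observed point, yet extrapolates badly.

```python
# Illustrative sketch (not from the book): one-factor vs. nine-factor fits.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 9)
y = 2 * x + rng.normal(0, 0.1, size=9)  # truly linear data plus noise

simple = np.polyfit(x, y, deg=1)    # two coefficients: slope and intercept
complex_ = np.polyfit(x, y, deg=8)  # nine coefficients, one per data point

x_new = 1.2                          # a point beyond the observed data
print(np.polyval(simple, x_new))     # near the true value, 2.4
print(np.polyval(complex_, x_new))   # typically far off the mark
```

The degree-8 fit has essentially zero error on the nine points it saw, which is exactly the trap the passage describes: a perfect fit to the past, a poor guide to the future.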
Fundamentally, overfitting is a kind of idolatry of data, a consequence of focusing on what we’ve been able to measure rather than what matters. This gap between the data we have and the predictions we want is virtually everywhere. When making a big decision, we can only guess at what will please us later by thinking about the factors important to us right now.
Even in our small daily acts this pattern holds: writing an email, we use our own read-through of the text to predict that of the recipient.
The answer is that taste is our body’s proxy metric for health.
But being able to modify the foods available to us broke that relationship.
In other words, we can overfit taste. And the more skillfully we can manipulate food (and the more our lifestyles diverge from those of our ancestors), the more imperfect a metric taste becomes.
Overfitting the signals—adopting an extreme diet to lower body fat and taking steroids to build muscle, perhaps—can make you the picture of good health, but only the picture.
It’s as exciting a sport as ever, but as athletes overfit their tactics to the quirks of scorekeeping, it becomes less useful in instilling the skills of real-world swordsmanship.
In the military and in law enforcement, for example, repetitive, rote training is considered a key means for instilling line-of-fire skills. The goal is to drill certain motions and tactics to the point that they become totally automatic. But when overfitting creeps in, it can prove disastrous.
Mistakes like these are known in law enforcement and the military as “training scars,” and they reflect the fact that it’s possible to overfit one’s own preparation.
Simply put, Cross-Validation means assessing not only how well a model fits the data it’s given, but how well it generalizes to data it hasn’t seen. Paradoxically, this may involve using less data.
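One way to make this concrete: withhold a few of the data points during fitting and score each candidate model on the points it never saw. A minimal sketch, again assuming only NumPy and echoing the toy example above:

```python
# Illustrative hold-out cross-validation: judge models on unseen points.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.1, size=20)

idx = rng.permutation(20)
train, test = idx[:15], idx[15:]     # withhold five points entirely

for degree in (1, 8):
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    train_err = np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2)
    test_err = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    # The complex model usually fits the training points better
    # but scores worse on the withheld ones.
    print(degree, train_err, test_err)
```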
Aside from withholding some of the available data points, it is also useful to consider testing the model with data derived from some other form of evaluation entirely.
In these cases, we’ll need to cross-validate the primary performance measure we’re using against other possible measures.
Alongside such tests, however, schools could randomly assess some small fraction of the students—one per class, say, or one in a hundred—using a different evaluation method, perhaps something like an essay or an oral exam. (Since only a few students would be tested this way, having this secondary method scale well is not a big concern.) The standardized tests would provide immediate feedback—you could have students take a short computerized exam every week and chart the class’s progress almost in real time, for instance—while the secondary data points would serve to cross-validate: to make …
Just as essays and oral exams can cross-validate standardized tests, so occasional unfamiliar “cross-training” assessments might be used to measure whether reaction time and shooting accuracy are generalizing to unfamiliar tasks.
If you can’t explain it simply, you don’t understand it well enough. —ANONYMOUS
One way to choose among several competing models is the Occam’s razor principle, which suggests that, all things being equal, the simplest possible hypothesis is probably the correct one.
If we introduce a complexity penalty, then more complex models need to do not merely a better job but a significantly better job of explaining the data to justify their greater complexity. Computer scientists refer to this principle—using constraints that penalize models for their complexity—as Regularization.
One algorithm, discovered in 1996 by biostatistician Robert Tibshirani, is called the Lasso and uses as its penalty the total weight of the different factors in the model.* By putting this downward pressure on the weights of the factors, the Lasso drives as many of them as possible completely to zero. Only the factors that have a big impact on the results remain in the equation—thus potentially transforming, say, an overfitted nine-factor model into a simpler, more robust formula with just a couple of the most critical factors.
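In code, the Lasso is one import away in a library like scikit-learn. A minimal sketch (my example, not the book's), with nine candidate factors of which only two actually matter:

```python
# Illustrative Lasso sketch: the L1 penalty zeroes out irrelevant factors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 9))                    # nine candidate factors
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)

model = Lasso(alpha=0.1).fit(X, y)               # alpha sets penalty strength
print(model.coef_)   # roughly [3, -2, 0, 0, 0, 0, 0, 0, 0]
```

Raising alpha drives still more weights to zero; shrinking it toward zero lets all nine factors creep back into the equation.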
Living organisms get a certain push toward simplicity almost automatically, thanks to the constraints of time, memory, energy, and attention.
On the other hand, we can also infer that a substantially more complex brain probably didn’t provide sufficient dividends, evolutionarily speaking. We’re as brainy as we have needed to be, but not extravagantly more so.
Actual, biological neural networks sidestep some of this problem because they need to trade off their performance against the costs of maintaining it. Neuroscientists have suggested, for instance, that brains try to minimize the number of neurons that are firing at any given moment—implementing the same kind of downward pressure on complexity as the Lasso. Language forms yet another natural Lasso: complexity is punished by the labor of speaking at greater length and the taxing of our listener’s attention span.
And anything that needs to be remembered has to pass through the inherent Lasso of memory.
I should have computed the historical covariances of the asset classes and drawn an efficient frontier. Instead, I visualized my grief if the stock market went way up and I wasn’t in it—or if it went way down and I was completely in it. My intention was to minimize my future regret. So I split my contributions fifty-fifty between bonds and equities.
When it comes to portfolio management, it turns out that unless you’re highly confident in the information you have about the markets, you may actually be better off ignoring that information altogether. Applying Markowitz’s optimal portfolio allocation scheme requires having good estimates of the statistical properties of different investments.
But when the odds of estimating them all correctly are low, and the weight that the model puts on those untrustworthy quantities is high, then an alarm should be going off in the decision-making process: it’s time to regularize.
Imposing penalties on the ultimate complexity of a model is not the only way to alleviate overfitting, however. You can also nudge a model toward simplicity by controlling the speed with which you allow it to adapt to incoming data.
As the New York Times put it, “coconut water seems to have jumped from invisible to unavoidable without a pause in the realm of the vaguely familiar.”
The biggest purchaser of kale the year before had been Pizza Hut, which put it in their salad bars—as decoration.
In contrast, if we look at the way organisms—including humans—evolve, we notice something intriguing: change happens slowly.
Though crossed-over nerve fibers and repurposed jawbones may seem like suboptimal arrangements, we don’t necessarily want evolution to fully optimize an organism to every shift in its environmental niche—or, at least, we should recognize that doing so would make it extremely sensitive to further environmental changes.
A bit of conservatism, a certain bias in favor of history, can buffer us against the boom-and-bust cycle of fads.
In machine learning, the advantages of moving slowly emerge most concretely in a regularization technique known as Early Stopping.
Their models can therefore be kept from becoming overly complex simply by stopping the process short, before overfitting has had a chance to creep in.
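The mechanism, in miniature and with my own variable names: take small gradient steps toward fitting the training data, watch the error on held-out data, and keep the weights from the step where that error was lowest.

```python
# Illustrative Early Stopping: halt training when validation error stalls.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 9))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=60)
X_tr, y_tr = X[:40], y[:40]          # data used for training
X_val, y_val = X[40:], y[40:]        # data used only for monitoring

w = np.zeros(9)
best_err, best_w, stalled = np.inf, w.copy(), 0
for step in range(10_000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.01 * grad                 # one small step toward fitting the data
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_err:
        best_err, best_w, stalled = val_err, w.copy(), 0
    else:
        stalled += 1
        if stalled >= 50:            # no improvement for 50 steps: stop early
            break
print(step, best_err)                # best_w is the model we actually keep
```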
Those extra hours, it turned out, had been spent nailing down nitty-gritty details that only confused the students, and wound up getting cut from the lectures the next time Tom taught the class. The underlying issue, Tom eventually realized, was that he’d been using his own taste and judgment as a kind of proxy metric for his students’. This proxy metric worked reasonably well as an approximation, but it wasn’t worth overfitting—which explained why spending extra hours painstakingly “perfecting” all the slides had been counterproductive.
If the factors we come up with first are likely to be the most important ones, then beyond a certain point thinking more about a problem is not only going to be a waste of time and effort—it will lead us to worse solutions. Early Stopping provides the foundation for a reasoned argument against reasoning, the thinking person’s case against thought.
The greater the uncertainty, the bigger the gap between what you can measure and what matters, the more you should watch out for overfitting—that is, the more you should prefer simplicity, and the earlier you should stop.
To return to Darwin, his problem of deciding whether to propose could probably have been resolved based on just the first few pros and cons he identified, with the subsequent ones adding to the time and anxiety expended on the decision without necessarily aiding its resolution (and in all likelihood impeding it). What seemed to make up his mind was the thought that “it is intolerable to think of spending one’s whole life like a neuter bee, working, working, & nothing after all.” Children and companionship—the very first points he mentioned—were precisely those that ultimately swayed him in …
Darwin was no Franklin, adding assorted considerations for days. Despite the seriousness with which he approached this life-changing choice, Darwin made up his mind exactly when his notes reached the bottom of the diary sheet. He was regularizing to the page. This is reminiscent of both Early Stopping and the Lasso: anything that doesn’t make the page doesn’t make the decision.
In fact, no one understands as well as a computer scientist that in the face of a seemingly unmanageable challenge, you should neither toil forever nor give up, but—as we’ll see—try a third thing entirely.
If it had been studied in the nineteenth century it might have become forever known as “the prairie lawyer problem,” and if it had first come up in the twenty-first century it might have been nicknamed “the delivery drone problem.” But like the secretary problem, it emerged in the mid-twentieth century, a period unmistakably evoked by its canonical name: “the traveling salesman problem.”
Hassler Whitney posed the problem in a 1934 talk at Princeton, where it lodged firmly in the brain of fellow mathematician Merrill Flood (who, you might recall from chapter 1, is also credited with circulating the first solution to the secretary problem).
They asserted what’s now known as the Cobham–Edmonds thesis: an algorithm should be considered “efficient” if it runs in what’s called “polynomial time”—that is, O(n²), O(n³), or in fact n to the power of any number at all. A problem, in turn, is considered “tractable” if we know how to solve it using an efficient algorithm.
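A back-of-the-envelope comparison shows why the line is drawn at polynomials:

```python
# Polynomial step counts stay manageable as n grows; exponential ones explode.
for n in (10, 20, 30, 40):
    print(n, n**2, n**3, 2**n)
# At n = 40: n^2 is 1,600 and n^3 is 64,000, but 2^n is over a trillion.
```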
One of the simplest forms of relaxation in computer science is known as Constraint Relaxation. In this technique, researchers remove some of the problem’s constraints and set about solving the problem they wish they had. Then, after they’ve made a certain amount of headway, they try to add the constraints back in.
For instance, you can relax the traveling salesman problem by letting the salesman visit the same town more than once, and letting him retrace his steps for free. Finding the shortest route under these looser rules produces what’s called the “minimum spanning tree.” (If you prefer, you can also think of the minimum spanning tree as the fewest miles of road needed to connect every town to at least one other town.)
For one thing, the spanning tree, with its free backtracking, will never be any longer than the real solution, which has to follow all the rules. Therefore, we can use the relaxed problem—the fantasy—as a lower bound on the reality.
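That bound is simple to verify on a small instance. A sketch, assuming NumPy, with my own implementation of Prim's algorithm for the spanning tree and a brute-force optimal tour for comparison (the seven random towns are arbitrary):

```python
# Illustrative check: the minimum spanning tree never exceeds the optimal tour.
import itertools
import numpy as np

rng = np.random.default_rng(4)
pts = rng.random((7, 2))                      # seven towns in the unit square
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
n = len(pts)

# Prim's algorithm: grow the tree by always adding the cheapest outgoing edge.
in_tree, mst = {0}, 0.0
while len(in_tree) < n:
    i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
               key=lambda e: dist[e])
    mst += dist[i, j]
    in_tree.add(j)

# Brute-force the true shortest tour, fixing town 0 as the starting point.
best = min(
    sum(dist[t[k], t[k + 1]] for k in range(n - 1)) + dist[t[-1], t[0]]
    for t in ((0,) + p for p in itertools.permutations(range(1, n)))
)
print(mst, best)                              # mst <= best, always
```

The relaxed answer comes cheap (the tree is found in a handful of greedy steps) while the exact tour takes a factorial-size search, which is precisely what makes the fantasy useful as a bound on the reality.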
The message is simple but profound: if we’re willing to accept solutions that are close enough, then even some of the hairiest problems around can be tamed with the right techniques.