Kindle Notes & Highlights
Google search data and other wellsprings of truth on the internet give us an unprecedented look into the darkest corners of the human psyche. This is at times, I admit, difficult to face. But it can also be empowering. We can use the data to fight the darkness. Collecting rich data on the world’s problems is the first step toward fixing them.
we will find that many of our adult behaviors and interests, even those that we consider fundamental to who we are, can be explained by the arbitrary facts of when we were born and what was going on in certain key years while we were young.
Between the key ages of fourteen and twenty-four, numerous Americans will form their views based on the popularity of the current president. A popular Republican or unpopular Democrat will influence many young adults to become Republicans. An unpopular Republican or popular Democrat puts this impressionable group in the Democratic column. And these views, in these key years, will, on average, last a lifetime.
You need a lot of pixels in a photo in order to be able to zoom in with clarity on one small portion of it. Similarly, you need a lot of observations in a dataset in order to be able to zoom in with clarity on one small subset of that data—for
Rich people everywhere tend to develop healthier habits—on average, they exercise more, eat better, smoke less, and are less likely to suffer from obesity. Rich people can afford the treadmill, the organic avocados, the yoga classes. And they can buy these things in any corner of the United States. For the poor, the story is different. For the poorest Americans, life expectancy varies tremendously depending on where they live. In fact, living in the right place can add five years to a poor person’s life expectancy.
The variable that does matter, according to Chetty and the others who worked on this study? How many rich people live in a city. More rich people in a city means the poor there live longer. Poor people in New York City, for example, live a lot longer than poor people in Detroit.
Contagious behavior may be driving some of this. There is a large amount of research showing that habits are contagious. So poor people living near rich people may pick up a lot of their habits.
Now stop for a moment and think about how revealing this study is. It demonstrated that, when it comes to figuring out who will cheat on their taxes, the key isn’t determining who is honest and who is dishonest. It is determining who knows how to cheat and who doesn’t.
These are just correlations, but they do suggest that growing up near big ideas is better than growing up with a big backyard.
there was another variable that was a strong predictor of a person’s securing an entry in Wikipedia: the proportion of immigrants in your county of birth. The greater the percentage of foreign-born residents in an area, the higher the proportion of children born there who go on to notable success. (Take that, Donald Trump!)
Usually, economists and sociologists focus on how to avoid bad outcomes, such as poverty and crime. Yet the goal of a great society is not only to leave fewer people behind; it is to help as many people as possible to really stand out. Perhaps this effort to zoom in on the places where hundreds of thousands of the most famous Americans were born can give us some initial strategies: encouraging immigration, subsidizing universities, and supporting the arts, among them.
You read that right. On weekends with a popular violent movie, when millions of Americans were exposed to images of men killing other men, crime dropped—significantly.
The data shows that the hours between 2 and 4 A.M. are prime time for big questions: What is the meaning of consciousness? Does free will exist? Is there life on other planets? The popularity of these questions late at night may be a result, in part, of cannabis use. Search rates for “how to roll a joint” peak between 1 and 2 A.M.
Offer young, aggressive men the chance to see Hannibal, and they will go to the movies. Offer young, aggressive men Runaway Bride as their option, and they will take a pass and instead go out, perhaps to a bar, club, or a pool hall, where the incidence of violent crime is higher. Violent movies keep potentially violent people off the streets.
For one thing, doppelganger searches have been used by many of the biggest internet companies to dramatically improve their offerings and user experience. Amazon uses something like a doppelganger search to suggest what books you might like. They see what people similar to you select and base their recommendations on that.
Kohane talks repeatedly of “low-hanging fruit.” He believes, for instance, that merely creating a complete dataset of children’s height and weight charts and any diseases they might have would be revolutionary for pediatrics. Each child’s growth path then could be compared to every other child’s growth path. A computer could find children who were on a similar trajectory and automatically flag any troubling patterns.
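The doppelganger idea in the two highlights above can be sketched as a nearest-neighbor lookup. This is only an illustration, not the method Kohane or Amazon actually uses: the child names, the height values, the choice of Euclidean distance, and the function `find_doppelgangers` are all hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equally long growth trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_doppelgangers(target, records, k=2):
    """Return the names of the k records whose trajectories are closest to target."""
    ranked = sorted(records.items(), key=lambda item: distance(target, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical heights (cm) at four checkups; made-up values for illustration only.
records = {
    "child_a": [50.0, 75.0, 87.0, 95.0],
    "child_b": [49.0, 70.0, 80.0, 86.0],
    "child_c": [51.0, 76.0, 88.0, 96.5],
    "child_d": [52.0, 80.0, 93.0, 102.0],
}
new_child = [50.5, 75.5, 87.5, 95.5]

matches = find_doppelgangers(new_child, records, k=2)
print(matches)
```

With a real dataset, the outcomes recorded for the closest matches (for example, later diagnoses) are what a system could use to flag a troubling trajectory.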
In the digital world, randomized experiments can be cheap and fast. You don’t need to recruit and pay participants. Instead, you can write a line of code to randomly assign them to a group. You don’t need users to fill out surveys. Instead, you can measure mouse movements and clicks. You don’t need to hand-code and analyze the responses. You can build a program to automatically do that for you. You don’t have to contact anybody. You don’t even have to tell users they are part of an experiment.
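As a rough sketch of the "line of code" that assigns users to a group, the example below hashes a user ID to pick a variant. Hashing is one common way to do this, chosen here so that a returning user always sees the same version; the highlight does not say how any particular company does it, and the function name `assign_group` and the experiment label are assumptions.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "headline_test") -> str:
    """Map a user to group 'A' or 'B' from a hash of the experiment name and user ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # The hash behaves like a coin flip across users but is stable for any one user.
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Hypothetical user IDs, for illustration only.
users = [f"user{i}" for i in range(10_000)]
groups = [assign_group(u) for u in users]
share_a = groups.count("A") / len(groups)
print(f"Share of users in group A: {share_a:.3f}")
```

Once users are split this way, the experimenter compares a measured behavior, such as click rate, between the two groups.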
Of course, the ease of such testing can lead to overuse. Some employees felt that because testing was so effortless, Google was overexperimenting. In 2009, one frustrated designer quit after Google went through forty-one marginally different shades of blue in A/B testing.
Testing fills in gaps in our understanding of human nature. These gaps will always exist. If we knew, based on our life experience, what the answer would be, testing would not be of value. But we don’t, so it is.
“There are a thousand people on the other side of the screen whose job it is to break down the self-regulation you have.” And these people are using A/B testing.
Nature experiments on us all the time. Two people get shot. One bullet stops just short of a vital organ. The other doesn’t. These bad breaks are what make life unfair. But, if it is any consolation, the bad breaks do make life a little easier for economists to study. Economists use the arbitrariness of life to test for causal effects.
Give stronger financial incentives to doctors to order certain procedures, this natural experiment suggests, and some will order more procedures that don’t make much difference for patients’ health and don’t seem to prolong their lives.
In fact, this category of natural experiments—utilizing sharp numerical cutoffs—is so powerful that it has its own name among economists: regression discontinuity. Anytime there is a precise number that divides people into two different groups—a discontinuity—economists can compare—or regress—the outcomes of people very, very close to the cutoff.
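A minimal sketch of the idea on simulated data, assuming a made-up admission cutoff of 60 and an outcome that rises smoothly with the test score. It fits a straight line on each side of the cutoff, using only observations within a chosen bandwidth, and reports the gap between the two lines at the cutoff. All names and numbers here are hypothetical, and published studies use more careful estimators.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Estimate the jump in outcome at the cutoff from points near it."""
    pairs = list(zip(scores, outcomes))
    below = [(s, o) for s, o in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s, o) for s, o in pairs if cutoff <= s <= cutoff + bandwidth]
    a0, b0 = fit_line(*zip(*below))
    a1, b1 = fit_line(*zip(*above))
    return (a1 + b1 * cutoff) - (a0 + b0 * cutoff)

def simulate(true_jump, n=20_000, cutoff=60.0, seed=0):
    """Simulated data: outcome rises with score, plus true_jump above the cutoff."""
    rng = random.Random(seed)
    scores = [rng.uniform(0, 100) for _ in range(n)]
    outcomes = [0.5 * s + (true_jump if s >= cutoff else 0.0) + rng.gauss(0, 1)
                for s in scores]
    return scores, outcomes

no_effect = rd_estimate(*simulate(true_jump=0.0), cutoff=60.0, bandwidth=5.0)
real_effect = rd_estimate(*simulate(true_jump=3.0), cutoff=60.0, bandwidth=5.0)
print(f"Estimated jump with no true effect: {no_effect:.2f}")
print(f"Estimated jump with a true effect of 3: {real_effect:.2f}")
```

People just above and just below the cutoff are nearly identical in score, so a gap between the two fitted lines at the cutoff is attributed to what the cutoff assigns, not to differences in ability.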
Their stunning results were made clear by the title they gave the paper: “Elite Illusion.” The effects of Stuyvesant High School? Nil. Nada. Zero. Bupkus. Students on either side of the cutoff ended up with indistinguishable AP scores and indistinguishable SAT scores and attended indistinguishably prestigious universities.
People adapt to their experience, and people who are going to be successful find advantages in any situation. The factors that make you successful are your talent and your drive. They are not who gives your commencement speech or other advantages that the biggest name-brand schools offer.
there is growing evidence that, while going to a good school is important, there is little gained from going to the greatest possible school.
people lie—to friends, to surveys, and to themselves—to make themselves look better. But the world also lies to us by presenting us with faulty, misleading data. The world shows us a huge number of successful Harvard graduates but fewer successful Penn State graduates, and we assume that there is a huge advantage to going to Harvard. By cleverly making sense of nature’s experiments, we can correctly make sense of the world’s data—to find what’s really useful and what is not.
These experiments demonstrate the potential of Big Data to replace guesses, conventional wisdom, and shoddy correlations with what actually works—causally.
The curse of dimensionality. The human genome, scientists now know, differs in millions of ways. There are, quite simply, too many genes to test. If you test enough tweets to see if they correlate with the stock market, you will find one that correlates just by chance. If you test enough genetic variants to see if they correlate with IQ, you will find one that correlates just by chance.
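The point about testing enough candidates can be illustrated with a small simulation. Everything below is random noise: a made-up "market" series and many made-up candidate predictors, none of which has any real relationship to it. The function names and sizes are assumptions chosen for the sketch.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_chance_correlation(n_candidates, n_days=20, seed=0):
    """Strongest absolute correlation found among purely random candidates."""
    rng = random.Random(seed)
    market = [rng.gauss(0, 1) for _ in range(n_days)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_days)]
        best = max(best, abs(pearson(candidate, market)))
    return best

few = best_chance_correlation(5)
many = best_chance_correlation(1000)
print(f"Best |r| among 5 random candidates:    {few:.2f}")
print(f"Best |r| among 1000 random candidates: {many:.2f}")
```

The more candidates are tried against a short series, the stronger the best spurious correlation becomes, even though no candidate carries any real signal.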
The solution is not always more Big Data. A special sauce is often necessary to help Big Data work best: the judgment of humans and small surveys, what we might call small data.
Is this acceptable? Do we want to live in a world in which companies use the words we write to predict whether we will pay back a loan? It is, at a minimum, creepy—and, quite possibly, scary.
This is the ethical question: Do corporations have the right to judge our fitness for their services based on abstract but statistically predictive criteria not directly related to those services?
For example, people who like Mozart, thunderstorms, and curly fries on Facebook tend to have higher IQs. People who like Harley-Davidson motorcycles, the country music group Lady Antebellum, or the page “I Love Being a Mom” tend to have lower IQs. Some of these correlations may be due to the curse of dimensionality. If you test enough things, some will randomly correlate.
In fairness, this is not an entirely new problem. People have long been judged by factors not directly related to job performance—the firmness of their handshakes, the neatness of their dress. But a danger of the data revolution is that, as more of our life is quantified, these proxy judgments can get more esoteric yet more intrusive. Better prediction can lead to subtler and more nefarious discrimination.
A doppelganger search is entertaining if it helps us predict whether a baseball player will return to his former greatness. A doppelganger search is great if it helps us cure someone’s disease. But if a doppelganger search helps a corporation extract every last penny from you? That’s not so cool. My spendthrift brother would have a right to complain if he got charged more online than tightwad me.
Scott Gnau, general manager of Terabyte, explains, in the excellent book Super Crunchers, what casino managers do when they see a regular customer nearing their pain point: “They come out and say, ‘I see you’re having a rough day. I know you like our steakhouse. Here, I’d like you to take your wife to dinner on us right now.’ ” This might seem the height of generosity: a free steak dinner. But really it’s self-serving.
On the other hand, Big Data has also been enabling consumers to score some blows against businesses that overcharge them or deliver shoddy products. One important weapon is sites, such as Yelp, that publish reviews of restaurants and other services.
Big Data to date has helped both sides in the struggle between consumers and corporations. We have to make sure it remains a fair fight.
There is a large ethical leap from the government having the search data of thousands or hundreds of thousands of people to the police department having the search data of an individual. There is a large ethical leap from protecting a local mosque to ransacking someone’s house. There is a large ethical leap from advertising suicide prevention to locking someone up in a mental hospital against his will.
In contrast, many people think that economists, sociologists, and psychologists are soft scientists who throw around meaningless jargon so they can get tenure. To the extent this was ever true, the Big Data revolution has changed that. If Karl Popper were alive today and attended a presentation by Raj Chetty, Jesse Shapiro, Esther Duflo, or (humor me) myself, I strongly suspect he would not have the same reaction he had back then. To be honest, he might be more likely to question whether today’s great string theorists are truly scientific or just engaging in self-indulgent mental gymnastics.