Kindle Notes & Highlights
by Tim Harford
Read between August 19 - September 11, 2022
cheerleaders for big data have made three exciting claims, each one reflected in the success of Google Flu Trends. First, that data analysis produces uncannily accurate results. Second, that every single data point can be captured—the “N = All” claim we met in the previous chapter—making old statistical sampling techniques obsolete (what that means here is that Flu Trends captured every single search). And finally, that scientific models are obsolete, too: there’s simply no need to develop and test theories about why searches for “flu symptoms” or “Beyoncé” might or might not be correlated
Part of the problem was rooted in that third exciting claim: Google did not know—and it could not begin to know—what linked the search terms with the spread of flu.
Flu Trends was part flu detector, part winter detector.
The “winter detector” problem is common in big-data analysis.
If we don’t know why the algorithm is doing what it’s doing, we’re trusting our lives to a ruler detector.
We should not simply accept the idea that Target’s computer is a mind reader before considering how many misses attend each hit.
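One way to make "how many misses attend each hit" concrete is to count both. The sketch below is illustrative only: every number in it (customer count, base rate, detection rate, false-alarm rate) is a hypothetical assumption, not a figure from the book or from Target.

```python
def hits_and_false_alarms(population, base_rate, sensitivity, false_positive_rate):
    """Count correct flags (hits) and wrong flags (false alarms) for a classifier."""
    positives = population * base_rate           # customers who really are pregnant
    negatives = population - positives           # customers who are not
    hits = positives * sensitivity               # correctly flagged
    false_alarms = negatives * false_positive_rate  # wrongly flagged
    precision = hits / (hits + false_alarms)     # share of flags that are right
    return hits, false_alarms, precision


# All four inputs are made-up values chosen only to illustrate the arithmetic.
hits, false_alarms, precision = hits_and_false_alarms(
    population=10_000,
    base_rate=0.02,
    sensitivity=0.80,
    false_positive_rate=0.10,
)

print(f"hits: {hits:.0f}, false alarms: {false_alarms:.0f}")
print(f"false alarms per hit: {false_alarms / hits:.1f}")
print(f"share of flagged customers who are actually pregnant: {precision:.0%}")
```

Under these assumed rates, most flagged customers are false alarms even though the algorithm catches most true cases, because the condition is rare in the population.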
excessive credulity in the power of the algorithm to extract wisdom from the data it is fed. There’s another, related problem: excessive credulity in the quality or completeness of the dataset.
You can take a million temperature readings, but if your thermometer is broken and you’re poking around in armpits, then your results will be a precise estimate of the wrong answer. The old cliché of “Garbage in, garbage out” remains true no matter how many scraps of garbage you collect.
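The "precise estimate of the wrong answer" can be seen in a small simulation. This is a sketch under stated assumptions: the true temperature, the size of the thermometer's bias, and the noise level are all hypothetical values chosen for illustration.

```python
import math
import random

TRUE_TEMP = 37.0   # hypothetical true body temperature, in degrees Celsius
BIAS = 1.5         # hypothetical systematic error of the broken thermometer
NOISE_SD = 0.5     # hypothetical random scatter of each reading


def summarize_readings(n, seed=0):
    """Simulate n biased readings; return their mean and standard error."""
    rng = random.Random(seed)
    readings = [TRUE_TEMP + BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(n)]
    mean = sum(readings) / n
    variance = sum((r - mean) ** 2 for r in readings) / (n - 1)
    std_err = math.sqrt(variance / n)
    return mean, std_err


for n in (100, 10_000, 1_000_000):
    mean, std_err = summarize_readings(n)
    print(f"n={n:>9,}  mean={mean:.3f}  std_err={std_err:.4f}  "
          f"error vs truth={mean - TRUE_TEMP:+.3f}")
```

As the number of readings grows, the standard error shrinks toward zero, but the gap between the mean and the true value stays close to the assumed bias. More data buys precision, not accuracy.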
the modern version of this old problem is an algorithm that has been trained on a systematically biased dataset.
alchemy was pursued in secret, while science depended on open debate.
There’s gold in the data that Amazon, Apple, Facebook, Google, and Microsoft have about us. And that gold will be worth a lot less to them if the knowledge that produces it is shared with everyone. But just as the most brilliant thinkers of the age failed to make progress while practicing in secret, secret algorithms based on secret data are likely to lead to missed opportunities for improvement.
a checklist of four properties that intelligently open decisions should have. Information should be accessible: that implies it’s not hiding deep in some secret data vault. Decisions should be understandable—capable of being explained clearly and in plain language. Information should be usable—which may mean something as simple as making data available in a standard digital format. And decisions should be assessable—meaning that anyone with the time and expertise has the detail required to rigorously test any claims or decisions if they wish to.
Consider the CBO: it advises Congress on $4 trillion worth of annual spending, on a budget of just $50 million a year. To put it another way, for every $80,000 the US government spends, one dollar funds the CBO to shed light on the other $79,999.
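The ratio in this passage follows directly from the two figures quoted. A minimal check, using only those numbers (the variable names are illustrative):

```python
# Figures quoted in the passage above.
annual_spending = 4_000_000_000_000  # $4 trillion of annual spending
cbo_budget = 50_000_000              # $50 million annual CBO budget

# Dollars of government spending for each dollar of CBO budget.
dollars_per_cbo_dollar = annual_spending // cbo_budget

print(f"${dollars_per_cbo_dollar:,} spent for every $1 of CBO budget")
print(f"${dollars_per_cbo_dollar - 1:,} of that is the spending being scrutinized")
```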
is it reasonable to worry that the more governments know about us, the more they will be tempted to exert control over us?
For better or worse, we want our governments to take action, and if they are to take action they need information. Statistics collected by the state make for better-informed policies—on crime, education, infrastructure, and much else.
scrutiny is vital. It’s what distinguishes science from alchemy. If statistics are published and designed to be accessible to all, they can be analyzed and examined by academics, policy wonks, and indeed anybody with a bit of time and access to a computer. Errors can be identified and corrected.
official statistics are still the closest we have to data bedrock. When a country picks and defends a team of skilled, professional, and independent statisticians, the facts have a way of making themselves known.
The word “see” is often used as a direct synonym for “understand”—“I see what you mean.” Yet sometimes we see but we don’t understand; worse, we see, then “understand” something that isn’t true at all.
Never let zippy design distract you from the possibility that the underlying numbers simply might be wrong.
Data visualization ducks can be more than tasteless: the duckness of the graph can actually obscure—or worse, it can misrepresent—the underlying information.[*]
information is beautiful—but misinformation can be beautiful, too. And producing beautiful misinformation is becoming easier than ever.
Ideas that are best expressed in words or numbers are turned into graphics anyway, because that’s what spreads on social media. Unfortunately, the selection mechanism is often some combination of beauty and shock value, rather than pertinence and accuracy.
we should notice our emotional response to the factual claims around us. Just so: pictures engage the imagination and the emotion, and are easily shared before we have time to think a little harder. If we don’t, we’re allowing ourselves to be dazzled.
Say It with Charts, the bible of management consultants, makes this process very clear. First, says author Gene Zelazny, decide what you want to say with a graph. Once you’ve decided what you want to say, that suggests a particular kind of comparison. That, in turn, suggests a particular choice of graph—such as a scatterplot, a line graph, a stacked bar chart, or a pie chart.[*] Finally, underline your message by sticking it in the graph title. Don’t just write “Number of contracts, January–August.” Write something like “The number of contracts has increased” or “The number of contracts has
First—and most important, since the visual sense can be so visceral—check your emotional response.
Second, check that you understand the basics behind the graph. What do the axes actually mean? Do you understand what is being measured or counted? Do you have the context to understand, or is the graph showing just a few data points?
When you look at data visualizations, you’ll do much better if you recognize that someone may well be trying to persuade you of something. There is nothing wrong with artfully persuasive graphs, any more than with artfully persuasive words. And there is nothing wrong with being persuaded and changing your mind.
A man with a conviction is a hard man to change. Tell him you disagree and he turns away. Show him facts or figures and he questions your sources. Appeal to logic and he fails to see your point. (Leon Festinger, Henry Riecken, and Stanley Schachter, When Prophecy Fails)
whenever a number was close to Millikan’s, it was accepted without too much scrutiny. When a number seemed wrong it would be viewed with skepticism. Reasons would be found to discard it. As we saw in chapter one, our preconceptions are powerful things. We filter new information. If it accords with what we expect, we’ll be more likely to accept it.
it is possible to gather and to analyze numbers in ways that help us understand the world. But it has also argued that very often we make mistakes not because the data aren’t available, but because we refuse to accept what they are telling us.
First, we should learn to stop and notice our emotional reaction to a claim, rather than accepting or rejecting it because of how it makes us feel. Second, we should look for ways to combine the “bird’s eye” statistical perspective with the “worm’s eye” view from personal experience. Third, we should look at the labels on the data we’re being given, and ask if we understand what’s really being described. Fourth, we should look for comparisons and context, putting any claim into perspective. Fifth, we should look behind the statistics at where they came from—and what other data might have
A curious person, however, enjoys being surprised and hungers for the unexpected. He or she will not be filtering out surprising news, because it’s far too intriguing.
As Loewenstein puts it, curiosity starts to glow when there’s a gap “between what we know and what we want to know.” There’s a sweet spot for curiosity: if we know nothing, we ask no questions; if we know everything, we ask no questions either. Curiosity is fueled once we know enough to know that we do not know.
Ignite the spark of curiosity and give it some fuel, using the time-honored methods of storytelling, character, suspense, and humor.
once we start to peer beneath the surface of things, become aware of the gaps in our knowledge, and treat each question as the path to a better question, we find that curiosity is habit-forming.
while “I don’t believe it” is sometimes the right starting point when confronted with a surprising statistical claim, it is a lazy and depressing place to finish.