The Data Detective: Ten Easy Rules to Make Sense of Statistics
30%
Big found datasets can seem comprehensive, and may be enormously useful, but “N = All” is often a seductive illusion: it’s easy to make unwarranted assumptions that we have everything that matters.
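As a rough illustration of why "N = All" can mislead, the sketch below compares the true rate of some trait in a population with the rate seen in a large "found" dataset that captures one group far more completely than another. Every number, group name, and function here is a hypothetical assumption chosen for illustration; none of it comes from the book.

```python
# Hypothetical population: two groups with different rates of some trait.
# All figures are invented for illustration only.
population = {
    "online": {"size": 600_000, "rate": 0.30},
    "offline": {"size": 400_000, "rate": 0.10},
}

# Assumed share of each group that ends up in the "found" dataset.
coverage = {"online": 0.95, "offline": 0.05}


def true_rate(pop):
    """Rate of the trait across the whole population."""
    total = sum(group["size"] for group in pop.values())
    with_trait = sum(group["size"] * group["rate"] for group in pop.values())
    return with_trait / total


def found_data_rate(pop, cov):
    """Rate of the trait as seen in a dataset with uneven coverage."""
    captured = {name: group["size"] * cov[name] for name, group in pop.items()}
    total = sum(captured.values())
    with_trait = sum(captured[name] * pop[name]["rate"] for name in pop)
    return with_trait / total


def found_data_size(pop, cov):
    """Number of records in the found dataset."""
    return sum(group["size"] * cov[name] for name, group in pop.items())


if __name__ == "__main__":
    print(f"records captured: {found_data_size(population, coverage):,.0f}")
    print(f"true rate:        {true_rate(population):.3f}")
    print(f"found-data rate:  {found_data_rate(population, coverage):.3f}")
```

Under these assumptions the found dataset holds well over half a million records, yet it overstates the trait because the people it misses differ from the people it captures.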
31%
An algorithm, meanwhile, is a step-by-step recipe[*] for performing a series of actions, and in most cases “algorithm” means simply “computer program.” But over the past few years, the word has come to be associated with something quite specific: algorithms have become tools for finding patterns in large sets of data.
31%
“Found” datasets can be huge. They are also often relatively cheap to collect, updated in real time, and messy—a collage of data points collected for disparate purposes.
31%
As our communication, leisure, and commerce are moving to the internet, and the internet is moving into our phones, our cars, and even our spectacles, life can be recorded and quantified in a way that would have been hard to imagine just a decade ago.
31%
cheerleaders for big data have made three exciting claims, each one reflected in the success of Google Flu Trends. First, that data analysis produces uncannily accurate results. Second, that every single data point can be captured—the “N = All” claim we met in the previous chapter—making old statistical sampling techniques obsolete (what that means here is that Flu Trends captured every single search). And finally, that scientific models are obsolete, too: there’s simply no need to develop and test theories about why searches for “flu symptoms” or “Beyoncé” might or might not be correlated ...
31%
a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down.
32%
Making big data work is harder than it seems. Statisticians have spent the past two hundred years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster, and cheaper these days, but we must not pretend that the traps have all been made safe. They have not.
33%
I hope I’ve persuaded you that we shouldn’t be too eager to entrust our decisions to algorithms. But I don’t want to overdo the critique, because we don’t have some infallible alternative way of making decisions. The choice is between algorithms and humans. Some humans are prejudiced. Many humans are frequently tired, harassed, and overworked. And all humans are, well, human.
33%
we should compare the fallibility of today’s algorithms with that of the humans who would otherwise be making the decisions.
33%
Hannah Fry’s book Hello World.
34%
Many people have strong intuitions about whether they would rather have a vital decision about them made by algorithms or humans. Some people are touchingly impressed by the capabilities of the algorithms; others have far too much faith in human judgment. The truth is that sometimes the algorithms will do better than the humans, and sometimes they won’t. If we want to avoid the problems and unlock the promise of big data, we’re going to need to assess the performance of the algorithms on a case-by-case basis.
34%
the problem is not the algorithms, or the big datasets. The problem is a lack of scrutiny, transparency, and debate.
35%
Alchemy is not the same as gathering big datasets and developing pattern-recognizing algorithms. For one thing, alchemy is impossible, and deriving insights from big data is not. Yet the parallels should also be obvious. The likes of Google and Target are no more keen to share their datasets and algorithms than Newton was to share his alchemical experiments.
35%
There’s gold in the data that Amazon, Apple, Facebook, Google, and Microsoft have about us. And that gold will be worth a lot less to them if the knowledge that produces it is shared with everyone.
35%
just as the most brilliant thinkers of the age failed to make progress while practicing in secret, secret algorithms based on secret data are likely to lead to missed opportunities for improvement.
36%
Onora O’Neill argues that if we want to demonstrate trustworthiness, we need the basis of our decisions to be “intelligently open.” She proposes a checklist of four properties that intelligently open decisions should have. Information should be accessible: that implies it’s not hiding deep in some secret data vault. Decisions should be understandable—capable of being explained clearly and in plain language. Information should be usable—which may mean something as simple as making data available in a standard digital format. And decisions should be assessable—meaning that anyone with the time ...
36%
anyone who is confident of the effectiveness of their algorithm should be happy to demonstrate that effectiveness in a fair and rigorous test.
36%
We need to look on a case-by-case basis. What sort of accountability or transparency we want depends on what problem we are trying to solve.
36%
Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes.
36%
We should not simply trust that algorithms are doing a better job than humans, nor should we assume that if the algorithms are flawed, the humans would be flawless.
40%
Now, it’s one thing to be wrong, or to have a view of the world that misses out on something important. But, argues Scott, because the state is powerful, its misperceptions of the world often take physical form, producing well-meaning but clumsy and oppressive modernist schemes that ignore local knowledge and stifle local autonomy.
40%
States should be humble. Bureaucrats must recognize the limits of their knowledge. There is always a risk that the bird’s-eye view is so grand and sweeping as to induce delusions of omnipotence.
40%
the tactic of simply refusing to collect basic statistics could only make sense for a libertarian, laissez-faire regime. And the truth is that very few people seem attracted by that prospect. For better or worse, we want our governments to take action, and if they are to take action they need information. Statistics collected by the state make for better-informed policies—on crime, education, infrastructure, and much else.
41%
There is nothing wrong with the idea that government should collect statistics to inform itself. But there is a risk that this view slips into a proprietorial sense of ownership, when politicians believe not only that they should be using statistics to run the country, but that those statistics are none of anyone else’s business, and that external scrutiny is a distraction. The facts are no longer the facts—they become the tools of the powerful.
41%
Good statistics don’t just serve government planners; they are valuable to a far wider group of people.
41%
This isn’t just about making money; it’s about making sure that citizens have access to accurate information about the world in which they live.
43%
Much of the data visualization that bombards us today is decoration at best, and distraction or even disinformation at worst. The decorative function is surprisingly common, perhaps because the data visualization teams of many media organizations are part of the art departments. They are led by people whose skills and experience are not in statistics but in illustration or graphic design.[4] The emphasis is on the visualization, not on the data. It is, above all, a picture.
43%
Data visualization ducks can be more than tasteless: the duckness of the graph can actually obscure—or worse, it can misrepresent—the underlying information.
44%
The most straightforward problem with a clever decorative idea is that the basic data may not be solid. The visualization then simply hides that fact—the shimmering icing over a moldering statistical cake.
44%
So information is beautiful—but misinformation can be beautiful, too. And producing beautiful misinformation is becoming easier than ever.
45%
“A good chart isn’t an illustration but a visual argument,” declares Alberto Cairo near the beginning of his book How Charts Lie.
45%
by organizing and presenting the data, we are inviting people to draw certain conclusions. And just as a verbal argument can be logical or emotional, sharp or woolly, clear or baffling, honest or misleading, so too can the argument made by a chart.
46%
When you look at data visualizations, you’ll do much better if you recognize that someone may well be trying to persuade you of something. There is nothing wrong with artfully persuasive graphs, any more than with artfully persuasive words. And there is nothing wrong with being persuaded and changing your mind.
48%
our preconceptions are powerful things. We filter new information. If it accords with what we expect, we’ll be more likely to accept it.
48%
Our brains are always trying to make sense of the world around us based on incomplete information. The brain makes predictions about what it expects, and tends to fill in the gaps, often based on surprisingly sparse data.
48%
Our brains fill in the gaps—which is why we see what we expect to see and hear what we expect to hear,
48%
we can also filter new information consciously, because we don’t want it to spoil our day.
48%
One of the reasons facts don’t always change our minds is that we are keen to avoid uncomfortable truths.
49%
we make a forecast with the facts that are in front of our nose.
49%
But it is a better idea to zoom out and find one very straightforward[*] statistic: In general, how many marriages end in divorce? This number is known as the “base rate.”
49%
The importance of the base rate was made famous by the psychologist Daniel Kahneman, who coined the phrase “the outside view and the inside view.” The inside view means looking at the specific case in front of you:
49%
The outside view requires you to look at a more general “comparison class” of cases—here,
49%
Ideally, a decision maker or a forecaster will combine the outside view and the inside view—or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale—and can easily come up with a probability that is ten times too large, or ten times too small.
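One possible way to make "start with the outside view, then adjust" concrete is to treat the base rate as prior odds and scale it by how strongly the case-specific evidence points one way or the other. The sketch below assumes a simple odds-form Bayesian update; the function name, the base rate of 0.4, and the likelihood ratio of 0.5 are hypothetical choices for illustration, not figures from the book.

```python
def adjust_base_rate(base_rate, likelihood_ratio):
    """Start from the outside view (base rate), then adjust for the inside view.

    base_rate: probability of the outcome in the comparison class (0 < p < 1).
    likelihood_ratio: how much more likely the case-specific evidence is if the
        outcome happens than if it does not (> 0). A value of 1 means the
        evidence is uninformative; below 1 it lowers the estimate.
    """
    if not 0 < base_rate < 1:
        raise ValueError("base_rate must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")

    prior_odds = base_rate / (1 - base_rate)        # outside view, as odds
    posterior_odds = prior_odds * likelihood_ratio  # inside-view adjustment
    return posterior_odds / (1 + posterior_odds)    # back to a probability


if __name__ == "__main__":
    # Hypothetical numbers: a 40% base rate, and case-specific evidence that is
    # half as likely to be seen when the outcome does occur.
    print(f"adjusted estimate: {adjust_base_rate(0.4, 0.5):.2f}")
```

Because the adjustment multiplies the base rate's odds rather than replacing them, the estimate stays anchored to the comparison class, which is the sense of scale the passage says the inside view lacks on its own.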
49%
Second, keeping score was important.
49%
Third, superforecasters tended to update their forecasts frequently as new information emerged, which suggests that a receptiveness to new evidence was important. This willingness to adjust predictions is correlated with making better predictions in the first place:
50%
superforecasting is a matter of having an open-minded personality.
50%
The superforecasters are what psychologists call “actively open-minded thinkers”—people who don’t cling too tightly to a single approach, are comfortable abandoning an old view in the light of fresh evidence or new arguments, and embrace disagreements with others as an opportunity to learn.
50%
“For superforecasters, beliefs are hypotheses to be tested, not treasures to be guard...
This highlight has been truncated due to consecutive passage length restrictions.
50%
superforecasting means being willing to change your mind.
51%
“Making public commitments ‘freezes’ attitudes in place. So saying something dumb makes you a bit dumber. It becomes harder to correct yourself.”