Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
A wonderful example is the Monty Hall problem: On the television show Let’s Make a Deal, you are offered your choice of three doors; behind one is a grand prize and behind the other two are goats. After you pick a door, the host, Monty Hall, does what he always does: he shows you a goat behind a door you did not choose and asks if you want to switch doors.
Does it matter if Monty Hall reminds you that there is a goat behind one of these doors, or if he proves it by showing you a goat? You haven’t learned anything useful about the door you did choose. There is still a one-third chance that it is the winning door; since the door Monty opened is now ruled out, the remaining two-thirds probability falls on the other unopened door. You should switch.
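A quick way to convince yourself is to simulate the game. The sketch below (plain Python, hypothetical door numbering, 100,000 trials) plays both the stay and switch policies; staying wins about a third of the time, switching about two-thirds.

```python
import random

def play(switch, trials=100_000):
    """Simulate the Monty Hall game; return the fraction of games won."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        prize = random.choice(doors)
        pick = random.choice(doors)
        # Monty opens a door that is neither your pick nor the prize.
        opened = random.choice([d for d in doors if d != pick and d != prize])
        if switch:
            pick = next(d for d in doors if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print("stay:  ", play(switch=False))   # about 0.33
print("switch:", play(switch=True))    # about 0.67
```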
Don’t be Fooled: Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data.
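As an illustration of clusters appearing in pure noise, here is a small sketch with synthetic coin flips (not data from the book): a fair coin flipped 250 times almost always produces a streak of seven or more identical outcomes in a row, exactly the kind of "pattern" someone determined to explain it will explain.

```python
import random

random.seed(2)

def longest_streak(flips):
    """Length of the longest run of identical outcomes."""
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

flips = [random.choice("HT") for _ in range(250)]   # 250 fair coin flips
print("longest streak in pure noise:", longest_streak(flips))   # typically 7 or more
```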
In 1996 the Gardner brothers wrote a wildly popular book with the beguiling name, The Motley Fool Investment Guide: How the Fools Beat Wall Street’s Wise Men and How You Can Too. Hey, if fools can beat the market, so can we all. The Gardners recommended what they called the Foolish Four Strategy. They claimed that during the years 1973–93, the Foolish Four Strategy had an annual average return of 25 percent.
But beyond this kernel of a borrowed idea, the Foolish Four Strategy is pure data mining.
Shortly after the Gardners launched the Foolish Four Strategy, two skeptical finance professors tested it using data from the years 1949–72, just prior to the period data mined by the Gardners. It didn’t work. The professors also retested the Foolish Four Strategy during the years that were data mined by the Gardners, but with a clever twist. Instead of choosing the portfolio on the first trading day in January, they implemented the strategy on the first trading day of July. If the strategy has any merit, it shouldn’t be sensitive to the starting month. But, of course, it was.
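The professors' two checks, fresh data and a shifted start month, are easy to reproduce in spirit. The sketch below is purely illustrative: synthetic returns, an invented momentum-style picking rule standing in for the Foolish Four, and made-up parameters. The point is only the mechanics of re-running the identical backtest from a different start month.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 21 years of monthly returns for 30 stocks.
# Purely synthetic noise; no real market data and no real Foolish Four rule.
n_months, n_stocks = 21 * 12, 30
returns = rng.normal(0.008, 0.06, size=(n_months, n_stocks))

def backtest(returns, first_month, lookback=12, hold=12, n_pick=4):
    """Each year, pick the n_pick stocks with the best trailing-year return
    (a stand-in for any data-mined rule) and hold them for the next year.
    Returns the average annual portfolio return."""
    annual = []
    for t in range(first_month + lookback, returns.shape[0] - hold + 1, hold):
        past = returns[t - lookback:t]                   # information known at time t
        picks = np.argsort(past.sum(axis=0))[-n_pick:]   # the "best" recent performers
        growth = np.prod(1 + returns[t:t + hold, picks].mean(axis=1))
        annual.append(growth - 1)
    return np.mean(annual)

# The professors' twist: does the result survive a shift of the start month?
print("start in January:", round(backtest(returns, first_month=0), 3))
print("start in July:   ", round(backtest(returns, first_month=6), 3))
```

With pure noise neither start month shows anything reliable; the check matters when a supposedly robust rule gives very different answers in January and July.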
Another common problem with theorizing about what we observe is the survivor bias that can occur because we don’t see things that no longer exist. A study of the elderly does not include people who did not live long enough to become elderly. An examination of planes that survived bombing runs does not include planes that were shot down.
Be doubly skeptical of graphs that have two vertical axes and omit zero from either or both axes.
A very common logical error is to confuse two conditional statements. The probability that a person who has a disease will have a positive test result is not the same as the probability that a person with a positive test result has the disease.
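Bayes' rule makes the distinction concrete. With made-up but plausible numbers (1 percent prevalence, 90 percent sensitivity, 5 percent false-positive rate), the probability of disease given a positive test is only about 15 percent, nowhere near the 90 percent figure that describes the test's accuracy on sick patients.

```python
# Hypothetical numbers, chosen only to make the distinction concrete:
# 1% of people have the disease, the test catches 90% of true cases,
# and it falsely flags 5% of healthy people.
prevalence = 0.01
p_pos_given_disease = 0.90   # sensitivity: P(positive | disease)
p_pos_given_healthy = 0.05   # false-positive rate

p_positive = (p_pos_given_disease * prevalence
              + p_pos_given_healthy * (1 - prevalence))

# Bayes' rule: P(disease | positive)
p_disease_given_pos = p_pos_given_disease * prevalence / p_positive
print(round(p_disease_given_pos, 3))   # about 0.154, nowhere near 0.90
```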
As the population grows over time, so do many human activities (including the number of people watching television, eating oranges, and dying), which are unrelated but nonetheless statistically correlated because they all grow with the population. Watching television does not make us eat oranges, and eating oranges does not kill us.
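A small synthetic sketch (invented population and activity numbers, no real data) shows how a shared growth trend manufactures correlation, and how the correlation disappears once the trend is removed, here by looking at year-to-year changes.

```python
import numpy as np

rng = np.random.default_rng(3)

years = np.arange(1960, 2021)
population = 180 + 2.5 * (years - 1960) + rng.normal(0, 2, years.size)   # millions, invented

# Two unrelated activities that both scale with population (plus noise).
tv_watchers   = 0.6 * population + rng.normal(0, 5, years.size)
orange_eaters = 0.3 * population + rng.normal(0, 5, years.size)

print("raw correlation:", round(np.corrcoef(tv_watchers, orange_eaters)[0, 1], 2))

# Remove the shared trend (here by differencing) and the "relationship" vanishes.
print("correlation of year-to-year changes:",
      round(np.corrcoef(np.diff(tv_watchers), np.diff(orange_eaters))[0, 1], 2))
```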
When you hear a puzzling assertion (or even one that makes sense), think about whether confounding factors might be responsible. Sweden has a higher female mortality rate than Costa Rica—because there are more elderly women in Sweden. Berkeley’s graduate programs admitted a smaller percentage of female applicants—because women applied to more selective programs.
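The Berkeley case is an instance of Simpson's paradox, and a toy table makes the mechanism clear. The counts below are invented, not the actual admissions figures: women are admitted at an equal or higher rate within each program, yet at a far lower rate overall, because most women applied to the harder program.

```python
# Hypothetical admissions counts (not the actual Berkeley figures), built so that
# women do as well or better within each program yet worse overall.
programs = {
    #             (applicants, admitted)
    "easy dept": {"men": (800, 480), "women": (100, 65)},    # 60% vs 65%
    "hard dept": {"men": (200, 40),  "women": (900, 200)},   # 20% vs 22%
}

def rate(applicants, admitted):
    return admitted / applicants

for dept, groups in programs.items():
    print(f"{dept}: men {rate(*groups['men']):.0%}, women {rate(*groups['women']):.0%}")

men_total   = [sum(x) for x in zip(*(g["men"]   for g in programs.values()))]
women_total = [sum(x) for x in zip(*(g["women"] for g in programs.values()))]
print(f"overall: men {rate(*men_total):.0%}, women {rate(*women_total):.0%}")
```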
Expect those at the extremes to regress to the mean.
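A simple simulation (hypothetical scores modeled as stable skill plus luck) shows why: the people who land in the top decile on one test got, on average, good luck, and their luck does not repeat on the second test.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical model: an observed score is stable skill plus luck, on two test dates.
n = 10_000
skill  = rng.normal(100, 10, n)
test_1 = skill + rng.normal(0, 10, n)
test_2 = skill + rng.normal(0, 10, n)

top = test_1 > np.percentile(test_1, 90)   # the extreme scorers on the first test
print("top decile, test 1: ", round(test_1[top].mean(), 1))   # roughly 125
print("same people, test 2:", round(test_2[top].mean(), 1))   # closer to 100
```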
Researchers seeking fame and funding often turn into Texas sharpshooters, firing at random and painting a bullseye around the area with the most hits. It is easy to find a theory that fits the data if the data are used to invent the theory.
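The same bullseye-painting happens whenever many candidate patterns are tested and only the best is reported. In the sketch below (all numbers random, nothing related to anything), the strongest of 100 junk predictors still looks respectably correlated with the outcome.

```python
import numpy as np

rng = np.random.default_rng(5)

# 100 random "predictors" and one random outcome; nothing is related to anything.
outcome    = rng.normal(size=50)
predictors = rng.normal(size=(100, 50))

correlations = [abs(np.corrcoef(p, outcome)[0, 1]) for p in predictors]
best = int(np.argmax(correlations))
print(f"best of 100 junk predictors: #{best}, |r| = {correlations[best]:.2f}")
# With 100 tries the winner usually sports |r| around 0.4, impressive-looking
# unless you know the bullseye was painted after the shots.
```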
DATA WITHOUT THEORY ARE JUST DATA