How Not to Be Wrong: The Power of Mathematical Thinking
Read between December 3, 2020 and March 14, 2023
21%
Where Freud had claimed to see what had previously been hidden, repressed, or obscured, Skinner wanted to do the opposite—to deny the existence of what seemed in plain view.
21%
A significance test is an instrument, like a telescope. And some instruments are more powerful than others. If you look at Mars with a research-grade telescope, you’ll see moons; if you look with binoculars, you won’t. But the moons are still there!
21%
Stuff your sonnets with one or two extra alliterations each and you become one of the stone-footed poets mocked by Shakespeare’s fellow Elizabethan George Gascoigne: “Many writers indulge in repeticion of sundrie wordes all beginning with one letter, the whiche (beyng modestly used) lendeth good grace to a verse; but they do so hunt a letter to death, that they make it Crambe, and Crambe bis positum mors est.” The Latin phrase means “Cabbage served twice is death.”
21%
A statistical study that’s not refined enough to detect a phenomenon of the expected size is called underpowered—the equivalent of looking at the planets with binoculars. Moons or no moons, you get the same result, so you might as well not have bothered. You don’t send binoculars to do a telescope’s job. The problem of low power is the flip side to the problem of the British birth control scare. A high-powered study, like the birth control trial, may lead you to burst a vein about a small effect that isn’t actually important. An underpowered one may lead you to wrongly dismiss a small effect ...more
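Statistical power can be made concrete with a quick simulation. A minimal sketch, not from the book; the effect size, sample sizes, and threshold below are assumed for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power(n, effect=0.2, trials=2000, alpha=0.05):
    """Fraction of simulated studies that detect a real effect of the given size."""
    hits = 0
    for _ in range(trials):
        treated = rng.normal(effect, 1, n)  # group with a genuine, small effect
        control = rng.normal(0, 1, n)       # group with no effect
        if stats.ttest_ind(treated, control).pvalue < alpha:
            hits += 1
    return hits / trials

print(power(n=30))    # binoculars: the small effect is usually missed
print(power(n=1000))  # telescope: the same effect is detected almost every time
```

Same moons either way; only the instrument changes.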
22%
A player who made a layup was no more likely to shoot from distance than a player who just missed a layup. Layups are easy and shouldn’t give the player a strong sense of being hot. But a player is much more likely to try a long shot after a three-point basket than after a three-point miss. In other words, the hot hand might “cancel itself out”—players, believing themselves to be hot, get overconfident and take shots they shouldn’t. The nature of the analogous phenomenon in stock investment is left as an exercise for the reader.
22%
Assuming the truth of something we quietly believe to be false is a time-honored method of argument that goes all the way back to Aristotle; it is the proof by contradiction, or reductio ad absurdum. The reductio is a kind of mathematical judo, in which we first affirm what we wish eventually to deny, with the plan of throwing it over our shoulder and defeating it by means of its own force. If a hypothesis implies a falsehood,* then the hypothesis itself must be false. So the plan goes like this: Suppose the hypothesis H is true. It follows from H that a certain fact F cannot be the case. But ...more
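The logical skeleton of the reductio, filled in from the passage's setup (the excerpt is cut off, but the standard conclusion is that observing F forces us to reject H):

```latex
% Modus tollens, the engine of the reductio:
\[
\bigl( H \Rightarrow \neg F \bigr) \wedge F \;\Longrightarrow\; \neg H
\]
% Suppose H; derive that F cannot happen; observe F anyway; conclude not-H.
```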
25%
In other words, in every matter concerning divination from entrails, I’m a proponent of the null hypothesis.
25%
A recent paper in Psychological Science, a premier psychological journal, found that married women were significantly more likely to support Mitt Romney, the Republican presidential candidate, when they were in the fertile portion of their ovulatory cycle: of those women queried during their peak fertility period, 40.4% expressed support for Romney, while only 23.4% of the married women polled at infertile times were pulling the lever for Mitt.* The sample is small, just 228 women, but the difference is big, big enough that the result passes the p-value test with a score of .03. Which is just ...more
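As a sketch of the arithmetic behind a result like this, here is a two-proportion test on a reconstructed 2×2 table. The excerpt doesn't give the split between the fertile and infertile groups, so an even split is assumed; the resulting p-value is therefore only illustrative, not the study's .03.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Assumed even split of the 228 women; the study's actual split may differ.
n_fertile, n_infertile = 114, 114
support_fertile = round(0.404 * n_fertile)      # ~46 Romney supporters
support_infertile = round(0.234 * n_infertile)  # ~27 Romney supporters

table = np.array([
    [support_fertile, n_fertile - support_fertile],
    [support_infertile, n_infertile - support_infertile],
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"p = {p:.3f}")  # small enough to clear the .05 bar under these assumptions
```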
25%
In other words, we can be quite confident that the large effect reported in the study is mostly or entirely just noise in the signal. But noise is just as likely to push you in the opposite direction from the real effect as it is to tell the truth. So we’re left in the dark by a result that offers plenty of statistical significance but very little confidence.
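A simulation makes the "plenty of significance, little confidence" point vivid: when a study is noisy relative to the true effect, the estimates that do reach significance are badly exaggerated and sometimes point the wrong way. A sketch with assumed numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n = 0.1, 50  # small real effect, small samples (assumed)

significant = []
for _ in range(20_000):
    a = rng.normal(true_effect, 1, n)
    b = rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        significant.append(a.mean() - b.mean())

sig = np.array(significant)
print(f"mean significant estimate: {sig.mean():.2f} (true effect: {true_effect})")
print(f"significant estimates with the wrong sign: {(sig < 0).mean():.1%}")
```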
25%
Consider the profound xkcd cartoon below. Suppose you tested twenty genetic markers to see whether they were associated with some disorder of interest, and you found just one result that achieved p < .05 significance. Being a mathematical sophisticate, you’d recognize that one success in twenty is exactly what you’d expect if none of the markers had any effect, and you’d scoff at the misguided headline, just as the cartoonist intends you to.
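The cartoon's arithmetic in two lines:

```python
# Chance that at least one of twenty truly null markers hits p < .05:
print(f"{1 - 0.95**20:.0%}")  # about 64%; one 'significant' result is the default
```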
26%
If you decide what color jelly beans to eat based just on the papers that get published, you’re making the same mistake the army made when they counted the bullet holes on the planes that came back from Germany. As Abraham Wald pointed out, if you want an honest view of what’s going on, you also have to consider the planes that didn’t come back.
26%
There’s one big difference, though. In science, there’s no shady con man and no innocent victim. When the scientific community file-drawers its failed experiments, it plays both parts at once. They’re running the con on themselves.
26%
Uri Simonsohn, a professor at Penn who’s a leader in the study of replicability, calls these practices “p-hacking.” Hacking the p isn’t usually as crude as I’ve made it out to be, and it’s seldom malicious. The p-hackers truly believe in their hypotheses, just as the Bible coders do, and when you’re a believer, it’s easy to come up with reasons that the analysis that gives a publishable p-value is the one you should have done in the first place.
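A toy simulation of the practice: analyze pure noise several slightly different ways and keep the best p-value. The "analyses" below (arbitrary subsamples) stand in for dropping outliers, changing covariates, and so on; the specific setup is assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def hacked_p(n=100, analyses=10):
    """Best p-value over several arbitrary re-analyses of data with no effect."""
    x = rng.normal(0, 1, n)  # pure noise: the null hypothesis is true
    pvals = [stats.ttest_1samp(x, 0).pvalue]
    for _ in range(analyses - 1):
        keep = rng.random(n) < 0.5  # a different arbitrary subsample each time
        pvals.append(stats.ttest_1samp(x[keep], 0).pvalue)
    return min(pvals)

runs = [hacked_p() for _ in range(2_000)]
print(f"null datasets 'significant' after hacking: {np.mean([p < .05 for p in runs]):.1%}")
# Well above the nominal 5%, and no single step felt like cheating.
```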
26%
When they don’t think anyone’s listening, scientists call this practice “torturing the data until it confesses.” And the reliability of the results is about what you’d expect from confessions extracted by force.
26%
A lot of experimental results that belong over on the unpublishable side of the p = .05 boundary have been cajoled, prodded, tweaked, or just plain tortured until, at last, they end up just on the happy side of the line. That’s good for the scientists who need publications, but it’s bad for science.
26%
To live or die by the .05 is to make a basic category error, treating a continuous variable (how much evidence do we have that the drug works, the gene predicts IQ, fertile women like Republicans?) as if it were a binary one (true or false? yes or no?). Scientists should be allowed to report statistically insignificant data.
27%
The confidence interval is also informative in cases where you don’t get a statistically significant result—that is, where the confidence interval contains zero. If the confidence interval is [−0.5%, 0.5%], then the reason you didn’t get statistical significance is because you have good evidence the intervention doesn’t do anything. If the confidence interval is [−20%, 20%], the reason you didn’t get statistical significance is because you have no idea whether the intervention has an effect, or in which direction it goes. Those two outcomes look the same from the viewpoint of statistical ...more
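A sketch of the computation, with assumed data, showing how the same verdict of "not significant" can come with very different confidence intervals:

```python
import numpy as np
from scipy import stats

def mean_diff_ci(a, b, level=0.95):
    """Normal-approximation confidence interval for a difference in means."""
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = stats.norm.ppf(0.5 + level / 2)
    return diff - z * se, diff + z * se

rng = np.random.default_rng(3)
# Same true effect (zero), very different sample sizes (assumed numbers):
big_a, big_b = rng.normal(0, 1, 100_000), rng.normal(0, 1, 100_000)
small_a, small_b = rng.normal(0, 1, 20), rng.normal(0, 1, 20)
print(mean_diff_ci(big_a, big_b))      # tight around zero: good evidence of no effect
print(mean_diff_ci(small_a, small_b))  # wide: no idea what the effect is
```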
27%
When a drug fails a significance test, we don’t say, “We are quite certain the drug didn’t work,” but merely, “The drug wasn’t shown to work.”
27%
The significance test is the detective, not the judge.
27%
The age of Big Data is frightening to a lot of people, and it’s frightening in part because of the implicit promise that algorithms, sufficiently supplied with data, are better at inference than we are. Superhuman powers are scary: beings that can change their shape are scary, beings that rise from the dead are scary, and beings that can make inferences that we cannot are scary.
27%
But it’s possible we ought to spend less time worrying about eerily superpowered algorithms and more time worrying about crappy ones.
28%
In 1950, it took the early computer ENIAC twenty-four hours to simulate twenty-four hours of weather, and that was an astounding feat of space-age computation. In 2008, the computation was reproduced on a Nokia 6300 mobile phone in less than a second.
28%
Weather is, in the technical sense of the word, chaotic. In fact, it was in the numerical study of weather that Edward Lorenz discovered the mathematical notion of chaos in the first place. He wrote, “One meteorologist remarked that if the theory were correct, one flap of a sea gull’s wing would be enough to alter the course of the weather forever. The controversy has not yet been settled, but the most recent evidence seems to favor the sea gulls.”
28%
gravidity
29%
There are really two questions you can ask. They sound kind of the same, but they’re not. Question 1: What’s the chance that a person gets put on Facebook’s list, given that they’re not a terrorist? Question 2: What’s the chance that a person’s not a terrorist, given that they’re on Facebook’s list? One way you can tell these two questions are different is that they have different answers. Really different answers. We’ve already seen that the answer to the first question is about 1 in 2,000, while the answer to the second is 99.99%. And it’s the answer to the second question that you really ...more
29%
The quantities these questions contemplate are called conditional probabilities; “the probability that X is the case, given that Y is.” And what we’re wrestling with here is that the probability of X, given Y, is not the same as the probability of Y, given X.
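A head-count reconstruction of the Facebook example makes the asymmetry concrete. The figures below are assumptions chosen to be consistent with the two numbers quoted above (200 million users, a list of 100,000 names of which 10 are actual future terrorists, and, for simplicity, every terrorist on the list); the book's exact setup may differ.

```python
users = 200_000_000
on_list = 100_000
terrorists_on_list = 10  # assumed; nearly everyone on the list is innocent

innocents = users - terrorists_on_list
innocents_on_list = on_list - terrorists_on_list

# Question 1: P(on the list | not a terrorist)
print(innocents_on_list / innocents)  # ~0.0005, i.e. about 1 in 2,000
# Question 2: P(not a terrorist | on the list)
print(innocents_on_list / on_list)    # 0.9999, i.e. 99.99%
```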
29%
When the district attorney leans into the jury box and announces, “There is only a one in five million, I repeat, a ONE IN FIVE MILLLLLLLION CHANCE that an INNOCENT MAN would match the DNA sample found at the scene,” he is answering question 1: How likely would an innocent person be to look guilty? But the jury’s job is to answer question 2: How likely is this guilty-looking defendant to be innocent? That’s a question the DA can’t help them with.*
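The same inversion, in one line of arithmetic: a tiny match probability, multiplied over everyone who could conceivably have left the sample, still produces innocent matches. The population figure is assumed for illustration.

```python
match_prob = 1 / 5_000_000
population = 20_000_000  # assumed pool of plausible alternative sources
print(population * match_prob)  # ~4 innocent people expected to match by chance
```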
29%
The example of Facebook and the terrorists makes it clear why you should worry about bad algorithms as much as good ones. Maybe more. It’s creepy and bad when you’re pregnant and Target knows you’re pregnant. But it’s even creepier and worse if you’re not a terrorist and Facebook thinks you are.
29%
In 2009, Iran held a presidential election, which incumbent Mahmoud Ahmadinejad won by a large margin. There were widespread accusations that the vote had been fixed. But how could you hope to test the legitimacy of the vote count in a country whose government allowed for almost no independent oversight? Two graduate students at Columbia, Bernd Beber and Alexandra Scacco, had the clever idea to use the numbers themselves as evidence of fraud, effectively compelling the official vote count to testify against itself. They looked at the official total amassed by the four main candidates in each ...more
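A sketch of the kind of test this enables: under an honest count, the final digits of vote totals should be roughly uniform, and a chi-square test can flag departures. (Beber and Scacco's actual analysis examined the last and second-to-last digits of official counts; the data below are hypothetical, just to show the mechanics.)

```python
import numpy as np
from scipy.stats import chisquare

def last_digit_test(counts):
    """Chi-square test that the final digits of vote totals look uniform."""
    digits = [int(str(c)[-1]) for c in counts]
    observed = np.bincount(digits, minlength=10)
    return chisquare(observed)  # expected frequencies default to uniform

rng = np.random.default_rng(4)
fair_counts = rng.integers(1_000, 900_000, size=120)  # hypothetical totals
print(last_digit_test(fair_counts).pvalue)  # typically large: no red flag
```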
31%
Our priors are not flat, but spiky. We assign a lot of mental weight to a few theories, while others, like the RBRRB theory, get assigned a probability almost indistinguishable from zero. How do we choose our favored theories? We tend to like simpler theories better than more complicated ones, theories that rest on analogies to things we already know about better than theories that posit totally novel phenomena.
31%
If you’ve ever used America’s most popular sort-of-illegal psychotropic substance, you know what it feels like to have too-flat priors. Every single stimulus that greets you, no matter how ordinary, seems intensely meaningful. Each experience grabs hold of your attention and demands that you take notice. It’s a very interesting mental state to be in. But it’s not conducive to making good inferences.
31%
If you do happen to find yourself partially believing a crazy theory, don’t worry—probably the evidence you encounter will be inconsistent with it, driving down your degree of belief in the craziness until your beliefs come into line with everyone else’s. Unless, that is, the crazy theory is designed to survive this winnowing process. That’s how conspiracy theories work.
31%
Suppose you learn from a trusted friend that the Boston Marathon bombing was an inside job carried out by the federal government in order to, I don’t know, garner support for NSA wiretapping. Call that theory T. At first, because you trust your friend, maybe you assign that theory a reasonably high probability, say 0.1. But then you encounter other information: police located the suspected perpetrators, the surviving suspect confessed, etc. Each of these pieces of information is pretty unlikely, given T, and each one knocks down your degree of belief in T until you hardly credit it at all. ...more
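The updating the passage describes, as a sketch of Bayes' rule. The prior of 0.1 is from the passage; the likelihoods are assumed for illustration (each piece of evidence is unlikely if T were true, expected if it weren't):

```python
def bayes_update(prior, p_ev_given_T, p_ev_given_not_T):
    """One step of Bayes' rule: revised P(T) after seeing one piece of evidence."""
    num = prior * p_ev_given_T
    return num / (num + (1 - prior) * p_ev_given_not_T)

p = 0.1  # initial credence in the inside-job theory T
for _ in range(3):  # suspects located, confession, etc.
    p = bayes_update(p, p_ev_given_T=0.05, p_ev_given_not_T=0.9)
print(f"credence in T after three such updates: {p:.5f}")  # ~0.00002
```

A theory built to survive this process is one whose adherents set the probability of the evidence given T high for whatever evidence arrives, so the updates never push the belief down.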
32%
What Sherlock Holmes should have said was: “It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth, unless the truth is a hypothesis it didn’t occur to you to consider.”
32%
When the meteorologist on the nightly news says, “There’s a 20% chance of rain tomorrow,” what he means is that, among some large population of past days with conditions similar to those currently obtaining, 20% of them were followed by rainy days. But what can we mean when we say, “There’s a 20% chance that God created the universe?” It can’t be that one in five universes was made by God and the rest popped up on their own. The truth is, I’ve never seen a method I find satisfying for assigning numbers to our uncertainty about ultimate questions of this kind. As much as I love numbers, I think ...more
33%
Expected value is another one of those mathematical notions saddled, like significance, with a name that doesn’t quite capture its meaning.
33%
A better name might be “average value”—for what the expected value of the bet really measures is what I’d expect to happen if I made many such bets on many such dogs.
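A quick check of that reading with an assumed bet (the book's dog-track example isn't fully specified in the excerpt): pay $10 for an 8% chance at $100.

```python
import numpy as np

rng = np.random.default_rng(5)
expected_value = 0.08 * 100 - 10  # -$2 per bet
wins = rng.random(1_000_000) < 0.08
average = (wins * 100 - 10).mean()
print(expected_value, round(average, 2))  # the long-run average converges to the EV
```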
36%
ADDITIVITY: The expected value of the sum of two things is the sum of the expected value of the first thing with the expected value of the second thing.
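In symbols, and notably with no independence required of the two quantities:

```latex
\[
\mathbb{E}[X + Y] \;=\; \mathbb{E}[X] + \mathbb{E}[Y]
\]
```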
37%
Other times, your simplification is so simple that it eliminates the interesting features of the problem, as in the old joke about the physicist tasked with optimizing dairy production: he begins, with great confidence, “Consider a spherical cow . . .”
38%
Mathematics as currently practiced is a delicate interplay between monastic contemplation and blowing stuff up with dynamite.
38%
Math, like meditation, puts you in direct contact with the universe, which is bigger than you, was here before you, and will be here after you.
39%
The standard economic story is that human beings, when they’re acting rationally, make decisions that maximize their utility. Everything in life has utility; good things, like dollars and cake, have positive utility, while bad things, like stubbed toes and missed planes, have negative utility. Some people even like to measure utility in standard units, called utils.*
42%
Georges-Louis LeClerc, Comte de Buffon. Buffon’s interest in probability was not restricted to parlor games. Late in life, he reminisced about his encounter with the vexing St. Petersburg paradox: “I dreamed about this problem some time without finding the knot; I could not see that it was possible to make mathematical calculations agree with common sense without introducing some moral considerations; and having expressed my ideas to Mr. Cramer, he told me that I was right, and that he had also resolved this question by a similar approach.”
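The "knot": the game (win 2^k ducats if the first head comes on toss k) has an infinite expected cash value, yet nobody would pay much to play. The "moral considerations" Buffon and Cramer invoke amount to valuing money by a utility such as the logarithm of wealth, which makes the expectation finite.

```latex
\[
\mathbb{E}[\text{payout}] \;=\; \sum_{k=1}^{\infty} \frac{1}{2^{k}} \cdot 2^{k}
\;=\; \sum_{k=1}^{\infty} 1 \;=\; \infty
\]
```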
42%
Pierre-Simon Laplace, on the last page of his 1814 treatise A Philosophical Essay on Probabilities, writes, “We see, in this Essay, that the theory of probabilities is, in the end, only common sense boiled down to ‘calculus’; it points out in a precise way what rational minds understand by means of a sort of instinct, without necessarily being aware of it. It leaves nothing to doubt, in the choice of opinions and decisions; by its use one can always determine the most advantageous choice.” Again we see it: mathematics is the extension of common sense by other means.
42%
This time, the puzzle-bearer was Daniel Ellsberg, who later became famous as the whistle-blower who leaked the Pentagon Papers to the civilian press. (In mathematical circles, which can be parochial at times, it would not be outlandish to hear it said of Ellsberg, “You know, before he got involved in politics, he did some really important work.”)
42%
Von Neumann and Morgenstern,* in their foundational book The Theory of Games and Economic Behavior,
42%
The “known unknowns” are like RED—we don’t know which ball we’ll get, but we can quantify the probability that the ball will be the color we want. BLACK, on the other hand, subjects the player to an “unknown unknown”—not only are we not sure whether the ball will be black, we don’t have any knowledge of how likely it is to be black. In the decision-theory literature, the former kind of unknown is called risk, the latter uncertainty.
43%
This is a mathematical way of formalizing a principle you already know: the richer you are, the more risks you can afford to take. Bets like the one above are like risky stock investments with a positive expected dollar payoff; if you make a lot of these investments, you might sometimes lose a bunch of cash at once, but in the long run you’ll come out ahead. The rich person, who has enough reserves to absorb those occasional losses, invests and gets richer; the nonrich people stay right where they are.
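A gambler's-ruin sketch of the principle, with an assumed bet: win $1,000 with probability .6, lose $1,000 with probability .4, so the expected value is +$200 per bet. Whether you survive to collect on that edge depends on your starting bankroll.

```python
import numpy as np

rng = np.random.default_rng(6)

def goes_broke(bankroll, bets=1_000):
    """Take a positive-EV but risky bet repeatedly; return True if ruined."""
    for _ in range(bets):
        bankroll += 1_000 if rng.random() < 0.6 else -1_000
        if bankroll <= 0:
            return True
    return False

for start in (2_000, 50_000):
    ruin = np.mean([goes_broke(start) for _ in range(2_000)])
    print(f"start ${start:,}: ruin probability ~ {ruin:.1%}")
# The rich bettor almost never busts; the poor one busts nearly half the time.
```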
43%
The pros call moves like this “picking up pennies in front of a steamroller”—most of the time you make a little money, but one small slip and you’re squashed.
43%
As the old saying goes, if you’re down a million bucks, it’s your problem; but if you’re down five billion bucks, it’s the government’s problem.