Other sources indicate that the real motivation behind the prohibition was that Guinness simply didn’t want to reveal a key competitive strategy: the fact that it was hiring statisticians.
Since Gosset apparently valued his job more than immediate recognition, he published his t-statistic under the name “Student.” Although the true author has long been known, virtually all statistics texts still call this “Student’s t-statistic.”
When you have a lot of uncertainty, a few samples greatly reduce it, especially with relatively homogeneous populations.
In some cases, calibrated estimators were able to reduce uncertainty even with only one sample—which is impossible with the traditional statistics we just discussed.
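To make the small-sample point concrete, here is a minimal sketch (the five sample values are invented for illustration) of the kind of 90% confidence interval Gosset’s t-statistic gives you:

```python
# A 90% confidence interval for a mean from just five samples,
# using Student's t-distribution. The data values are invented.
import math
from scipy import stats

sample = [27, 31, 24, 35, 29]          # hypothetical measurements
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
t = stats.t.ppf(0.95, df=n - 1)        # 95th percentile -> two-sided 90% CI
half_width = t * sd / math.sqrt(n)
print(f"90% CI: {mean - half_width:.1f} to {mean + half_width:.1f}")
```

Even with five data points, the interval is far narrower than an uninformed range, which is Gosset’s point: a few samples greatly reduce uncertainty.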
The graph in Exhibit 9.2 looks something like a tornado on its side.
How would your average executive measure the population of fish in a lake?
If you told marine biologists to measure the fish in the lake, they would not confuse a “count” with a “measure.” Instead, the biologists might employ a method called “catch and recatch.” First, they would catch and tag a sample of fish—let’s say 1,000—and release them back into the lake. Then, after the tagged fish had a chance to disperse among the rest of the population, they would catch another sample of fish. Suppose they caught 1,000 fish again, and
this time 50 of those 1,000 fish were tagged. This means that about 5% of the fish in the lake are tagged. Since the marine biologists know they originally tagged 1,000 fish, they conclude that the lake contains about 20,000 fish (5% of 20,000 is 1,000).
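A minimal sketch of that arithmetic (this estimator is commonly known as the Lincoln-Petersen method; the counts match the example above):

```python
# Catch-and-recatch population estimate.
tagged_first_catch = 1000    # fish tagged and released in the first pass
second_catch = 1000          # size of the second sample
tagged_in_second = 50        # tagged fish found in the second sample

tagged_fraction = tagged_in_second / second_catch   # ~5% of the lake is tagged
population_estimate = tagged_first_catch / tagged_fraction
print(f"Estimated population: {population_estimate:,.0f} fish")  # 20,000
```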
Exhibit 9.7 shows how the Mark V tank production estimates from Allied Intelligence and from the
statistical method compared to the actual number confirmed by postwar analysis of captured documents. Clearly, the statistical method based on the analysis of serial numbers of captured tanks is the hands-down winner in this comparison.
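A common form of the serial-number estimator is N̂ = m(1 + 1/k) − 1, where m is the largest serial number observed and k is the number of tanks captured; the exhibit may use a variant. A sketch with invented serial numbers:

```python
# Serial-number ("German tank") estimate of total production.
# Uses the common minimum-variance unbiased estimator; the captured
# serial numbers below are invented for illustration.
captured_serials = [14, 87, 132, 219, 256]
m = max(captured_serials)            # largest serial number observed
k = len(captured_serials)            # number of tanks captured
n_hat = m * (1 + 1 / k) - 1
print(f"Estimated total production: {n_hat:.0f} tanks")   # ~306
```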
This method may make sense when you have some groups within a population that vary widely from each other but are fairly homogeneous
inside a group.
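This describes what statisticians call stratified sampling: sample each relatively homogeneous group separately, then weight the group results by group size. A minimal sketch with invented numbers:

```python
# Stratified sampling: estimate a population mean by sampling each
# relatively homogeneous stratum separately, then weighting by stratum
# size. Strata names and numbers are invented for illustration.
strata = {
    # name: (stratum size, sample mean within that stratum)
    "warehouse staff": (400, 2.1),   # e.g., hours/week on some activity
    "office staff":    (500, 6.5),
    "field reps":      (100, 9.0),
}
total = sum(size for size, _ in strata.values())
weighted_mean = sum(size * mean for size, mean in strata.values()) / total
print(f"Population-wide estimate: {weighted_mean:.2f} hours/week")   # 4.99
```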
Perhaps the biggest misconception some managers may run into is the belief that correlation proves causation.
If church donations and liquor sales are correlated, it is not because of some collusion between clergy and the liquor industry.
They will blurt out “correlation doesn’t mean causation”
Perhaps the Two Biggest Mistakes in Interpreting Correlation?
Biggest mistake: That correlation proves causation.
Second biggest mistake: That correlation isn’t evidence of causation.
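The church-donations example is easy to reproduce. In the sketch below (all numbers invented), neither series drives the other; both merely track a hidden third factor, and a strong correlation appears anyway:

```python
# Spurious correlation via a common cause: donations and liquor sales
# both track a town's overall economic activity, so they correlate even
# though neither causes the other. Requires Python 3.10+ for
# statistics.correlation. All numbers are invented.
import random
import statistics

random.seed(0)
activity = [random.gauss(100, 15) for _ in range(1000)]        # hidden common cause
donations = [0.5 * a + random.gauss(0, 5) for a in activity]
liquor_sales = [0.8 * a + random.gauss(0, 8) for a in activity]

r = statistics.correlation(donations, liquor_sales)
print(f"Correlation: {r:.2f}")   # ~0.7, with no causal link between the two
```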
“Significance or the lack of it provides no degree of belief—high, moderate, or low—about prediction of performance in the future, which is the only reason to carry out the comparison, test, or experiment in the first place.”
A number of other authors argue to this day that even when done properly, statistical significance is unnecessary4 and that its use has been a costly error in public policy.5,6
Dealing with this prior knowledge is what is called “Bayesian statistics.” Early in this book we mentioned that the inventor of this approach, Thomas Bayes, was an eighteenth-century British mathematician and Presbyterian minister whose most famous contribution to statistics would not be published until after he died. Bayesian statistics deals with the issue of how we update prior knowledge with new information. With Bayesian analysis, we start with how much we know now and then consider how that knowledge is changed by new information.
In both of these cases, I took the position of maximum uncertainty. This is also called the “robust” Bayesian approach, and it minimizes prior-knowledge assumptions, including the assumption of normality.
Bayes’ theorem is simply a relationship of probabilities and “conditional” probabilities. A conditional probability is the chance of something given a particular condition. See Exhibit 10.1 for a summary of these and some other basic concepts in probability which we will need to have handy for the rest of the discussion.
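In symbols, Bayes’ theorem says P(A|B) = P(B|A) × P(A) / P(B). A minimal sketch with invented probabilities:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# All probabilities below are invented for illustration.
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Probability of A, given that condition B was observed."""
    return p_b_given_a * p_a / p_b

p_a = 0.01                # prior: P(A), e.g., some rare condition
p_b_given_a = 0.90        # P(B|A): chance of the observation if A is true
p_b_given_not_a = 0.05    # P(B|~A): chance of the observation anyway
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # total probability of B
print(f"P(A|B) = {bayes(p_b_given_a, p_a, p_b):.3f}")    # ~0.154
```

Note how a 1% prior is updated to about 15%: strong evidence, but far from certainty, because the condition was rare to begin with.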
A person who has a lot of experience with the resistance to measurement in IT security is Peter Tippett, formerly of Cybertrust. He applied his MD and PhD in biochemistry in a way that none of his classmates probably imagined: He wrote the first antivirus software. His innovation later became Norton Antivirus. Since then, Tippett has
conducted major quantitative studies involving hundreds of organizations to measure the relative risks of different security threats. With these credentials, you might think that his claim that security can be measured would be accepted at face value. Yet many in the IT security industry seem to have a deeply rooted disposition against the very idea that security is measurable at all.
There is no sense of prioritization.”
He recalls a specific example from his time at Cybertrust: “A Fortune 20 IT security manager wanted to spend $100M on 35 projects. The CIO [chief information officer] wanted to know which projects are more important. His people said nobody knows.”
Although it may seem cumbersome at first, Bayes’ theorem is one of the most powerful measurement tools at our disposal. It is the way it reframes the measurement question that makes it so useful. Given a particular observation, it may seem more obvious to frame a measurement by asking the question, “What can I conclude from this observation?” or, in probabilistic terms, “What is the probability X is true, given my observation?” But Bayes showed us that we could, instead, start with the question, “What is the probability of this observation if X were true?”
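To see the reframing with invented numbers: suppose you start out giving a 30% chance to some claim X, and the observation you just made would occur 80% of the time if X were true but only 10% of the time if it were not. Then P(X | observation) = (0.8 × 0.3) / (0.8 × 0.3 + 0.1 × 0.7) ≈ 0.77. Asking “what is the probability of this observation if X were true?” is what makes the update computable.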
“Absence of evidence is not evidence of absence”
Is it really ever true that the absence of evidence is not evidence of absence?
Certainly, absence of evidence is not proof of absence,
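Bayes’ theorem makes the distinction precise. In the sketch below (all numbers invented), failing to find evidence lowers the probability that the thing exists, so it is evidence of absence, just not proof:

```python
# Why absence of evidence is (weak) evidence of absence: if evidence is
# more likely to turn up when a thing exists than when it doesn't, then
# finding none must lower the probability that it exists.
# All numbers are invented for illustration.
p_exists = 0.50          # prior: P(thing exists)
p_ev_if_exists = 0.40    # P(find evidence | exists)
p_ev_if_not = 0.05       # P(find evidence | doesn't exist), e.g., false positives

p_no_ev = (1 - p_ev_if_exists) * p_exists + (1 - p_ev_if_not) * (1 - p_exists)
p_exists_given_no_ev = (1 - p_ev_if_exists) * p_exists / p_no_ev
print(f"P(exists | no evidence) = {p_exists_given_no_ev:.2f}")  # ~0.39, down from 0.50
```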
Myth 2: Correlation Is Not Evidence of Causation
So, correlation does at least increase the probability of causation. Another common and unchallenged statement turns out to be false.
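The same update logic applies here (again with invented numbers): if a causal link would produce a correlation this strong, say, 90% of the time, while such a correlation would arise only 20% of the time without one, then observing it raises a 30% prior on causation to (0.9 × 0.3) / (0.9 × 0.3 + 0.2 × 0.7) ≈ 0.66. Evidence, not proof.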
Think of it like this: If you knew nothing of my health other than that I smoked, would you assign a different 90% CI to my lifespan than if you only knew that I didn’t smoke? I can see someone saying, “Yes, but there are so many other factors that affect your life span.”
Broadly, there are two ways to observe preferences: what people say and what people do. Stated preferences are those that individuals will say they prefer. Revealed preferences are those that individuals display by their actual behaviors. Either type of preference can significantly reduce uncertainty, but revealed preferences are usually, as you might expect, more revealing.
The Likert scale. Respondents are asked to choose where they fall on a range of possible feelings about a thing, generally in the form of a scale running from “strongly dislike” through “dislike” to “strongly like,” or from “strongly disagree” to “strongly agree.”
If the bias is deliberate, the survey designer is angling for a specific response (e.g., “Do you oppose the criminal negligence of Governor . . . ?”), but questionnaires can also be biased unintentionally.
Reverse questions to avoid response set bias.
Of course, we would expect the choices for B and C to change with the new scale. But should the scale in Survey II make any difference to how often choice A is used? Choice A is identical in the two questionnaires, yet we would find that A is chosen less often in Survey II than in Survey I.1 In this example, you could avoid partition dependence simply by calibrating the respondents and asking them for an estimate of the actual quantity. If that were not practical, you might need to use more than one version of the survey to minimize the effect.
The willingness to pay (WTP)
People are notoriously poor at understanding probabilities, especially the small ones that are relevant to health choices. In a general-population survey, only about 60% of respondents correctly answered the question “Which is a larger chance, 5 in 100,000 or 1 in 10,000?” This “innumeracy” can confound people’s thinking about their preferences.3
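(For the record, the arithmetic: 1 in 10,000 is 10 in 100,000, so it is twice as large a chance as 5 in 100,000.)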
If people are really that mathematically illiterate, it would be fair to be skeptical about valuations gathered from public surveys.
It is not too bold a statement to say that a software development project is one of the riskiest investments a business makes. For example, the chance of a large software project being canceled increases with project duration. In the 1990s, those projects that exceeded two years of elapsed calendar time in development had a cancellation rate that exceeded the default rate of the worst-rated junk bonds (something over 25%).
The curves are drawn by management such that any two points on the same curve are considered equally valuable. For example, management drew the top curve in a way that indicates it
considers a worker who has 96% error-free work and a 96% on-time completion rate to be equal to one who has 93% error-free work and a 100% on-time completion rate. Keep in mind that this is just the hypothetical valuation of some particular manager, not a fixed, standard trade-off. Your preferences would probably be at least a little different.
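One way to check your reading of such a curve is a sketch under a simplifying linearity assumption (real indifference curves are usually curved): the two workers the manager rated as equal imply a weighting of about 4:3 between error-free work and on-time completion.

```python
# A linear utility consistent with the stated trade-off:
# 96a + 96b = 93a + 100b  =>  3a = 4b, e.g., a = 4, b = 3.
# Only illustrative; actual indifference curves need not be linear.
def utility(error_free_pct: float, on_time_pct: float) -> float:
    return 4 * error_free_pct + 3 * on_time_pct

print(utility(96, 96))    # 672 for the first worker
print(utility(93, 100))   # 672 for the second: the same, as intended
```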
Billy Beane, the general manager of the Oakland A’s baseball team, decided to throw out traditional measures of performance for baseball players. The most important offensive measure of a player was simply the chance of not getting an out. Likewise, defensive measures were a sort of “out production.” Each of these contributed to the ultimate measure, which was the contribution a player made to the chance of the team winning a game relative to his salary. At a team level, this converts into a simple cost per win. By 2002, the Oakland A’s were spending only $500,000 per win, while some teams were…
In one experiment, the previously mentioned research team of Amos Tversky and 2002 Economics Nobel Prize winner Daniel Kahneman asked subjects about the percentage of member nations in the United Nations that were African. One group of subjects was asked whether it was more than 10%, and a second group was asked whether it was more than 65%. Both groups were told that the percentage in the “Is it more than . . . ?” question was randomly generated. (In fact, it was not.)
He asked subjects to write down the last four digits of their Social Security number and to estimate the number of physicians in New York City. Remarkably, Kahneman found a correlation of 0.4 between the subjects’ estimate of the number of physicians and the last four digits of their Social Security number. Although this is a modest correlation, it is much

