“measurement mentors”—
Our first mentor of measurement did something that was probably thought by many in his day to be impossible. An ancient Greek named Eratosthenes (ca. 276–194 B.C.) made the first recorded measurement of the circumference of Earth. If he sounds familiar, it might be because he is mentioned in many high school trigonometry and geometry textbooks.
Using geometry, he could then prove that this meant that the circumference of Earth must be 50 times the distance between Alexandria and Syene. Modern attempts to replicate Eratosthenes’s calculations vary in terms of the exact size of the angles, conversion rates between ancient and modern units of measurement, and the precise distance between the ancient cities, but typical estimates put his answer within 3% of the actual value.
Even 1,700 years later, Columbus was apparently unaware of or ignored Eratosthenes’s result; his estimate was fully 25% short. (This is one of the reasons Columbus thought he might be in India, not another large, intervening landmass where I reside.)
The best-known example of such a “Fermi question” was Fermi asking his students to estimate the number of piano tuners in Chicago.
This approach to solving a Fermi question is known as a Fermi decomposition or Fermi solution. This method helped to estimate the uncertain quantity but also gave the estimator a basis for seeing where uncertainty about the quantity came from. Was the big uncertainty about the share of households that had tuned pianos, how often a piano needed to be tuned, how many pianos a tuner can tune in a day, or something else? The biggest source of uncertainty would point toward a measurement that would reduce the uncertainty the most.
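As a minimal sketch of what such a decomposition looks like in practice, the snippet below works through the piano-tuner question step by step. Every input value is an illustrative assumption chosen for the example, not Fermi’s own figures or measured data.

```python
# Fermi decomposition of "How many piano tuners are in Chicago?"
# All input values below are illustrative assumptions, not measured data.

chicago_population = 2_700_000              # assumed city population
people_per_household = 2.5                  # assumed average household size
share_households_with_tuned_piano = 0.05    # assumed share owning a regularly tuned piano
tunings_per_piano_per_year = 1              # assumed tuning frequency
tunings_per_tuner_per_day = 4               # assumed daily workload per tuner
working_days_per_year = 250                 # assumed working days

households = chicago_population / people_per_household
tuned_pianos = households * share_households_with_tuned_piano
tunings_needed_per_year = tuned_pianos * tunings_per_piano_per_year
tunings_supplied_per_tuner = tunings_per_tuner_per_day * working_days_per_year

estimated_tuners = tunings_needed_per_year / tunings_supplied_per_tuner
print(f"Estimated piano tuners in Chicago: {estimated_tuners:.0f}")
```

Changing any one assumption shows immediately how much that assumption drives the final estimate, which is exactly the point about locating the biggest source of uncertainty.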
The concept of measurement as “uncertainty reduction” and not necessarily the elimination of uncertainty is a central theme of this book.
Definition of Measurement
Measurement: A quantitatively expressed reduction of uncertainty based on one or more observations.
treat measurement as observations that quantitatively reduce uncertainty.
This view of probabilities is called the “Bayesian” interpretation.
as Bayes’ theorem, describes how new information can update prior probabilities. “Prior” could refer to a state of uncertainty informed mostly by previously recorded data, but it can also refer to a point before any objective and recorded observations. At least for the latter case, the prior probability often needs to be subjective.
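The update itself is a short calculation. The sketch below applies Bayes’ theorem, P(H|E) = P(E|H)·P(H) / P(E), to a made-up contract-bidding example; the prior and the likelihoods are assumptions invented for illustration, not figures from the book.

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
# Illustrative example: updating the chance of winning a contract
# after learning a competitor dropped out. All numbers are assumptions.

prior_win = 0.30                 # prior probability of winning the bid
p_dropout_if_win = 0.60          # chance of seeing the dropout if we were going to win
p_dropout_if_lose = 0.20         # chance of seeing the dropout if we were going to lose

p_dropout = p_dropout_if_win * prior_win + p_dropout_if_lose * (1 - prior_win)
posterior_win = p_dropout_if_win * prior_win / p_dropout

print(f"Prior P(win)     = {prior_win:.2f}")
print(f"Posterior P(win) = {posterior_win:.2f}")  # about 0.56
```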
A rational person can’t simply say, for instance, that there is a 75% chance of winning a bid for a government contract and an 82% chance of losing it (these two possibilities should have a total probability of 100%).
Once managers figure out what they mean and why it matters, the issue in question starts to look a lot more measurable.
The clarification chain is just a short series of connections that should bring us from thinking of something as an intangible to thinking of it as tangible.
“[I]f a thing exists, it exists in some amount, if it exists in some amount, it can be measured” (as quoted by the psychologist Paul Meehl3).
Clarification Chain
If it matters at all, it is detectable/observable.
If it is detectable, it can be detected as an amount (or range of possible amounts).
If it can be detected as a range of possible amounts, it can be measured.
But when I ask why they care about measuring that, I might find that what they really are interested in is building a business case for a specific biometric identification system for criminals.
Identifying the object of measurement really is the beginning of almost any scientific inquiry, including the truly revolutionary ones. Business managers need to realize that some things seem intangible only because they just haven’t defined what they are talking about. Figure out what you mean and you are halfway to measuring it.
“pertaining to the state.” Statistics was literally the quantitative study of the state.
can be anything between 0% and 100%, there is a 75% chance that the characteristic you observe in that sample is the same as the majority. Let’s call this the “Single Sample Majority Rule” or, if you prefer something more fanciful, “The Urn of Mystery Rule.”
The Single Sample Majority Rule (i.e., The Urn of Mystery Rule)
Given maximum uncertainty about a population proportion—such that you believe the proportion could be anything between 0% and 100% with all values being equally likely—there is a 75% chance that a single randomly selected sample is from the majority of the population.
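A quick simulation can check that 75% figure. The sketch below is a Monte Carlo check that assumes the population proportion is drawn uniformly between 0 and 1 (maximum uncertainty) and that one member is sampled at random; the marble-color framing is just for the example.

```python
# Monte Carlo check of the Single Sample Majority Rule (Urn of Mystery Rule).
# Assumption: the population proportion is uniform on [0, 1], i.e. maximum uncertainty.
import random

trials = 1_000_000
matches_majority = 0

for _ in range(trials):
    p_green = random.random()                  # unknown proportion of, say, green marbles
    draw_is_green = random.random() < p_green  # one random draw from the urn
    majority_is_green = p_green > 0.5          # which color is actually the majority
    if draw_is_green == majority_is_green:
        matches_majority += 1

print(f"Share of single draws matching the majority: {matches_majority / trials:.3f}")
# Prints roughly 0.750, in line with the 75% rule.
```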
Another researcher conducted one of the largest and longest running studies of the performance of experts at predictions. Philip Tetlock tracked the forecasts of 284 experts in many topics over a 20-year period. In total, he had gathered more than 82,000 individual forecasts covering elections, wars, economics, and more. Tetlock summarized these findings in his book Expert Political Judgment: How Good Is It? How Can We Know?13 His conclusion was perhaps even more strongly worded than Meehl’s:
It is impossible to find any domain in which humans clearly outperformed crude extrapolation algorithms, still less sophisticated statistical ones.
In his modified version there are two revolvers: one with one bullet and five empty chambers and one with five bullets and one empty chamber. Meehl then asks us to imagine that he is a “sadistic decision-theorist” running experiments in a detention camp. Meehl asks, “Which revolver would you choose under these circumstances? Whatever may be the detailed, rigorous, logical reconstruction of your reasoning processes, can you honestly say that you would let me pick the gun or that you would flip a coin to decide between them?”
“stats mongers” in baseball saying that the introduction of what he called “new age stats” actually “threatens to undermine most fans’ enjoyment of baseball and the human factor therein.”
Should a 99-year-old with several health problems be worth the same effort to save as a 5-year-old? Whatever your answer is, it is a measurement of the relative value you hold for each.
The uniqueness fallacy seems like the
The Cleveland Orchestra was just a bit more resourceful with the data available: It started counting the number of standing ovations. While there is no obvious difference among performances that differ by a couple of standing ovations, if we see a significant increase over several performances with a new conductor, then we can draw some useful conclusions about that new conductor.
In fact, it was Fisher who coined the term “Bayesian” as a derogatory reference to proponents of the contrary view.
Armstrong and MacGregor found that decomposition didn’t help much if the estimates of the first group already had relatively little error—like estimating the circumference of a U.S. 50-cent coin in inches.
They found that for the most uncertain variables, a simple decomposition—none of which was more than five variables—reduced error by a factor of as much as 10 or even 100.
Decomposition itself is certainly worth the time.
“Essentially, all models are wrong, but some are useful.”
The first question I asked the VA is similar to the first questions I ask on most measurement problems: “What decision is this measurement for?” Then, shortly thereafter, I ask: “What do you mean by ‘IT security’? What does improved IT security look like?”
How about a data center being hit by a fire, flood, or tornado? At the first meeting, participants found that while they all thought IT security could be better, they didn’t have a common understanding of exactly what IT security was.
had thought about those details in the definition of IT security.
They resolved that improved IT security means a reduction in the frequency and severity of a specific list of undesirable events.
of how to measure it.
You might take issue with some aspects of the definition. You may (justifiably) argue that a fire is not, strictly speaking, an IT security risk. Yet the VA participants determined that, within their organization, they did mean to include the risk of fire. Aside from some minor differences about what to include on the periphery, I think what we developed is the basic model for any IT security measurements.
In statistics, a range that has a particular chance of containing the correct answer is called a confidence interval (CI). A 90% CI is a range that has a 90% chance of containing the correct answer.
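As a small illustration of the idea (not the book’s own procedure), the sketch below computes a 90% CI for a mean from a handful of made-up measurements, using the common t-distribution interval as one of several ways such a range can be constructed.

```python
# Compute a 90% confidence interval for a mean from a small sample.
# The sample values are made up for illustration; the t-distribution interval
# is one standard way to build a CI, used here only as an example.
from statistics import mean, stdev
from scipy.stats import t

sample = [12.1, 9.8, 11.4, 10.7, 13.2, 9.5, 12.6, 10.9]  # hypothetical measurements
n = len(sample)
m = mean(sample)
se = stdev(sample) / n ** 0.5          # standard error of the mean
t_crit = t.ppf(0.95, df=n - 1)         # a 90% CI leaves 5% in each tail

lower, upper = m - t_crit * se, m + t_crit * se
print(f"90% CI for the mean: {lower:.2f} to {upper:.2f}")
```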
What we need to do is to determine how lucky or how unlucky these results would be. To compute the chance of various outcomes we can use what is known as a “binomial” distribution calculation, which we will cover in Chapters 9 and 10.
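As a rough preview of that kind of calculation, the sketch below uses scipy’s binomial distribution to ask how likely a score of 3 or fewer “hits” out of 10 would be if each stated 90% CI really did have a 90% chance of containing the answer; the setup mirrors the calibration test discussed here, but the code is only an illustrative assumption about how one might run the numbers.

```python
# Binomial check: if each of 10 stated 90% CIs really had a 90% chance of
# containing the answer, how surprising is a score of 3 or fewer hits?
from scipy.stats import binom

n_questions = 10
p_hit = 0.90   # claimed chance each range contains the true value

prob_3_or_fewer = binom.cdf(3, n_questions, p_hit)
expected_hits = n_questions * p_hit

print(f"Expected hits out of 10: {expected_hits:.1f}")
print(f"P(3 or fewer hits) if truly calibrated: {prob_3_or_fewer:.2e}")
# The probability is tiny, so scoring that low is strong evidence of overconfidence.
```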
Therefore, it is reasonable to infer that a person who does this poorly is overconfident.
We also see that, even out of all the tests, it is highly unlikely that anybody should have gotten 3 or fewer out of 10, and yet we see that this was the case 224 times.
Unlike the 90% CI test, your confidence for each answer on this test could range from 50% to 100%. If you said you were 100% confident on all 10 questions, you are expecting to get all 10 correct.
If you said you were 100% confident on any answer, you must get it right. Getting even one 100% confident answer wrong is sufficient evidence that you are overconfident. Remarkably, just over 15% of responses where the stated confidence was 100% turned out to be wrong, and some individuals (we will see shortly) would get a third or more of their “certain” answers wrong.
Apparently, even a 100% confidence, on average, indicates something less than an 85% chance of being right.
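One informal way to see this in a set of results is to compare the expected number correct (the sum of the stated confidences) with the actual number correct. The sketch below does that for ten made-up responses to the 50%-to-100% binary test; the confidence values and outcomes are invented for illustration.

```python
# Compare expected correct answers (from stated confidences) with actual results
# on a ten-question true/false calibration test. The data below are made up.

stated_confidence = [1.0, 0.9, 0.5, 1.0, 0.7, 0.6, 1.0, 0.8, 0.5, 0.9]
answered_correctly = [True, True, False, False, True, True, True, True, False, True]

expected_correct = sum(stated_confidence)
actual_correct = sum(answered_correctly)

print(f"Expected correct (sum of stated confidences): {expected_correct:.1f}")
print(f"Actual correct:                               {actual_correct}")
# If actual falls well short of expected (here 7 vs. 7.9), and especially if any
# 100%-confident answer is wrong, the responses show overconfidence.
```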