Kindle Notes & Highlights
by Nate Silver
Read between January 30 - April 26, 2019
One way to judge a forecast, Murphy wrote—perhaps the most obvious one—was through what he called “quality,” but which might be better defined as accuracy. That is, did the actual weather match the forecast?
A second measure was what Murphy labeled “consistency” but which I think of as honesty. However accurate the forecast turned out to be, was it the best one the forecaster was capable of at the time? Did it reflect her best judgment, or was it modified in some way before being presented to the public?
Finally, Murphy said, there was the economic value of a forecast. Did it help the public and policy ma...
The statistical reality of accuracy isn’t necessarily the governing paradigm when it comes to commercial weather forecasting. It’s more the perception of accuracy that adds value in the eyes of the consumer.
One of the most important tests of a forecast—I would argue that it is the single most important one44—is called calibration. Out of all the times you said there was a 40 percent chance of rain, how often did rain actually occur?
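A calibration check is mechanical enough to sketch in a few lines. The snippet below is a minimal illustration, not Silver's procedure, and the forecast data is invented: bucket forecasts by their stated probability and compare what was said with how often the event actually happened.

```python
# Minimal calibration check: group probabilistic forecasts into bins and
# compare each bin's stated probability with the observed frequency.
from collections import defaultdict

# (stated probability of rain, did it actually rain?) -- toy data for illustration
forecasts = [(0.4, True), (0.4, False), (0.4, False), (0.4, True), (0.4, False),
             (0.7, True), (0.7, True), (0.7, False), (0.9, True), (0.9, True)]

bins = defaultdict(list)
for prob, outcome in forecasts:
    bins[prob].append(outcome)

for prob in sorted(bins):
    outcomes = bins[prob]
    observed = sum(outcomes) / len(outcomes)
    print(f"said {prob:.0%}: rained {observed:.0%} of {len(outcomes)} times")
```

A well-calibrated forecaster's observed frequencies track the stated probabilities across every bin, not just on average.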
In the case of earthquakes, it turns out that for every increase of one point in magnitude, an earthquake becomes about ten times less frequent.
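The relationship described here is the Gutenberg-Richter law. A tiny sketch of what a tenfold falloff per magnitude point implies, using an invented baseline rate rather than real catalog numbers:

```python
# Gutenberg-Richter-style frequency falloff: each one-point increase in
# magnitude makes an earthquake roughly ten times rarer.
# Illustrative baseline only -- not real catalog figures.
base_rate_m5 = 1000          # hypothetical: magnitude-5-or-greater quakes per year
for magnitude in range(5, 9):
    expected = base_rate_m5 / 10 ** (magnitude - 5)
    print(f"magnitude >= {magnitude}: ~{expected:g} per year")
```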
What happens in systems with noisy data and underdeveloped theory—like earthquake prediction and parts of economics and political science—is a two-step process. First, people start to mistake the noise for a signal. Second, this noise pollutes journals, blogs, and news accounts with false alarms, undermining good science and setting back our ability to understand how the system really works.
In statistics, the name given to the act of mistaking noise for a signal is overfitting.
You are most likely to overfit a model when the data is limited and noisy and when your understanding of the fundamental relationships is poor; both circumstances apply in earthquake forecasting.
“With four parameters I can fit an elephant,” the mathematician John von Neumann once said of this problem.59 “And with five I can make him wiggle his trunk.”
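A minimal sketch of the elephant problem, using synthetic data and numpy (my assumptions, not anything from the book): the true signal is a straight line, but a polynomial with as many parameters as data points happily fits the noise and then predicts badly.

```python
# Overfitting sketch: a true linear signal plus noise, fit with a straight
# line and with a high-degree polynomial. The flexible model tracks the
# noise in the training points and predicts poorly on a new point.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = 2 * x + rng.normal(0, 0.2, size=x.size)   # signal (2x) plus noise

line = np.polyfit(x, y, 1)    # two parameters
wiggly = np.polyfit(x, y, 7)  # eight parameters: enough to pass through every point

x_new = 1.2                   # a point just outside the training range
print("truth      :", 2 * x_new)
print("linear fit :", np.polyval(line, x_new))
print("degree-7   :", np.polyval(wiggly, x_new))  # typically far off the mark
```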
If you drop another grain of sand onto the pile (what could be simpler than a grain of sand?), it can actually do one of three things. Depending on the shape and size of the pile, it might stay more or less where it lands, or it might cascade gently down the small hill toward the bottom of the pile. Or it might do something else: if the pile is too steep, it could destabilize the entire system and trigger a sand avalanche. Complex systems seem to have this property, with large periods of apparent stasis marked by sudden and catastrophic failures.
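The pile described here behaves much like the Bak-Tang-Wiesenfeld sandpile model. The sketch below is that idealized model, not the physical experiment: grains are dropped one at a time, any cell holding four or more grains topples onto its neighbors, and most drops do nothing while an occasional drop sets off a long chain of topplings.

```python
# Idealized sandpile: drop grains on a grid; a cell with 4+ grains topples,
# sending one grain to each neighbor, which can trigger further topplings.
import random

SIZE = 20
grid = [[0] * SIZE for _ in range(SIZE)]

def drop_grain():
    """Drop one grain at a random site and return the avalanche size (topples)."""
    r, c = random.randrange(SIZE), random.randrange(SIZE)
    grid[r][c] += 1
    topples = 0
    unstable = [(r, c)] if grid[r][c] >= 4 else []
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < 4:
            continue
        grid[i][j] -= 4          # grains at the edge simply fall off the pile
        topples += 1
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < SIZE and 0 <= nj < SIZE:
                grid[ni][nj] += 1
                if grid[ni][nj] >= 4:
                    unstable.append((ni, nj))
    return topples

sizes = [drop_grain() for _ in range(20000)]
print("drops causing no topple:", sizes.count(0), "  largest avalanche:", max(sizes))
```

Long stretches of nothing punctuated by occasional large avalanches is exactly the stasis-then-catastrophe pattern the passage describes.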
An oft-told joke: a statistician drowned crossing a river that was only three feet deep on average.
Getting feedback about how well our predictions have done is one way—perhaps the essential way—to improve them. Economic forecasters get more feedback than people in most other professions, but they haven’t chosen to correct for their bias toward overconfidence.
As Hatzius sees it, economic forecasters face three fundamental challenges. First, it is very hard to determine cause and effect from economic statistics alone. Second, the economy is always changing, so explanations of economic behavior that hold in one business cycle may not apply to future ones. And third, as bad as their forecasts have been, the data that economists have to work with isn’t much good either.
A related doctrine known as Goodhart’s law, after the London School of Economics professor who proposed it,38 holds that once policy makers begin to target a particular variable, it may begin to lose its value as an economic indicator. For instance, if the government artificially takes steps to inflate housing prices, they might well increase, but they will no longer be good measures of overall economic health.
A forecaster should almost never ignore data, especially when she is studying rare events like recessions or presidential elections, about which there isn’t very much data to begin with. Ignoring data is often a tip-off that the forecaster is overconfident, or is overfitting her model—that she is interested in showing off rather than trying to be accurate.
if you just look at the economy as a series of variables and equations without any underlying structure, you are almost certain to mistake noise for a signal and may delude yourself (and gullible investors) into thinking you are making good forecasts when you are not.
There’s plenty of jargon, but what is lacking in this description is any actual economic substance. Theirs was a story about data—as though data itself caused recessions—and not a story about the economy.
This kind of statement is becoming more common in the age of Big Data.56 Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes.
This property—group forecasts beat individual ones—has been found to be true in almost every field in which it has been studied.
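A toy illustration of why averaging helps, under the strong assumption that forecasters' errors are independent and unbiased (real forecasters' errors are correlated, so the actual gain is smaller than this sketch suggests):

```python
# Toy illustration: independent forecasters each miss the true value by a
# random error; the group average typically misses by much less.
import random
import statistics

random.seed(1)
TRUE_VALUE = 2.0   # e.g., "true" GDP growth in percent (made up)

individual_errors, group_errors = [], []
for _ in range(1000):
    forecasts = [TRUE_VALUE + random.gauss(0, 1.0) for _ in range(20)]
    individual_errors.append(abs(forecasts[0] - TRUE_VALUE))            # one forecaster
    group_errors.append(abs(statistics.mean(forecasts) - TRUE_VALUE))   # the average

print("typical individual miss   :", round(statistics.mean(individual_errors), 2))
print("typical group-average miss:", round(statistics.mean(group_errors), 2))
```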
this exact property has been identified in the Blue Chip forecasts:66 one study terms the phenomenon “rational bias.”67 The less reputation you have, the less you have to lose by taking a big risk when you make a prediction. Even if you know that the forecast is dodgy, it might be rational for you to go after the big score. Conversely, if you have already established a good reputation, you might be reluctant to step too far out of line even when you think the data demands it.
If we want to reduce these biases—we will never be rid of them entirely—we have two fundamental alternatives. One might be thought of as a supply-side approach—creating a market for accurate economic forecasts. The other might be a demand-side alternative: reducing demand for inaccurate and overconfident ones.
Hanson, in order to address this deficiency, is an advocate of prediction markets—systems where you can place bets on a particular economic or policy outcome, like whether Israel will go to war with Iran, or how much global temperatures will rise because of climate change. His argument for these is pretty simple: they ensure that we have a financial stake in being accurate when we make forecasts, rather than just trying to look good to our peers.
Extrapolation is a very basic method of prediction—usually, much too basic. It simply involves the assumption that the current trend will continue indefinitely, into the future. Some of the best-known failures of prediction have resulted from applying this assumption too liberally.
Extrapolation tends to cause its greatest problems in fields—including population growth and disease—where the quantity that you want to study is growing exponentially.
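A quick numerical sketch of that failure mode, with invented numbers: fit a straight-line trend to something that is actually doubling each period and the extrapolation falls behind almost immediately.

```python
# Why extrapolation fails badly on exponential processes: a straight-line
# trend fit to early observations falls further and further behind.
observed = [1, 2, 4, 8, 16]                      # something doubling every period
per_period_change = (observed[-1] - observed[0]) / (len(observed) - 1)

for period in range(5, 11):
    linear_guess = observed[-1] + per_period_change * (period - 4)
    actual = 2 ** period
    print(f"period {period}: linear extrapolation {linear_guess:6.1f}  vs actual {actual}")
```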
A case where a prediction can bring itself about is called a self-fulfilling prediction or a self-fulfilling prophecy. This can happen with the release of a political poll in a race with multiple candidates, such as a presidential primary.
More subtle examples of this involve fields like design and entertainment, where businesses are essentially competing with one another to predict the consumer’s taste—but also have some ability to influence it through clever marketing plans.
Diseases and other medical conditions can also have this self-fulfilling property.
self-canceling prediction is just the opposite: a case where a prediction tends to undermine itself. One interesting case is the GPS navigation systems that are coming into more and more common use.
Needlessly complicated models may fit the noise in a problem rather than the signal, doing a poor job of replicating its underlying structure and causing predictions to be worse.
Still, while simplicity can be a virtue for a model, a model should at least be sophisticatedly simple.82 Models like SIR, although they are useful for understanding disease, are probably too blunt to help predict its course.
Major diseases come around only every so often. And even if the models are right, they might be victims of their own success because of the self-canceling property of a successful disease prediction.
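For reference, a minimal version of the SIR model mentioned above, in its standard textbook form with arbitrary parameter values (an illustration of the mechanics, not a forecast):

```python
# Minimal SIR ("susceptible-infected-recovered") compartmental model,
# stepped forward with simple Euler updates of the standard equations:
#   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
# The parameter values are arbitrary illustrations.
N = 1_000_000               # population
beta, gamma = 0.5, 0.25     # transmission and recovery rates (made up)
S, I, R = N - 1.0, 1.0, 0.0
dt = 1.0                    # one day per step

for day in range(200):
    new_infections = beta * S * I / N * dt
    recoveries = gamma * I * dt
    S, I, R = S - new_infections, I + new_infections - recoveries, R + recoveries
    if day % 40 == 0:
        print(f"day {day:3d}: infected ~{I:,.0f}")
```

The model's bluntness is visible in how few knobs it has: two rates and a population, with no geography, behavior change, or reporting noise.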
If you can’t make a good prediction, it is very often harmful to pretend that you can.
As the statistician George E. P. Box wrote, “All models are wrong, but some models are useful.”
Successful gamblers, instead, think of the future as speckles of probability, flickering upward and downward like a stock market ticker to every new jolt of information. When their estimates of these probabilities diverge by a sufficient margin from the odds on offer, they may place a bet.
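A sketch of that comparison with invented numbers: convert the offered odds into an implied probability, compare it with your own estimate, and bet only when the gap clears some threshold.

```python
# Compare your probability estimate with the probability implied by the odds,
# and bet only when the divergence is large enough. Numbers are invented.
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds

my_estimate = 0.55          # my probability that the outcome happens (assumed)
offered_odds = 2.10         # decimal odds on offer

edge = my_estimate - implied_probability(offered_odds)
expected_profit_per_dollar = my_estimate * (offered_odds - 1) - (1 - my_estimate)

if edge > 0.03:             # only bet when the gap clears a chosen threshold
    print(f"bet: edge {edge:.1%}, expected profit {expected_profit_per_dollar:+.2f} per $1")
else:
    print("pass: the odds are too close to my estimate")
```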
The argument made by Bayes and Price is not that the world is intrinsically probabilistic or uncertain. Bayes was a believer in divine perfection; he was also an advocate of Isaac Newton’s work, which had seemed to suggest that nature follows regular and predictable laws. It is, rather, a statement—expressed both mathematically and philosophically—about how we learn about the universe: that we learn about it through approximation, getting closer and closer to the truth as we gather more evidence. This contrasted25 with the more skeptical viewpoint of the Scottish philosopher David Hume, who ...
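The mechanics of that updating process are just Bayes' theorem. A worked example with invented numbers (not the book's own example): start with a prior, observe one piece of evidence, and compute the posterior.

```python
# Bayes' theorem as an updating rule: posterior = prior * likelihood / evidence.
# Invented numbers, chosen only to show the mechanics.
prior = 0.01                 # P(condition) before seeing the test
p_pos_given_cond = 0.90      # probability of a positive test if the condition holds
p_pos_given_no_cond = 0.05   # false-positive rate

p_positive = prior * p_pos_given_cond + (1 - prior) * p_pos_given_no_cond
posterior = prior * p_pos_given_cond / p_positive
print(f"after one positive test, P(condition) rises from {prior:.0%} to {posterior:.0%}")
```

Each new piece of evidence repeats the same step, with the old posterior serving as the new prior, which is the "closer and closer to the truth" process the passage describes.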
The idea behind frequentism is that uncertainty in a statistical problem results exclusively from collecting data among just a sample of the population rather than the whole population.
Essentially, the frequentist approach toward statistics seeks to wash its hands of the reason that predictions most often go wrong: human error. It views uncertainty as something intrinsic to the experiment rather than something intrinsic to our ability to understand the real world. The frequentist method also implies that, as you collect more data, your error will eventually approach zero: this will be both necessary and sufficient to solve any problems.
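The "error approaches zero" claim refers to sampling error, which does shrink roughly as one over the square root of the sample size. A quick simulation of that, with made-up data; note that it says nothing about errors that come from a mistaken model or a biased researcher, which do not shrink this way.

```python
# Sampling error shrinks as the sample grows (roughly 1/sqrt(n)), which is
# the kind of uncertainty the frequentist picture focuses on.
import random
import statistics

random.seed(2)
TRUE_MEAN = 10.0

for n in (10, 100, 1000, 10000):
    misses = []
    for _ in range(200):
        sample = [random.gauss(TRUE_MEAN, 5.0) for _ in range(n)]
        misses.append(abs(statistics.mean(sample) - TRUE_MEAN))
    print(f"n = {n:5d}: typical error of the sample mean ~ {statistics.mean(misses):.3f}")
```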
The bigger problem, however, is that the frequentist methods—in striving for immaculate statistical procedures that can’t be contaminated by the researcher’s bias—keep him hermetically sealed off from the real world. These methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability. Thus, you will see apparently serious papers published on how toads can predict earthquakes,50 or how big-box stores like Target beget racial hate groups,51 which apply frequentist tests to ...
Fisher’s notion of statistical significance, which uses arbitrary cutoffs devoid of context† to determine what is a “significant” finding and what isn’t,61 is much too clumsy for gambling.
does not imply that all prior beliefs are equally correct or equally valid. But I’m of the view that we can never achieve perfect objectivity, rationality, or accuracy in our beliefs. Instead, we can strive to be less subjective, less irrational, and less wrong. Making predictions based on our beliefs is the best (and perhaps even the only) way to test ourselves. If objectivity is the concern for a greater truth beyond our personal circumstances, and prediction is the best way to examine how closely aligned our personal perceptions are with that greater truth, the most objective among us are ...
But chess computers had long been rather poor at the opening phase of the game. Although the number of possibilities was the most limitless, the objectives were also the least clear. When there are so many branches on the tree, calculating 3 moves per second or 200 million is about equally fruitless unless you are harnessing that power in a directed way.
The name for the curve comes from the well-known business maxim called the Pareto principle or 80-20 rule (as in: 80 percent of your profits come from 20 percent of your customers16). As I apply it here, it posits that getting a few basic things right can go a long way. In poker, for instance, simply learning to fold your worst hands, bet your best ones, and make some effort to consider what your opponent holds will substantially mitigate your losses.
The Pareto Principle of Prediction implies that the worst forecasters—those who aren’t getting even the first 20 percent right—are much worse than the best forecasters are good. Put another way, average forecasters are closer to the top than to the bottom of the pool.
Indeed, the big brokerage firms tend to avoid standing out from the crowd, downgrading a stock only after its problems have become obvious.
when it’s not your own money on the line but someone else’s, your incentives may change. Under some circumstances, in fact, it may be quite rational for traders to take positions that lose money for their firms and their investors if it allows them to stay with the herd and reduces their chance of getting fired.70 There is significant theoretical and empirical evidence71 for herding behavior among mutual funds and other institutional investors.72 “The answer as to why bubbles form,” Blodget told me, “is that it’s in everybody’s interest to keep markets going up.”
So long as most traders are judged on the basis of short-term performance, bubbles involving large deviations of stock prices from their long-term values are possible—and perhaps even inevitable.
This is herding. And there’s evidence that it’s becoming more and more common in markets. The correlations in the price movements between different stocks and different types of assets are becoming sharper and sharper,76 suggesting that everybody is investing in a little bit of everything and trying to exploit many of the same strategies.
There is reason to suspect that of the various cognitive biases that investors suffer from, overconfidence is the most pernicious. Perhaps the central finding of behavioral economics is that most of us are overconfident when we make predictions.
Markets with overconfident traders will produce extremely high trading volumes, increased volatility, strange correlations in stock prices from day to day, and below-average returns for active traders—all the things that we observe in the real world.