Mastering 'Metrics: The Path from Cause to Effect
Read between January 7 - February 14, 2023
20%
The first and most important step in our effort to isolate the serendipitous component of school choice is to hold constant the most obvious and important differences between students who go to private and state schools. In this manner, we hope (though cannot promise) to make other things equal.
22%
Groups A and B are where the action is in our example, since these groups include public and private school students who applied to and were admitted to the same set of schools. To generate a single estimate that uses all available data, we average the group-specific estimates. The average of −$5,000 for group A and $30,000 for group B is $12,500. This is a good estimate of the effect of private school attendance on average earnings, because, to a large degree, it controls for applicants’ choices and abilities.
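The averaging step in that highlight takes two lines of Python; the group effects are the ones quoted in the passage:

```python
# Group-specific private-school effects quoted in the text (dollars).
group_effects = {"A": -5_000, "B": 30_000}

# Averaging the group estimates yields the single summary figure.
average_effect = sum(group_effects.values()) / len(group_effects)
print(average_effect)  # 12500.0
```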
22%
Evidence of selection bias emerges from a comparison of average earnings across (instead of within) groups A and B.
23%
Dummies, as they are called (no reference to ability here), classify data into simple yes-or-no categories.
23%
The regression parameters—called regression coefficients—
23%
The private school coefficient in this case is 10,000, implying a private-public earnings differential of $10,000. This is indeed a weighted average of our two group-specific effects (recall the group A effect is −$5,000 and the group B effect is $30,000).
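A regression with a private-school dummy and a group dummy reproduces that weighted average. The tiny dataset below is consistent with the two group effects quoted above (three group A applicants, two group B applicants); the individual earnings figures are illustrative assumptions:

```python
import numpy as np

# Earnings (dollars), a private-attendance dummy, and a group B dummy.
# Group A effect: -5,000 across 3 students; group B: +30,000 across 2.
earnings = np.array([110_000, 100_000, 110_000, 60_000, 30_000])
private = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
group_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# Regress earnings on the private dummy, controlling for group.
X = np.column_stack([np.ones(5), private, group_b])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
print(round(coef[1]))  # private-school coefficient: 10000
```

The regression weights each group by its size and by how much treatment varies within it, which is why 10,000 differs from the simple average of the two group effects.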
23%
Second, under some circumstances, regression estimates are efficient in the sense of providing the most statistically precise estimates of average causal effects that we can hope to obtain from a given sample.
24%
Like the standard errors for a difference in means discussed in the appendix to Chapter 1, these standard errors quantify the statistical precision of the regression estimates reported here. The standard error associated with the estimate in column (1) is .055. The fact that .135 is more than twice the size of the associated standard error of .055 makes it very unlikely the positive estimated private-school gap is merely a chance finding. The private school coefficient is statistically significant.
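The significance check described in that passage amounts to one division, using the values quoted above:

```python
# A coefficient more than about two standard errors from zero is
# unlikely to be a chance finding.
estimate, std_err = 0.135, 0.055
t_stat = estimate / std_err
print(round(t_stat, 2))  # 2.45, comfortably above 2
```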
26%
Private university attendance seems unrelated to future earnings once we control for selection bias. But perhaps our focus on public-private comparisons misses the point. Students may benefit from attending schools like Ivy, Leafy, or Smart simply because their classmates at such schools are so much better. The synergy generated by a strong peer group may be the feature that justifies the private school price tag.
30%
Because we can never be sure whether a given set of controls is enough to eliminate selection bias, it’s important to ask how sensitive regression results are to changes in the list of controls. Our confidence in regression estimates of causal effects grows when treatment effects are insensitive—masters say “robust”—to whether a particular variable is added or dropped as long as a few core controls are always included in the model.
32%
Galton explained this averaging phenomenon in his celebrated 1886 paper “Regression towards Mediocrity in Hereditary Stature.”13 Today, we call this property “regression to the mean.” Regression to the mean is not a causal relationship. Rather, it’s a statistical property of correlated pairs of variables like the heights of fathers and sons. Although fathers’ and sons’ heights are never exactly the same, their frequency distributions are essentially unchanging. This distributional stability generates the Galton regression.
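A small simulation makes the Galton point: with stable height distributions and imperfect correlation, the regression slope of sons on fathers falls below one purely as a statistical matter. All parameters below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Fathers' heights (inches); sons share the same mean and spread but
# are only partially correlated with their fathers (rho = 0.5 here).
father = rng.normal(70, 2.5, n)
son = 70 + 0.5 * (father - 70) + rng.normal(0, 2.5 * np.sqrt(0.75), n)

# Slope of the regression of son's height on father's height.
slope = np.cov(father, son)[0, 1] / np.var(father, ddof=1)
print(round(slope, 2))  # close to 0.5: sons of tall fathers are tall, but less so
```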
39%
The instrumental variables (IV) method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers.
39%
two-stage least squares (2SLS),
40%
More than a colorful institutional detail, these lotteries allow us to untangle the charter school causality conundrum. Our IV tool uses these admissions lotteries to frame a naturally occurring randomized trial.
40%
IV turns randomized offer effects into causal estimates of the effect of charter attendance. Specifically, IV estimates capture causal effects on the sort of child who enrolls in KIPP when offered a seat in a lottery but wouldn’t manage to get in otherwise. As we explain below, this group is known as the set of KIPP lottery compliers.
42%
We’ve noted that the original randomizer (in this case, a KIPP offer) is called an instrumental variable or just an instrument for short. As we’ve seen, the link from the instrument to the causal variable of interest (in this case, the effect of lottery offers on KIPP attendance) is called the first stage, because this is the first link in the chain. The direct effect of the instrument on outcomes, which runs the full length of the chain (in this case, the effect of offers on scores), is called the reduced form. Finally, the causal effect of interest—the second link in the chain—is determined by dividing the reduced form by the first stage.
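With a single instrument and no covariates, the chain reduces to one division: the reduced form over the first stage. The magnitudes below are illustrative stand-ins in the spirit of the KIPP example, not the study's exact estimates:

```python
# Hypothetical effects of a lottery offer (illustrative numbers).
first_stage = 0.74    # offer's effect on KIPP attendance
reduced_form = 0.355  # offer's effect on test scores

# IV estimate: effect of attendance for the lottery compliers.
late = reduced_form / first_stage
print(round(late, 2))  # 0.48
```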
43%
IV strategies depend on applicants like Camila, who are called compliers, a group we indicate with the dummy variable C_i
43%
The term “compliers” comes from the world of randomized trials. In many randomized trials, such as those used to evaluate new drugs, the decision to comply with a randomized treatment assignment remains voluntary and nonrandom (experimental subjects who are randomly offered treatment may decline it, for example).
44%
The question of whether a particular causal estimate has predictive value for times, places, and people beyond those represented in the study that produced it is called external validity. When assessing external validity, masters must ask themselves why a particular LATE estimate is big or small.
44%
As with estimates from randomized trials, the best evidence for the external validity of IV estimates comes from comparisons of LATEs for the same or similar treatments across different populations.
45%
Unfortunately, domestic abuse is often a repeat offense, as can be seen in the fact that the police were called for a second domestic violence intervention at 18% of the addresses in the MDVE sample.
45%
Most importantly from the point of view of MDVE researchers, recidivism was greater among suspects assigned to be coddled than among those assigned to be arrested.
51%
The foundation has three layers: (i) the first-stage requires instruments that affect the causal channel of interest; (ii) the independence assumption requires instruments to be as good as randomly assigned; (iii) the exclusion restriction asserts that a single causal channel connects instruments with outcomes.
51%
Check the first stage by looking for a strong relationship between instruments and the proposed causal channel; check independence by checking covariate balance with the instrument switched off and on, as in a randomized trial.
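The independence check can be sketched as a covariate balance test: with a genuinely random instrument, pre-treatment covariates should look the same with the instrument off and on. The data below are simulated under assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
offer = rng.integers(0, 2, n)  # instrument: randomized offer dummy
age = rng.normal(12, 1, n)     # pre-treatment covariate, unrelated to offer

# Compare the covariate's mean with the instrument switched on vs. off.
balance_gap = age[offer == 1].mean() - age[offer == 0].mean()
print(abs(balance_gap) < 0.1)  # True: balanced, as randomization implies
```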
51%
The exclusion restriction is not easily verified. Sometimes, however, we may find a sample where the first stage is very small. Exclusion implies such samples should generate small reduced-form estimates, since the hypothesized causal channel is absent.
51%
Statistical software computes two-stage least squares estimates for us. This allows us to add covariates and use more than one instrument at a time. But we look at the first-stage and reduced-form estimates as well.
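A bare-bones version of that workflow with simulated data and one instrument (all parameters are assumptions): estimate the first stage and reduced form directly, then form the IV estimate as their ratio, which is what 2SLS computes in this just-identified case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 2, n).astype(float)  # randomized instrument
ability = rng.normal(0, 1, n)            # unobserved confounder
# Treatment take-up depends on the instrument AND on ability:
d = (0.5 * z + 0.3 * ability + rng.normal(0, 1, n) > 0.3).astype(float)
y = 1.0 * d + ability + rng.normal(0, 1, n)  # true effect of d is 1.0

slope = lambda a, b: np.cov(a, b)[0, 1] / np.var(a, ddof=1)
first_stage = slope(z, d)    # instrument -> treatment
reduced_form = slope(z, y)   # instrument -> outcome
iv_estimate = reduced_form / first_stage
print(round(iv_estimate, 2))  # close to the true effect of 1.0
```

Naive OLS of y on d would be biased upward here, because ability drives both take-up and outcomes; the instrument sidesteps that.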
51%
Economists knew for sure only that the observed relationship between price and quantity fails to capture either supply or demand, and is somehow determined by both.
55%
Although many of these rules seem arbitrary, with little grounding in science or experience, we say: bring ’em on! For rules that constrain the role of chance in human affairs often generate interesting experiments. Masters of ’metrics exploit these experiments with a tool called the regression discontinuity (RD) design. RD doesn’t work for all causal questions, but it works for many. And when it does, the results have almost the same causal force as those from a randomized trial.
56%
RD is based on the seemingly paradoxical idea that rigid rules—which at first appear to reduce or even eliminate the scope for randomness—create valuable experiments.
56%
The variable that determines treatment, age in this case, is called the running variable.
57%
Unlike the matching and regression strategies discussed in Chapter 2, which are based on treatment-control comparisons conditional on covariate values, the validity of RD turns on our willingness to extrapolate across values of the running variable, at least for values in the neighborhood of the cutoff at which treatment switches on.
57%
The jump in trend lines at the MLDA cutoff implicitly compares death rates for people on either side of—but close to—a twenty-first birthday. In other words, the notional experiment here involves changes in access to alcohol for young people, in a world where alcohol is freely available to adults.
57%
RD tools aren’t guaranteed to produce reliable causal estimates.
57%
Two strategies reduce the likelihood of RD mistakes, though neither provides perfect insurance. The first models nonlinearities directly, while the second focuses solely on observations near the cutoff. We start with the nonlinear modeling strategy, briefly taking up the second approach at the end of this section.
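A minimal sharp-RD sketch in the MLDA spirit (simulated data; the jump of 10 and the other parameters are assumptions): control for the running variable and read the treatment effect off the dummy for clearing the cutoff.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
age = rng.uniform(19, 23, n)          # running variable
treated = (age >= 21).astype(float)   # treatment switches at the cutoff
# Outcome trends smoothly in age but jumps by 10 at the cutoff:
y = 90 + 1.5 * (age - 21) + 10 * treated + rng.normal(0, 5, n)

# Regress on the cutoff dummy and the centered running variable.
X = np.column_stack([np.ones(n), treated, age - 21])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(coef[1], 1))  # estimated jump, near the built-in 10
```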
60%
The goal here is not so much to find the one perfect bandwidth as to show that the findings generated by any particular choice of bandwidth are not a fluke.
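That robustness check can be sketched as a loop over bandwidths: keep only observations within h of the cutoff, re-estimate, and confirm the answers agree. Simulated data, assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(-2, 2, n)                       # running variable, cutoff at 0
y = 5 * (x >= 0) + 2 * x + rng.normal(0, 3, n)  # true jump of 5

jumps = []
for h in (0.25, 0.5, 1.0):                      # candidate bandwidths
    keep = np.abs(x) < h
    Xh = np.column_stack([np.ones(keep.sum()),
                          (x >= 0)[keep].astype(float), x[keep]])
    jumps.append(np.linalg.lstsq(Xh, y[keep], rcond=None)[0][1])
print([round(j, 1) for j in jumps])  # similar estimates across bandwidths
```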
60%
Just as many American high school seniors compete to enroll in the country’s most selective colleges and universities, younger students and their parents in a few cities aspire to coveted seats at top exam schools. Fewer than half of Boston’s exam school applicants win a seat at the John D. O’Bryant School, Boston Latin Academy, or the Boston Latin School (BLS); only one-sixth of New York applicants are offered a seat at one of the three original exam schools in the Big Apple (Stuyvesant, Bronx Science, and Brooklyn Tech).
60%
When schools admit only high achievers, then the students who go there are necessarily high achievers, regardless of whether the school itself adds value. This sounds like a case of selection bias, and it is.
61%
Such dramatic variation in treatment intensity lies at the heart of any fuzzy RD research design. The difference between fuzzy and sharp designs is that, with fuzzy, applicants who cross a threshold are exposed to a more intense treatment, while in a sharp design treatment switches cleanly on or off at the cutoff.
62%
The RD design exploits abrupt changes in treatment status that arise when treatment is determined by a cutoff.
62%
RD requires us to know the relationship between the running variable and potential outcomes in the absence of treatment. We must control for this relationship when using discontinuities to identify causal effects. Randomized trials require no such control.
62%
One can’t be sure, Master. But our confidence in causal conclusions increases when RD estimates remain similar as we change details of the RD model.
62%
Sharp is when treatment itself switches on or off at a cutoff. Fuzzy is when the probability or intensity of treatment jumps. In fuzzy designs, a dummy for clearing the cutoff becomes an instrument; the fuzzy design is analyzed by 2SLS.
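A fuzzy-RD sketch along those lines (simulated data; the jump in take-up from 0.2 to 0.7 and the effect of 3 are assumptions): the cutoff dummy instruments for treatment, with the running variable as a covariate, estimated by just-identified 2SLS.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
x = rng.uniform(-1, 1, n)        # running variable, cutoff at 0
above = (x >= 0).astype(float)   # instrument: cleared the cutoff
# Treatment probability jumps from 0.2 to 0.7 at the cutoff (fuzzy):
d = (rng.uniform(size=n) < 0.2 + 0.5 * above).astype(float)
y = 3.0 * d + x + rng.normal(0, 1, n)  # true treatment effect is 3.0

# Just-identified 2SLS: instruments Z, regressors X (both include x).
Z = np.column_stack([np.ones(n), above, x])
Xr = np.column_stack([np.ones(n), d, x])
beta = np.linalg.solve(Z.T @ Xr, Z.T @ y)
print(round(beta[1], 2))  # close to the true effect of 3.0
```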
64%
The differences-in-differences (DD) method recognizes that in the absence of random assignment, treatment and control groups are likely to differ for many reasons.
64%
Although financial markets today are more sophisticated, the pillars of finance remain much as they were: banks borrow and lend, typically at different maturities, and bet on being able to raise the cash (known in banking jargon as “liquidity”) needed to cover liabilities as they come due.
64%
But who’s to say when a crisis is merely a crisis of confidence? Some crises are real. Bank balance sheets may be so sickened by bad debts that no amount of temporary liquidity support will cure ’em.
65%
This version of the DD calculation subtracts the pre-treatment difference between the Sixth and Eighth Districts from the post-treatment difference, thereby adjusting for the fact that the two districts weren’t the same initially. DD estimates suggest that lending to troubled banks kept many of them open. Specifically, the Atlanta Fed appears to have saved 19 banks—more than 10% of those operating in Mississippi’s Sixth District in 1930.
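The calculation itself uses just four numbers. The bank counts below reproduce the 19-bank figure quoted above; treat the specific values as approximations:

```python
# Number of banks operating before (1930) and after (1931) the
# Atlanta Fed's liquidity support in the Sixth District.
sixth_pre, sixth_post = 135, 121    # treated: Sixth District
eighth_pre, eighth_post = 165, 132  # comparison: Eighth District

# DD: post-treatment gap minus pre-treatment gap between districts.
dd = (sixth_post - sixth_pre) - (eighth_post - eighth_pre)
print(dd)  # 19 banks saved relative to the common-trends counterfactual
```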
65%
FIGURE 5.1 Bank failures in the Sixth and Eighth Federal Reserve Districts
65%
The DD counterfactual comes from a strong but easily stated assumption: common trends.
65%
The dashed line depicts the counterfactual evolution of the number of banks in the Sixth District if the same number of banks had failed in that district after 1930 as did in the Eighth.
66%
The simplest DD calculation involves only four numbers, as in equations (5.1) and (5.2). In practice, however, the DD recipe is best cooked with regression models fit to samples of more than four data points, such as the 12 points plotted in Figure 5.2.
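In regression form, DD puts district dummies, year dummies, and a treated-district-by-post-period interaction on the right-hand side; the interaction coefficient is the DD estimate. The sketch below uses 12 fabricated district-year points with a built-in effect of 19 (all numbers are illustrative, not the actual bank counts):

```python
import numpy as np

years = np.arange(1929, 1935)                   # 6 years x 2 districts
rows = [(d, t) for d in (0, 1) for t in years]  # d = 1: treated district
district = np.array([d for d, t in rows], dtype=float)
year = np.array([t for d, t in rows])
treat = district * (year >= 1931)               # DD interaction term

# Outcome: common downward trend, a fixed district gap, and a true
# DD effect of 19 in the treated district after 1930.
outcome = 100 - 5 * (year - 1929) - 20 * district + 19 * treat

year_dummies = np.column_stack([(year == yv).astype(float)
                                for yv in years[1:]])
X = np.column_stack([np.ones(len(rows)), district, year_dummies, treat])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(round(coef[-1]))  # recovers the built-in DD effect: 19
```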