Mastering 'Metrics: The Path from Cause to Effect
Read between January 7 - February 14, 2023
20%
The first and most important step in our effort to isolate the serendipitous component of school choice is to hold constant the most obvious and important differences between students who go to private and state schools. In this manner, we hope (though cannot promise) to make other things equal.
22%
Groups A and B are where the action is in our example, since these groups include public and private school students who applied to and were admitted to the same set of schools. To generate a single estimate that uses all available data, we average the group-specific estimates. The average of −$5,000 for group A and $30,000 for group B is $12,500. This is a good estimate of the effect of private school attendance on average earnings, because, to a large degree, it controls for applicants’ choices and abilities.
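The averaging step in that highlight takes two lines of Python; the group effects are the ones quoted in the passage:

```python
# Group-specific private-school effects quoted in the text (dollars).
group_effects = {"A": -5_000, "B": 30_000}

# Averaging the group estimates yields the single summary figure.
average_effect = sum(group_effects.values()) / len(group_effects)
print(average_effect)  # 12500.0
```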
22%
Evidence of selection bias emerges from a comparison of average earnings across (instead of within) groups A and B.
23%
Dummies, as they are called (no reference to ability here), classify data into simple yes-or-no categories.
23%
The regression parameters—called regression coefficients—
23%
The private school coefficient in this case is 10,000, implying a private-public earnings differential of $10,000. This is indeed a weighted average of our two group-specific effects (recall the group A effect is −$5,000 and the group B effect is $30,000).
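A regression with a private-school dummy and a group dummy reproduces that weighted average. The tiny dataset below is consistent with the two group effects quoted above (three group A applicants, two group B applicants); the individual earnings figures are illustrative assumptions:

```python
import numpy as np

# Earnings (dollars), a private-attendance dummy, and a group B dummy.
# Group A effect: -5,000 across 3 students; group B: +30,000 across 2.
earnings = np.array([110_000, 100_000, 110_000, 60_000, 30_000])
private = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
group_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# Regress earnings on the private dummy, controlling for group.
X = np.column_stack([np.ones(5), private, group_b])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
print(round(coef[1]))  # private-school coefficient: 10000
```

The regression weights each group by its size and by how much treatment varies within it, which is why 10,000 differs from the simple average of the two group effects.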
23%
Second, under some circumstances, regression estimates are efficient in the sense of providing the most statistically precise estimates of average causal effects that we can hope to obtain from a given sample.
24%
Like the standard errors for a difference in means discussed in the appendix to Chapter 1, these standard errors quantify the statistical precision of the regression estimates reported here. The standard error associated with the estimate in column (1) is .055. The fact that .135 is more than twice the size of the associated standard error of .055 makes it very unlikely the positive estimated private-school gap is merely a chance finding. The private school coefficient is statistically significant.
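The significance check described in that passage amounts to one division, using the values quoted above:

```python
# A coefficient more than about two standard errors from zero is
# unlikely to be a chance finding.
estimate, std_err = 0.135, 0.055
t_stat = estimate / std_err
print(round(t_stat, 2))  # 2.45, comfortably above 2
```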
26%
Private university attendance seems unrelated to future earnings once we control for selection bias. But perhaps our focus on public-private comparisons misses the point. Students may benefit from attending schools like Ivy, Leafy, or Smart simply because their classmates at such schools are so much better. The synergy generated by a strong peer group may be the feature that justifies the private school price tag.
30%
Because we can never be sure whether a given set of controls is enough to eliminate selection bias, it’s important to ask how sensitive regression results are to changes in the list of controls. Our confidence in regression estimates of causal effects grows when treatment effects are insensitive—masters say “robust”—to whether a particular variable is added or dropped as long as a few core controls are always included in the model.
32%
Galton explained this averaging phenomenon in his celebrated 1886 paper “Regression towards Mediocrity in Hereditary Stature.”13 Today, we call this property “regression to the mean.” Regression to the mean is not a causal relationship. Rather, it’s a statistical property of correlated pairs of variables like the heights of fathers and sons. Although fathers’ and sons’ heights are never exactly the same, their frequency distributions are essentially unchanging. This distributional stability generates the Galton regression.
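A small simulation makes the Galton point: with stable height distributions and imperfect correlation, the regression slope of sons on fathers falls below one purely as a statistical matter. All parameters below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Fathers' heights (inches); sons share the same mean and spread but
# are only partially correlated with their fathers (rho = 0.5 here).
father = rng.normal(70, 2.5, n)
son = 70 + 0.5 * (father - 70) + rng.normal(0, 2.5 * np.sqrt(0.75), n)

# Slope of the regression of son's height on father's height.
slope = np.cov(father, son)[0, 1] / np.var(father, ddof=1)
print(round(slope, 2))  # close to 0.5: sons of tall fathers are tall, but less so
```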
39%
The instrumental variables (IV) method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers.
39%
two-stage least squares (2SLS),
40%
More than a colorful institutional detail, these lotteries allow us to untangle the charter school causality conundrum. Our IV tool uses these admissions lotteries to frame a naturally occurring randomized trial.
40%
IV turns randomized offer effects into causal estimates of the effect of charter attendance. Specifically, IV estimates capture causal effects on the sort of child who enrolls in KIPP when offered a seat in a lottery but wouldn’t manage to get in otherwise. As we explain below, this group is known as the set of KIPP lottery compliers.
42%
We’ve noted that the original randomizer (in this case, a KIPP offer) is called an instrumental variable or just an instrument for short. As we’ve seen, the link from the instrument to the causal variable of interest (in this case, the effect of lottery offers on KIPP attendance) is called the first stage, because this is the first link in the chain. The direct effect of the instrument on outcomes, which runs the full length of the chain (in this case, the effect of offers on scores), is called the reduced form. Finally, the causal effect of interest—the second link in the chain—is determined by dividing the reduced form by the first stage.
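With a single instrument and no covariates, the chain reduces to one division: the reduced form over the first stage. The magnitudes below are illustrative stand-ins in the spirit of the KIPP example, not the study's exact estimates:

```python
# Hypothetical effects of a lottery offer (illustrative numbers).
first_stage = 0.74    # offer's effect on KIPP attendance
reduced_form = 0.355  # offer's effect on test scores

# IV estimate: effect of attendance for the lottery compliers.
late = reduced_form / first_stage
print(round(late, 2))  # 0.48
```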
43%
IV strategies depend on applicants like Camila, who are called compliers, a group we indicate with the dummy variable C_i
43%
The term “compliers” comes from the world of randomized trials. In many randomized trials, such as those used to evaluate new drugs, the decision to comply with a randomized treatment assignment remains voluntary and nonrandom (experimental subjects who are randomly offered treatment may decline it, for example).
44%
The question of whether a particular causal estimate has predictive value for times, places, and people beyond those represented in the study that produced it is called external validity. When assessing external validity, masters must ask themselves why a particular LATE estimate is big or small.
44%
As with estimates from randomized trials, the best evidence for the external validity of IV estimates comes from comparisons of LATEs for the same or similar treatments across different populations.
45%
Unfortunately, domestic abuse is often a repeat offense, as can be seen in the fact that the police were called for a second domestic violence intervention at 18% of the addresses in the MDVE sample.
45%
Most importantly from the point of view of MDVE researchers, recidivism was greater among suspects assigned to be coddled than among those assigned to be arrested.
51%
The foundation has three layers: (i) the first-stage requires instruments that affect the causal channel of interest; (ii) the independence assumption requires instruments to be as good as randomly assigned; (iii) the exclusion restriction asserts that a single causal channel connects instruments with outcomes.
51%
Check the first stage by looking for a strong relationship between instruments and the proposed causal channel; check independence by checking covariate balance with the instrument switched off and on, as in a randomized trial.
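The independence check can be sketched as a covariate balance test: with a genuinely random instrument, pre-treatment covariates should look the same with the instrument off and on. The data below are simulated under assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
offer = rng.integers(0, 2, n)  # instrument: randomized offer dummy
age = rng.normal(12, 1, n)     # pre-treatment covariate, unrelated to offer

# Compare the covariate's mean with the instrument switched on vs. off.
balance_gap = age[offer == 1].mean() - age[offer == 0].mean()
print(abs(balance_gap) < 0.1)  # True: balanced, as randomization implies
```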
51%
The exclusion restriction is not easily verified. Sometimes, however, we may find a sample where the first stage is very small. Exclusion implies such samples should generate small reduced-form estimates, since the hypothesized causal channel is absent.
51%
Statistical software computes two-stage least squares estimates for us. This allows us to add covariates and use more than one instrument at a time. But we look at the first-stage and reduced-form estimates as well.
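A bare-bones version of that workflow with simulated data and one instrument (all parameters are assumptions): estimate the first stage and reduced form directly, then form the IV estimate as their ratio, which is what 2SLS computes in this just-identified case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 2, n).astype(float)  # randomized instrument
ability = rng.normal(0, 1, n)            # unobserved confounder
# Treatment take-up depends on the instrument AND on ability:
d = (0.5 * z + 0.3 * ability + rng.normal(0, 1, n) > 0.3).astype(float)
y = 1.0 * d + ability + rng.normal(0, 1, n)  # true effect of d is 1.0

slope = lambda a, b: np.cov(a, b)[0, 1] / np.var(a, ddof=1)
first_stage = slope(z, d)    # instrument -> treatment
reduced_form = slope(z, y)   # instrument -> outcome
iv_estimate = reduced_form / first_stage
print(round(iv_estimate, 2))  # close to the true effect of 1.0
```

Naive OLS of y on d would be biased upward here, because ability drives both take-up and outcomes; the instrument sidesteps that.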
51%
Economists knew for sure only that the observed relationship between price and quantity fails to capture either supply or demand, and is somehow determined by both.
55%
Although many of these rules seem arbitrary, with little grounding in science or experience, we say: bring ’em on! For rules that constrain the role of chance in human affairs often generate interesting experiments. Masters of ’metrics exploit these experiments with a tool called the regression discontinuity (RD) design. RD doesn’t work for all causal questions, but it works for many. And when it does, the results have almost the same causal force as those from a randomized trial.
56%
RD is based on the seemingly paradoxical idea that rigid rules—which at first appear to reduce or even eliminate the scope for randomness—create valuable experiments.
56%
The variable that determines treatment, age in this case, is called the running variable.
57%
Unlike the matching and regression strategies discussed in Chapter 2, which are based on treatment-control comparisons conditional on covariate values, the validity of RD turns on our willingness to extrapolate across values of the running variable, at least for values in the neighborhood of the cutoff at which treatment switches on.
57%
The jump in trend lines at the MLDA cutoff implicitly compares death rates for people on either side of—but close to—a twenty-first birthday. In other words, the notional experiment here involves changes in access to alcohol for young people, in a world where alcohol is freely available to adults.
57%
RD tools aren’t guaranteed to produce reliable causal estimates.
57%
Two strategies reduce the likelihood of RD mistakes, though neither provides perfect insurance. The first models nonlinearities directly, while the second focuses solely on observations near the cutoff. We start with the nonlinear modeling strategy, briefly taking up the second approach at the end of this section.
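A minimal sharp-RD sketch in the MLDA spirit (simulated data; the jump of 10 and the other parameters are assumptions): control for the running variable and read the treatment effect off the dummy for clearing the cutoff.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
age = rng.uniform(19, 23, n)          # running variable
treated = (age >= 21).astype(float)   # treatment switches at the cutoff
# Outcome trends smoothly in age but jumps by 10 at the cutoff:
y = 90 + 1.5 * (age - 21) + 10 * treated + rng.normal(0, 5, n)

# Regress on the cutoff dummy and the centered running variable.
X = np.column_stack([np.ones(n), treated, age - 21])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(coef[1], 1))  # estimated jump, near the built-in 10
```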
60%
The goal here is not so much to find the one perfect bandwidth as to show that the findings generated by any particular choice of bandwidth are not a fluke.
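That robustness check can be sketched as a loop over bandwidths: keep only observations within h of the cutoff, re-estimate, and confirm the answers agree. Simulated data, assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(-2, 2, n)                       # running variable, cutoff at 0
y = 5 * (x >= 0) + 2 * x + rng.normal(0, 3, n)  # true jump of 5

jumps = []
for h in (0.25, 0.5, 1.0):                      # candidate bandwidths
    keep = np.abs(x) < h
    Xh = np.column_stack([np.ones(keep.sum()),
                          (x >= 0)[keep].astype(float), x[keep]])
    jumps.append(np.linalg.lstsq(Xh, y[keep], rcond=None)[0][1])
print([round(j, 1) for j in jumps])  # similar estimates across bandwidths
```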
60%
Just as many American high school seniors compete to enroll in the country’s most selective colleges and universities, younger students and their parents in a few cities aspire to coveted seats at top exam schools. Fewer than half of Boston’s exam school applicants win a seat at the John D. O’Bryant School, Boston Latin Academy, or the Boston Latin School (BLS); only one-sixth of New York applicants are offered a seat at one of the three original exam schools in the Big Apple (Stuyvesant, Bronx Science, and Brooklyn Tech).
60%
When schools admit only high achievers, then the students who go there are necessarily high achievers, regardless of whether the school itself adds value. This sounds like a case of selection bias, and it is.
61%
Such dramatic variation in treatment intensity lies at the heart of any fuzzy RD research design. The difference between fuzzy and sharp designs is that, with fuzzy, applicants who cross a threshold are exposed to a more intense treatment, while in a sharp design treatment switches cleanly on or off at the cutoff.
62%
The RD design exploits abrupt changes in treatment status that arise when treatment is determined by a cutoff.
62%
RD requires us to know the relationship between the running variable and potential outcomes in the absence of treatment. We must control for this relationship when using discontinuities to identify causal effects. Randomized trials require no such control.
62%
One can’t be sure, Master. But our confidence in causal conclusions increases when RD estimates remain similar as we change details of the RD model.
62%
Sharp is when treatment itself switches on or off at a cutoff. Fuzzy is when the probability or intensity of treatment jumps. In fuzzy designs, a dummy for clearing the cutoff becomes an instrument; the fuzzy design is analyzed by 2SLS.
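A fuzzy-RD sketch along those lines (simulated data; the jump in take-up from 0.2 to 0.7 and the effect of 3 are assumptions): the cutoff dummy instruments for treatment, with the running variable as a covariate, estimated by just-identified 2SLS.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
x = rng.uniform(-1, 1, n)        # running variable, cutoff at 0
above = (x >= 0).astype(float)   # instrument: cleared the cutoff
# Treatment probability jumps from 0.2 to 0.7 at the cutoff (fuzzy):
d = (rng.uniform(size=n) < 0.2 + 0.5 * above).astype(float)
y = 3.0 * d + x + rng.normal(0, 1, n)  # true treatment effect is 3.0

# Just-identified 2SLS: instruments Z, regressors X (both include x).
Z = np.column_stack([np.ones(n), above, x])
Xr = np.column_stack([np.ones(n), d, x])
beta = np.linalg.solve(Z.T @ Xr, Z.T @ y)
print(round(beta[1], 2))  # close to the true effect of 3.0
```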
64%
The differences-in-differences (DD) method recognizes that in the absence of random assignment, treatment and control groups are likely to differ for many reasons.
64%
Although financial markets today are more sophisticated, the pillars of finance remain much as they were: banks borrow and lend, typically at different maturities, and bet on being able to raise the cash (known in banking jargon as “liquidity”) needed to cover liabilities as they come due.
64%
But who’s to say when a crisis is merely a crisis of confidence? Some crises are real. Bank balance sheets may be so sickened by bad debts that no amount of temporary liquidity support will cure ’em.
65%
This version of the DD calculation subtracts the pre-treatment difference between the Sixth and Eighth Districts from the post-treatment difference, thereby adjusting for the fact that the two districts weren’t the same initially. DD estimates suggest that lending to troubled banks kept many of them open. Specifically, the Atlanta Fed appears to have saved 19 banks—more than 10% of those operating in Mississippi’s Sixth District in 1930.
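The calculation itself uses just four numbers. The bank counts below reproduce the 19-bank figure quoted above; treat the specific values as approximations:

```python
# Number of banks operating before (1930) and after (1931) the
# Atlanta Fed's liquidity support in the Sixth District.
sixth_pre, sixth_post = 135, 121    # treated: Sixth District
eighth_pre, eighth_post = 165, 132  # comparison: Eighth District

# DD: post-treatment gap minus pre-treatment gap between districts.
dd = (sixth_post - sixth_pre) - (eighth_post - eighth_pre)
print(dd)  # 19 banks saved relative to the common-trends counterfactual
```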
65%
FIGURE 5.1 Bank failures in the Sixth and Eighth Federal Reserve Districts
65%
The DD counterfactual comes from a strong but easily stated assumption: common trends.
65%
The dashed line depicts the counterfactual evolution of the number of banks in the Sixth District if the same number of banks had failed in that district after 1930 as did in the Eighth.
66%
The simplest DD calculation involves only four numbers, as in equations (5.1) and (5.2). In practice, however, the DD recipe is best cooked with regression models fit to samples of more than four data points, such as the 12 points plotted in Figure 5.2.
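In regression form, DD puts district dummies, year dummies, and a treated-district-by-post-period interaction on the right-hand side; the interaction coefficient is the DD estimate. The sketch below uses 12 fabricated district-year points with a built-in effect of 19 (all numbers are illustrative, not the actual bank counts):

```python
import numpy as np

years = np.arange(1929, 1935)                   # 6 years x 2 districts
rows = [(d, t) for d in (0, 1) for t in years]  # d = 1: treated district
district = np.array([d for d, t in rows], dtype=float)
year = np.array([t for d, t in rows])
treat = district * (year >= 1931)               # DD interaction term

# Outcome: common downward trend, a fixed district gap, and a true
# DD effect of 19 in the treated district after 1930.
outcome = 100 - 5 * (year - 1929) - 20 * district + 19 * treat

year_dummies = np.column_stack([(year == yv).astype(float)
                                for yv in years[1:]])
X = np.column_stack([np.ones(len(rows)), district, year_dummies, treat])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(round(coef[-1]))  # recovers the built-in DD effect: 19
```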