Everyone in development economics should read this paper
It is by Eva Vivalt and is called “How Much Can We Generalize from Impact Evaluations?” (pdf). The abstract is here:
Impact evaluations aim to predict the future, but they are rooted in particular contexts and results may not generalize across settings. I founded an organization to systematically collect and synthesize impact evaluation results on a wide variety of interventions in development. These data allow me to answer this and other questions across a wide variety of interventions. I examine whether results predict each other and whether variance in results can be explained by program characteristics, such as who is implementing them, where they are being implemented, the scale of the program, and what methods are used. I find that when regressing an estimate on the hierarchical Bayesian meta-analysis result formed from all other studies on the same intervention-outcome combination, the result is significant with a coefficient of 0.6-0.7, though the R-squared is very low. The program implementer is the main source of heterogeneity in results, with government-implemented programs faring worse than and being poorly predicted by the smaller studies typically implemented by academic/NGO research teams, even controlling for sample size. I then turn to examine specification searching and publication bias, issues which could affect generalizability and are also important for research credibility. I demonstrate that these biases are quite small; nevertheless, to address them, I discuss a mathematical correction that could be applied before showing that randomized controlled trials (RCTs) are less prone to this type of bias and exploiting them as a robustness check.
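As a rough illustration of the leave-one-out design described in the abstract, here is a minimal sketch: for each study, pool all other studies on the same intervention-outcome combination into a meta-analytic estimate, then regress the held-out estimate on that prediction. The paper uses a hierarchical Bayesian model; the sketch below substitutes a simpler DerSimonian-Laird random-effects estimate and uses made-up placeholder data, so it only conveys the shape of the exercise, not the paper's actual method or results.

```python
# Sketch of the leave-one-out prediction exercise (hypothetical data).
import numpy as np

def dersimonian_laird(y, se):
    """Random-effects pooled estimate for effects y with standard errors se."""
    w = 1.0 / se**2                          # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)        # fixed-effect pooled mean
    q = np.sum(w * (y - mu_fe) ** 2)         # Cochran's Q statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study variance
    w_re = 1.0 / (se**2 + tau2)              # random-effects weights
    return np.sum(w_re * y) / np.sum(w_re)

rng = np.random.default_rng(0)
# Hypothetical: 8 studies of one intervention-outcome combination.
y = rng.normal(0.2, 0.1, size=8)             # standardized effect estimates
se = rng.uniform(0.05, 0.15, size=8)         # their standard errors

# Pooled estimate from all *other* studies, for each study in turn.
loo = np.array([dersimonian_laird(np.delete(y, i), np.delete(se, i))
                for i in range(len(y))])

# OLS of each estimate on its leave-one-out prediction.
X = np.column_stack([np.ones_like(loo), loo])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(f"slope = {beta[1]:.2f}, R^2 = {r2:.2f}")
```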
Eva is on the job market from Berkeley this year; her home page is here. Here is her paper “Peacekeepers Help, Governments Hinder” (pdf). Here is her extended bio.