*x* causes another variable *y*, and test this hypothesis by running a regression of *y* on *x* plus a huge number of fixed effects to control for "unobserved heterogeneity" or deal with "omitted variable bias." I've done a fair amount of work like this myself. The standard model is:

y_i,t = x_i,t + a_i + b_t + u_i,t

where a_i are fixed effects that span the cross section, b_t are fixed effects that span the time series, and u_i,t is the model error, which we hope is not associated with the causal variable x_i,t, once a_i and b_t are accounted for.

If you're really clever, you can find geographic or other kinds of groupings of individuals, like counties, and include group-by-year fixed effects:

y_i,t = x_i,t + a_i + b_g,t + u_i,t
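To make the group-by-year specification concrete, here is a minimal sketch that simulates a small panel and estimates the model by regressing y on x plus unit dummies and group-by-year dummies. Everything in it is an illustrative assumption rather than anything from the post: the coefficient `beta` on x (the equations above leave it implicit), the panel dimensions, the way units are assigned to groups, and the noise levels.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_years, n_groups = 40, 10, 4
beta = 2.0  # assumed true coefficient on x (hypothetical)

# Long-format index arrays: one row per (unit, year) observation.
unit = np.repeat(np.arange(n_units), n_years)
year = np.tile(np.arange(n_years), n_units)
group = unit % n_groups  # hypothetical grouping, e.g. counties

a = rng.normal(size=n_units)               # unit effects a_i
b = rng.normal(size=(n_groups, n_years))   # group-by-year effects b_g,t

# x is correlated with both sets of effects, so pooled OLS is biased.
x = rng.normal(size=unit.size) + a[unit] + b[group, year]
y = beta * x + a[unit] + b[group, year] + 0.5 * rng.normal(size=unit.size)

def dummies(codes):
    """One indicator column per distinct value in codes."""
    return (codes[:, None] == np.unique(codes)[None, :]).astype(float)

# Design matrix: x, unit dummies, group-by-year dummies.
# The dummy columns are collinear with each other, but the coefficient on x
# is still uniquely determined, so the least-squares solution is usable.
X = np.column_stack([x, dummies(unit), dummies(group * n_years + year)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

beta_fe = coef[0]                      # fixed-effects estimate
beta_pooled = np.polyfit(x, y, 1)[0]   # pooled OLS slope, no fixed effects

print(f"pooled OLS: {beta_pooled:.3f}   fixed effects: {beta_fe:.3f}")
```

In this setup the fixed effects do what they are supposed to do: the pooled slope picks up the correlation between x and the effects, and the fixed-effects estimate should land close to the assumed `beta`.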

The generalizable point of my lengthy post the other day on storage and agricultural impacts of climate change was that this approach, while useful in some contexts, can have some big drawbacks. Increasingly, I fear, applied econometricians misuse it. They found their hammer and now everything is a nail.

What's wrong with fixed effects?

A practical problem with fixed effects gone wild is that they generally purge the data set of most variation. This may be useful if you hope to isolate some interesting localized variation that you can argue is exogenous. But if the most interesting variation derives from a broader phenomenon, then there may be too little variation left over to identify an interesting effect.
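A quick way to see how much variation fixed effects remove is to apply the two-way within transformation to x and compare variances. The sketch below assumes a hypothetical x made up of a persistent unit component, a common yearly shock, and a small idiosyncratic part. The scales are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_years = 200, 20

# Hypothetical x: most of its variation is cross-sectional or common over time.
unit_part = rng.normal(scale=1.0, size=(n_units, 1))
year_part = rng.normal(scale=1.0, size=(1, n_years))
idio_part = rng.normal(scale=0.3, size=(n_units, n_years))
x = unit_part + year_part + idio_part

# Two-way within transformation for a balanced panel:
# subtract unit means and year means, add back the grand mean.
x_within = (x
            - x.mean(axis=1, keepdims=True)
            - x.mean(axis=0, keepdims=True)
            + x.mean())

share_left = x_within.var() / x.var()
print(f"share of variance in x left after fixed effects: {share_left:.3f}")
```

Under these assumed scales only a few percent of the variance in x survives, and that sliver is all that identifies the effect.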

A corollary to this point is that fixed effects tend to exaggerate the attenuation bias from measurement error, since the errors make up a much larger share of the variation left in x after fixed effects have been removed.
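The attenuation point can be checked with a small simulation. This is a sketch under assumed numbers: a true coefficient of 1, classical measurement error in x, and a true x whose variation is mostly between units. To isolate attenuation, the unit effects here are drawn independently of x, so pooled OLS has no omitted variable bias.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_years = 500, 10
beta = 1.0  # assumed true coefficient (hypothetical)

# True x: between-unit variance 1.0, within-unit variance 0.25.
x_true = rng.normal(size=(n_units, 1)) + 0.5 * rng.normal(size=(n_units, n_years))
a = rng.normal(size=(n_units, 1))  # unit effects, independent of x here
y = beta * x_true + a + 0.5 * rng.normal(size=(n_units, n_years))

# Observed x carries classical measurement error with variance 0.25.
x_obs = x_true + 0.5 * rng.normal(size=(n_units, n_years))

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

def within(v):
    """Remove unit means (the unit fixed-effects transformation)."""
    return v - v.mean(axis=1, keepdims=True)

b_pooled = slope(x_obs, y)               # probability limit: 1.25 / 1.50
b_fe = slope(within(x_obs), within(y))   # probability limit: 0.25 / 0.50

print(f"pooled OLS: {b_pooled:.3f}   unit fixed effects: {b_fe:.3f}")
```

The measurement error is the same in both regressions. What changes is the amount of true signal in x that remains to be compared against it.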

But there is a more fundamental problem. To see this, take a step back and think generically about economics. In economics, almost everything affects everything else, via prices and other kinds of costs and benefits. Micro incentives affect choices, and those choices add up to affect prices, costs and benefits more broadly, and thus help to organize the ordinary business of life. That's the essence of Adam Smith's "invisible hand," supply and demand, equilibrium theory, etc. That insight, a unifying theoretical theme if there is one in economics, implies a fundamental connectedness of human activities over time and space. It's not just that there are unobserved correlated factors; everything literally affects everything else. On some level it's what connects us to ecologists, although some ecologists may be loath to admit an affinity with economics.

In contrast to the nature of economics, regression with fixed effects is a tool designed for experiments with repeated measures. Heterogeneous observational units get different treatments, and they might be mutually affected by some outside factor, but the observational units don't affect each other. They are, by assumption, siloed, at least with respect to consequences of the treatment (whatever your *x* is). This design doesn't seem well suited to many kinds of observational data.

I'll put it another way. Suppose your (hopefully) exogenous variable of choice is *x*, and *x* causes *z*, and then both *x* and *z* affect *y*. Further, suppose the effects of *x* on *z* spill outside of the confines of your fixed-effects units. Even if fixed effects don't purge all the variation in *x*, they may purge much of the path going from *x* to *z* and *z* to *y*, thereby biasing the reduced form link between *x* and *y*. In other words, fixed effects are endogenous.

None of this is to say that fixed effects, with careful account of correlated unobserved factors, can't be very useful in many settings. But the inferences we draw may be very limited. And without care, we may draw conclusions that are very misleading.
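The spillover argument can be illustrated with a stylized simulation. Here z is assumed to be a market-wide response to x (think of a price) that equals the average of x across all units in each year, so it spills across every unit. The direct effect `beta`, the indirect effect `gamma`, and all scales are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_years = 300, 30
beta, gamma = 1.0, -0.6  # assumed direct effect of x and effect of z on y

# x has a common yearly component plus idiosyncratic variation.
common = rng.normal(size=(1, n_years))
x = common + 0.5 * rng.normal(size=(n_units, n_years))

# z responds to x everywhere: one value per year, shared by all units.
z = x.mean(axis=0, keepdims=True)

a = rng.normal(size=(n_units, 1))  # unit effects
y = beta * x + gamma * z + a + 0.3 * rng.normal(size=(n_units, n_years))

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

def two_way_within(v):
    """Remove unit and year means (balanced panel)."""
    return (v
            - v.mean(axis=1, keepdims=True)
            - v.mean(axis=0, keepdims=True)
            + v.mean())

# Unit and year fixed effects absorb z entirely, leaving only the direct effect.
b_twfe = slope(two_way_within(x), two_way_within(y))

# Regressing yearly averages of y on yearly averages of x keeps the z channel.
b_aggregate = slope(x.mean(axis=0), y.mean(axis=0))

print(f"two-way fixed effects: {b_twfe:.3f}")
print(f"aggregate (yearly means): {b_aggregate:.3f}")
print(f"assumed total effect beta + gamma: {beta + gamma:.3f}")
```

In this setup the fixed-effects estimate recovers `beta` very precisely, but the effect of a broad change in x is `beta + gamma`, which the year effects have swept away along with z.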

I'm an old dog trying to understand some of the new econometric tricks. Thinking about quasi-experiments is to me like learning a foreign language. Would it be possible for you to give an example, to make this a bit more concrete?

Another belated reply: I think the best place to learn about the idea of quasi-experiments, and how to begin thinking critically about statistical analysis of observational data, is to read a few papers by the late David Freedman. This is a nice summary: http://www.stat.berkeley.edu/~census/521.pdf

Hi Anonymous: I've kind of got this backwards. The example came first, in the earlier post. Click on "lengthy post" just after the second equation. You may want to read the associated paper, comment and reply in the AER. What I'm trying to do here is generalize the insights I gained from our little debate.