
Numerical Computation

At What Level to Cluster?

A practical question which arises in the context of cluster-robust inference is "At what level should we cluster?" In some examples you could cluster at a very fine level, such as families or classrooms, or at higher levels of aggregation, such as neighborhoods, schools, towns, counties, or states. What is the correct level at which to cluster? Rules of thumb have been advocated by practitioners, but at present there is little formal analysis to provide useful guidance. What do we know?

First, suppose cluster dependence is ignored or imposed at too fine a level (e.g. clustering by households instead of villages). Then variance estimators will be biased as they will omit covariance terms. As correlation is typically positive, this suggests that standard errors will be too small, giving rise to spurious indications of significance and precision.

Second, suppose cluster dependence is imposed at too aggregate a measure (e.g. clustering by states rather than villages). This does not cause bias. But the variance estimators will contain many extra components, so the precision of the covariance matrix estimator will be poor. This means that reported standard errors will be imprecise (more random) than if clustering had been less aggregate.

These considerations show that there is a trade-off between bias and variance in the estimation of the covariance matrix by cluster-robust methods. It is not at all clear, based on current theory, what to do. I state this emphatically. We really do not know what is the "correct" level at which to do cluster-robust inference. This is a very interesting question and should certainly be explored by econometric research.

One challenge is that in empirical practice, many people have observed: "Clustering is important. Standard errors change a lot whether or not we properly cluster.
Therefore we should only report clustered standard errors." The flaw in this reasoning is that we do not know why in a specific empirical example the standard errors change under clustering. One possibility is that clustering reduces bias and thus is more accurate. The other possibility is that clustering adds sampling noise and is thus less accurate. In reality it is likely that both factors are present.

In any event a researcher should be aware of the number of clusters used in the reported calculations and should treat the number of clusters as the effective sample size for assessing inference. If the number of clusters is, say, G = 20, this should be treated as a very small sample.

To illustrate the thought experiment, consider the empirical example of Duflo, Dupas and Kremer (2011). They reported standard errors clustered at the school level, and the application uses 111 schools. Thus G = 111, which we can treat as the effective sample size. The number of observations (students) per school ranges from 19 to 62, which is reasonably homogeneous. This seems like a well balanced application of clustered variance estimation. However, one could imagine clustering at a different level of aggregation. In some applications we might consider clustering at a less aggregate level such as the classroom level. This is not relevant in this particular application as there was only one classroom per school. We might also consider clustering at a more aggregate level. The data set contains information on the school district, division, and zone. However, there are only 2 districts, 7 divisions, and 9 zones. Thus if we cluster by zone, G = 9 is the effective sample size, which would lead to imprecise standard errors. In this particula