Error Components in Grouped Data: Is It Ever Worth Weighting?
When estimating linear models with grouped data, researchers typically weight each observation by the group size. If the regression errors for the underlying micro data have expected values of zero, are independent, and are homoscedastic, this procedure produces best linear unbiased estimates. This note argues that, for most applications in economics, the assumption that errors are independent within groups is inappropriate. Grouping is commonly done on the basis of shared observed characteristics, so it is unreasonable to assume that group members share no unobserved characteristics. If they do share unobserved characteristics, individual errors will be correlated within groups. In that case the variance of each group-mean error contains a shared component that does not shrink as the group grows. When group sizes are large, heteroscedasticity may therefore be relatively unimportant, and weighting by group size may exacerbate heteroscedasticity rather than eliminate it. Two examples presented here suggest that this may be the effect of weighting in most non-experimental applications. In many situations unweighted ordinary least squares may be preferable. For cases where it is not, a maximum likelihood estimator and an asymptotically efficient two-step generalized least squares estimator are proposed. An extension of the two-step estimator to grouped binary data is also presented.
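The argument can be made concrete with a minimal sketch, assuming a simple one-way error-components model (the notation is ours, not taken from the note): each individual outcome is y_ij = x_j'b + u_j + eps_ij, where u_j is a shock shared by all members of group j with variance sigma_u2 and eps_ij is an independent individual error with variance sigma_eps2. The error on the group mean then has variance sigma_u2 + sigma_eps2 / n_j, so weighting by n_j is correct only when sigma_u2 = 0. The code below compares unweighted OLS, weighting by n_j, and a generic two-step feasible GLS that weights by the inverse of that variance. It is an illustration, not necessarily the estimator proposed in the note: the function names are hypothetical, sigma_eps2 is assumed known, and sigma_u2 is estimated by a simple moment condition on OLS residuals.

```python
import numpy as np


def group_error_variance(sigma_u2, sigma_eps2, n):
    """Variance of a group-mean error under y_ij = x_j'b + u_j + eps_ij:
    Var(u_j + mean_i eps_ij) = sigma_u2 + sigma_eps2 / n_j."""
    return sigma_u2 + sigma_eps2 / np.asarray(n, dtype=float)


def wls(X, y, w):
    """Weighted least squares: minimizes sum_j w_j * (y_j - X_j @ b)**2."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def two_step_gls(X, ybar, n, sigma_eps2):
    """Hypothetical two-step feasible GLS on group means.

    Step 1: unweighted OLS; estimate sigma_u2 as the df-corrected mean squared
            residual minus sigma_eps2 * mean(1 / n_j), truncated at zero
            (an illustrative moment estimator, assumed here).
    Step 2: WLS with weights 1 / (sigma_u2_hat + sigma_eps2 / n_j).
    sigma_eps2, the individual-level error variance, is assumed known.
    """
    J, k = X.shape
    n = np.asarray(n, dtype=float)
    beta_ols = wls(X, ybar, np.ones(J))
    resid = ybar - X @ beta_ols
    sigma_u2_hat = max(0.0, resid @ resid / (J - k) - sigma_eps2 * np.mean(1.0 / n))
    weights = 1.0 / group_error_variance(sigma_u2_hat, sigma_eps2, n)
    return wls(X, ybar, weights), sigma_u2_hat


if __name__ == "__main__":
    # Simulated group means with large, unequal group sizes (illustrative values).
    rng = np.random.default_rng(0)
    J = 200
    n = rng.integers(50, 2000, size=J)
    X = np.column_stack([np.ones(J), rng.normal(size=J)])
    beta_true = np.array([1.0, 2.0])
    sigma_u2, sigma_eps2 = 0.25, 1.0
    u = rng.normal(scale=np.sqrt(sigma_u2), size=J)       # shared group shock
    eps_bar = rng.normal(scale=np.sqrt(sigma_eps2 / n))   # mean of n_j iid errors
    ybar = X @ beta_true + u + eps_bar

    print("OLS (unweighted):", wls(X, ybar, np.ones(J)))
    print("Weighted by n_j: ", wls(X, ybar, n))
    b_gls, s2 = two_step_gls(X, ybar, n, sigma_eps2)
    print("Two-step GLS:    ", b_gls, "sigma_u2_hat =", s2)
```

For example, with sigma_u2 = sigma_eps2 = 1, groups of size 1 and 100 have group-mean error variances of 2 and 1.01, a ratio of about 2, whereas weighting by group size treats them as differing by a factor of 100. When the estimated sigma_u2 is zero, the two-step weights become proportional to n_j and reproduce the conventional weighted estimator.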