Combining Panel Data Sets with Attrition and Refreshment Samples
In many fields, researchers wish to consider statistical models that allow for more complex relationships than can be inferred from cross-sectional data alone. Panel or longitudinal data, in which the same units are observed repeatedly at different points in time, can often provide the richer data needed for such models. Although such data allow researchers to identify more complex models than cross-sectional data do, missing data problems can be more severe in panels. In particular, even units that respond in the initial waves of the panel may drop out in subsequent waves, so that the subsample with complete data for all waves can be less representative of the population than the original sample. Sometimes, in the hope of mitigating the effects of attrition without losing the advantages of panel data over cross-sections, panel data sets are augmented by replacing units that have dropped out with new units randomly sampled from the original population. Following Ridder (1992), who used these replacement units to test some models for attrition, we call such additional samples refreshment samples. We explore the benefits of these samples for estimating models of attrition. We describe how the presence of refreshment samples allows the researcher to test various models for attrition in panel data, including models based on the assumption that missing data are missing at random (MAR; Rubin, 1976; Little and Rubin, 1987). The main result in the paper makes precise the extent to which refreshment samples are informative about the attrition process: a class of non-ignorable missing data models can be identified without making strong distributional or functional form assumptions if refreshment samples are available.
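The role of a refreshment sample described above can be illustrated with a small simulation. The sketch below is a toy example under assumptions that are not taken from the paper: a two-wave panel with standard-normal outcomes z1 and z2, a correlation rho between waves, and a logistic dropout rule that depends on the wave-2 outcome. The function name simulate and all parameter values are hypothetical. It is not the paper's estimator; it only shows that units who remain in the panel can misrepresent the wave-2 distribution, while a fresh random draw from the same population does not.

```python
import math
import random
import statistics


def simulate(n=20000, n_refresh=5000, rho=0.6, seed=0):
    """Toy two-wave panel in which dropout depends on the wave-2 outcome."""
    rng = random.Random(seed)

    def draw_unit():
        # Correlated standard-normal outcomes for waves 1 and 2 (assumed model).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        return z1, z2

    panel = [draw_unit() for _ in range(n)]

    # Non-ignorable attrition (assumed): the probability of staying in the
    # panel is logistic in z2, so it depends on a value that is unobserved
    # for the units that drop out.
    stayers_z2 = [
        z2 for _, z2 in panel
        if rng.random() < 1.0 / (1.0 + math.exp(-z2))
    ]

    # Refreshment sample: new units drawn from the same population and
    # observed in wave 2 only.
    refresh_z2 = [draw_unit()[1] for _ in range(n_refresh)]

    return {
        "retention_rate": len(stayers_z2) / n,
        "stayers_mean_z2": statistics.fmean(stayers_z2),
        "refreshment_mean_z2": statistics.fmean(refresh_z2),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this setup the population mean of z2 is zero by construction. The stayers' mean is shifted upward because high-z2 units are more likely to remain, whereas the refreshment sample mean stays close to zero. That recovered wave-2 marginal distribution is the additional information a refreshment sample contributes.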
Hirano, Keisuke, Guido W. Imbens, Geert Ridder, and Donald B. Rubin. "Combining Panel Data Sets with Attrition and Refreshment Samples." Econometrica 69(6), December 2001, 1645-1659.