Adjusting Imperfect Data: Overview and Case Studies
NBER Working Paper No. 12977
Research users of large administrative have to adjust their data for quirks, problems, and issues that are inevitable when working with these kinds of datasets. Not all solutions to these problems are identical, and how they differ may affect how the data is to be interpreted. Some elements of the data, such as the unit of observation, remain fundamentally different, and it is important to keep that in mind when comparing data across countries. In this paper (written for Lazear and Shaw, 2007), we focus on the differences in the underlying data for a selection of country datasets. We describe two data elements that remain fundamentally different across countries -- the sampling or data collection methodology, and the basic unit of analysis (establishment or firm) -- and the extent to which they differ. We then proceed to document some of the problems that affect longitudinally linked administrative data in general, and we describe some of the solutions analysts and statistical agencies have implemented, and explore, through a select set of case studies, how each adjustment or absence thereof might affect the data.