Sensitivity to Missing Data Assumptions: Theory and An Evaluation of the U.S. Wage Structure
This paper develops methods for assessing the sensitivity of empirical conclusions regarding conditional distributions to departures from the missing at random (MAR) assumption. We index the degree of non-ignorable selection governing the missingness process by the maximal Kolmogorov-Smirnov (KS) distance between the distributions of missing and observed outcomes across all values of the covariates. Sharp bounds on minimum mean square approximations to conditional quantiles are derived as a function of the nominal level of selection considered in the sensitivity analysis, and a weighted bootstrap procedure is developed for conducting inference. Using these techniques, we conduct an empirical assessment of the sensitivity of observed earnings patterns in U.S. Census data to deviations from the MAR assumption. We find that the well-documented increase in the returns to schooling between 1980 and 1990 is relatively robust to deviations from the MAR assumption except at the lowest quantiles of the distribution, but that conclusions regarding heterogeneity in returns and changes in the returns function between 1990 and 2000 are very sensitive to departures from ignorability.
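As a rough illustration of how a KS-distance restriction translates into quantile bounds, consider a single covariate cell with missingness probability p and assume the distribution of missing outcomes lies within KS distance k of the distribution of observed outcomes. The full-population CDF is then a mixture, F = (1 - p) F_obs + p F_mis, which confines F to a pointwise band around F_obs; inverting the band gives an interval for each quantile. The sketch below is only a minimal unconditional version under these assumptions. The function name `quantile_bounds` and its arguments are hypothetical, and the code is not the paper's estimator, which targets minimum mean square approximations to conditional quantiles and uses a weighted bootstrap for inference.

```python
import numpy as np


def quantile_bounds(y_obs, p_missing, k, tau, tol=1e-12):
    """Bounds on the tau-quantile of the full outcome distribution.

    Assumes (illustrative setup, not taken from the paper's code):
      * y_obs     : outcomes that were actually observed
      * p_missing : probability that an outcome is missing
      * k         : maximal KS distance between missing and observed CDFs
    k = 0 corresponds to missing at random; k = 1 imposes no restriction.
    """
    y = np.sort(np.asarray(y_obs, dtype=float))
    n = y.size
    f0 = np.arange(1, n + 1) / n  # empirical CDF at each sorted point

    # Pointwise envelope for the mixture CDF (1 - p) * F_obs + p * F_mis,
    # using |F_mis - F_obs| <= k and 0 <= F_mis <= 1.
    f_upper = f0 + p_missing * np.minimum(k, 1.0 - f0)
    f_lower = f0 - p_missing * np.minimum(k, f0)

    # Lower quantile bound: first point where the upper envelope reaches tau.
    # Below the smallest observation the upper envelope equals p * min(k, 1).
    if p_missing * min(k, 1.0) >= tau - tol:
        lower = -np.inf
    else:
        lower = y[np.argmax(f_upper >= tau - tol)]

    # Upper quantile bound: first point where the lower envelope reaches tau.
    # The lower envelope never exceeds 1 - p * min(k, 1).
    if f_lower[-1] < tau - tol:
        upper = np.inf
    else:
        upper = y[np.argmax(f_lower >= tau - tol)]

    return lower, upper
```

Calling `quantile_bounds(y, p_missing, k, tau)` over a grid of `k` values from 0 to 1 traces out how quickly the interval widens as the assumed degree of selection grows, which is the basic shape of the sensitivity analysis described above.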
Published: Patrick Kline & Andres Santos, 2013. "Sensitivity to missing data assumptions: Theory and an evaluation of the U.S. wage structure," Quantitative Economics, Econometric Society, vol. 4(2), pages 231-267, July.
An online appendix is available for this publication.
This paper was revised on December 5, 2011.