Imputation-Powered Inference for Missing Covariates
Missing covariate data is a prevalent problem in empirical research. We provide a novel framework for handling missing covariate data for estimation and inference in downstream tasks. Our general framework provides an automatic and easy-to-use pipeline for empirical researchers: First, missing values are imputed using virtually any imputation method under general observation patterns. Second, we automatically correct for the imputation bias and adaptively weight the imputed values according to their quality. Third, we use all available data, including imputed observations, to obtain more precise point estimates for the downstream task with valid confidence intervals. Our approach ensures valid inference while improving statistical efficiency by leveraging all available data. We establish the asymptotic normality of the proposed estimator under general missing data patterns and a broad class of imputation methods. Through simulations, we demonstrate the superior performance of our approach over natural benchmarks, as it achieves both lower bias and variance while being robust to imputation quality. In a comprehensive empirical study of the dependence of equity markets on carbon emissions, we show that properly accounting for missing emissions data yields no evidence of correlation between stock returns and emissions directly produced by companies, but a negative correlation with value chain emissions.
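The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it swaps in a much simpler prediction-powered-style correction for the mean of a single partially observed variable, assuming values are missing completely at random and the imputation rule is fixed in advance. The function names `corrected_mean` and `simulate`, the simulated data, and the deliberately biased imputation rule are all hypothetical choices made for illustration.

```python
import numpy as np


def corrected_mean(x, x_hat, observed, z_crit=1.96):
    """Bias-corrected, adaptively weighted mean of a partially observed variable.

    x        : values of the variable (entries where `observed` is False are ignored)
    x_hat    : imputed values, available for every unit
    observed : boolean mask, True where x is actually observed
    """
    x_obs, xh_obs, xh_mis = x[observed], x_hat[observed], x_hat[~observed]
    n_obs, n_mis = x_obs.size, xh_mis.size
    n = n_obs + n_mis

    # Adaptive weight: slope of x on x_hat among observed units.
    # Uninformative imputations get a weight near zero.
    lam = np.cov(x_obs, xh_obs, ddof=1)[0, 1] / np.var(xh_obs, ddof=1)

    # Complete-case mean plus a weighted correction that uses every imputation.
    theta = x_obs.mean() + lam * (x_hat.mean() - xh_obs.mean())

    # Plug-in variance: the estimator splits into an average over observed
    # units and an independent average over units with missing values.
    c = lam * n_mis / n
    var = (np.var(x_obs - c * xh_obs, ddof=1) / n_obs
           + c**2 * np.var(xh_mis, ddof=1) / n_mis)
    se = np.sqrt(var)
    return theta, se, (theta - z_crit * se, theta + z_crit * se), lam


def simulate(seed=0, n=2000, p_missing=0.7):
    """Hypothetical data: x has true mean 2.0 and is missing completely at random."""
    rng = np.random.default_rng(seed)
    z = rng.normal(2.0, 1.0, n)              # auxiliary variable, always observed
    x = z + rng.normal(0.0, 0.5, n)          # variable of interest
    observed = rng.random(n) > p_missing     # missingness independent of the data
    x_hat = 0.8 * z + 1.0                    # fixed, deliberately biased imputation

    x_masked = np.where(observed, x, np.nan)
    theta, se, ci, lam = corrected_mean(x_masked, x_hat, observed)
    return {
        "theta": theta, "se": se, "ci": ci, "lam": lam,
        # Benchmark 1: fill in the gaps and average, ignoring imputation bias.
        "naive": np.where(observed, x, x_hat).mean(),
        # Benchmark 2: discard every unit with a missing value.
        "cc_mean": x[observed].mean(),
        "cc_se": x[observed].std(ddof=1) / np.sqrt(observed.sum()),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(name, value)
```

In this toy setting the fill-in-and-average benchmark inherits the bias of the imputation rule, the complete-case benchmark is unbiased but discards most of the sample, and the corrected estimator uses the imputations only to the extent that they track the observed values. The paper's framework addresses the harder case of general downstream tasks, general observation patterns, and a broad class of imputation methods, none of which this sketch attempts.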
Junting Duan and Markus Pelger, "Imputation-Powered Inference for Missing Covariates," NBER Working Paper 34535 (2025), https://doi.org/10.3386/w34535.