Double/Debiased Machine Learning for Treatment and Structural Parameters

Victor Chernozhukov; Denis Chetverikov; Mert Demirer; Esther Duflo; Christian Hansen; Whitney Newey; James Robins

doi:10.3386/w23564

Working Paper 23564

Issue Date June 2017

We revisit the classic semiparametric problem of inference on a low-dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0. We depart from the classical setting by allowing η_0 to be so high-dimensional that the traditional assumptions that limit the complexity of the parameter space for this object, such as Donsker properties, break down. To estimate η_0, we consider the use of statistical or machine learning (ML) methods, which are particularly well suited to estimation in modern, very high-dimensional cases. ML methods perform well by employing regularization to reduce variance and trading off regularization bias with overfitting in practice. However, both regularization bias and overfitting in estimating η_0 cause a heavy bias in estimators of θ_0 that are obtained by naively plugging ML estimators of η_0 into estimating equations for θ_0. This bias results in the naive estimator failing to be N^(-1/2)-consistent, where N is the sample size. We show that the impact of regularization bias and overfitting on estimation of the parameter of interest θ_0 can be removed by using two simple, yet critical, ingredients: (1) using Neyman-orthogonal moments/scores, which have reduced sensitivity with respect to nuisance parameters, to estimate θ_0, and (2) making use of cross-fitting, which provides an efficient form of data-splitting. We call the resulting set of methods double or debiased ML (DML). We verify that DML delivers point estimators that concentrate in an N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.
The generic statistical theory of DML is elementary and relies on only weak theoretical requirements, which admit the use of a broad array of modern ML methods for estimating the nuisance parameters, such as random forests, lasso, ridge, deep neural nets, boosted trees, and various hybrids and ensembles of these methods. We illustrate the general theory by deriving theoretical properties of DML applied to learn (1) the main regression parameter in a partially linear regression model, (2) the coefficient on an endogenous variable in a partially linear instrumental variables model, (3) the average treatment effect and the average treatment effect on the treated under unconfoundedness, and (4) the local average treatment effect in an instrumental variables setting. In addition to these theoretical applications, we also illustrate the use of DML in three empirical examples.
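
As a rough illustration of the two ingredients named in the abstract, the following is a minimal sketch of cross-fitted DML for the partially linear regression model Y = θ_0 D + g_0(X) + U, D = m_0(X) + V. It is not the authors' implementation. Everything specific in it is an assumption made for the example: the simulated data, the function names `fit_bin_means` and `dml_plr`, the choice of five folds, and in particular the nuisance learner, which is a simple bin-average regression on a scalar X standing in for an ML method such as a random forest or lasso. The orthogonal score used is the "partialling-out" form, ψ = (Y - l(X) - θ(D - m(X)))(D - m(X)), with l(X) = E[Y|X] and m(X) = E[D|X].

```python
import math
import random


def fit_bin_means(x, y, n_bins=20):
    """Stand-in nuisance learner (assumption): mean of y within
    equal-width bins of a scalar x in [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int(xi * n_bins), n_bins - 1)
        sums[b] += yi
        counts[b] += 1
    overall = sum(y) / len(y)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * n_bins), n_bins - 1)]


def dml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in Y = theta*D + g(X) + U."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Nuisance functions are fitted on the other folds only.
        l_hat = fit_bin_means([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_bin_means([x[i] for i in train], [d[i] for i in train])
        # Residuals are computed on the held-out fold.
        for i in fold:
            y_res[i] = y[i] - l_hat(x[i])
            d_res[i] = d[i] - m_hat(x[i])
    # Solve the orthogonal moment sum((y_res - theta*d_res) * d_res) = 0.
    den = sum(v * v for v in d_res)
    theta = sum(u * v for u, v in zip(y_res, d_res)) / den
    # Plug-in standard error based on the same score.
    j = den / n
    sigma2 = sum(((u - theta * v) * v) ** 2
                 for u, v in zip(y_res, d_res)) / n / j ** 2
    return theta, math.sqrt(sigma2 / n)


# Simulated data (assumption): theta_0 = 0.5, nonlinear g_0 and m_0.
rng = random.Random(42)
theta_0, n_obs = 0.5, 2000
x = [rng.random() for _ in range(n_obs)]
d = [math.sin(2 * math.pi * xi) + rng.gauss(0, 1) for xi in x]
y = [theta_0 * di + math.cos(2 * math.pi * xi) + xi ** 2 + rng.gauss(0, 1)
     for di, xi in zip(d, x)]

theta_hat, se_hat = dml_plr(y, d, x)
print(f"theta_hat = {theta_hat:.3f}, "
      f"95% CI = [{theta_hat - 1.96 * se_hat:.3f}, "
      f"{theta_hat + 1.96 * se_hat:.3f}]")
```

In this sketch each observation's residuals come from nuisance fits that never saw that observation, which is the role cross-fitting plays; swapping in a different learner only requires replacing `fit_bin_means` with another function that returns a predictor.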

We would like to acknowledge research support from the National Science Foundation. We also thank participants of the MIT Stochastics and Statistics seminar, the Kansas Econometrics conference, the Royal Economic Society Annual Conference, the Hannan Lecture at the Australasian Econometric Society meeting, the Econometric Theory lecture at the EC2 meetings 2016 in Toulouse, the CORE 50th Anniversary Conference, the Becker-Friedman Institute Conference on Machine Learning and Economics, the INET conferences at USC on Big Data, the World Congress of Probability and Statistics 2016, the Joint Statistical Meetings 2016, the New England Day of Statistics Conference, CEMMAP's Masterclass on Causal Machine Learning, and St. Gallen's summer school on “Big Data”, for many useful comments and questions. We would like to thank Susan Athey, Peter Aronow, Jin Hahn, Guido Imbens, Mark van der Laan, and Matt Taddy for constructive comments. We thank Peter Aronow for pointing us to the literature on targeted learning on which, along with prior works of Neyman, Bickel, and the many other contributions to semiparametric learning theory, we build. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins, "Double/Debiased Machine Learning for Treatment and Structural Parameters," NBER Working Paper 23564 (2017), https://doi.org/10.3386/w23564.
