Refining Public Policies with Machine Learning: The Case of Tax Auditing
We study the extent to which machine learning (ML) techniques can improve the efficiency of tax auditing using administrative data, without the need for randomized audits. Using Italy's population data on sole proprietorship tax returns, audits, and their outcomes, we develop a new approach to address the so-called selective labels problem: an ML algorithm must necessarily be trained on endogenously selected data. We document substantial margins for raising revenue from audits by using ML to improve the selection of taxpayers to audit. Replacing the 10% least productive audits with an equal number of taxpayers selected by our trained algorithm raises detected tax evasion by as much as 38%, and evasion that is actually paid back by 29%.
We are very grateful to the Italian Revenue Agency for granting us access to the data. We are solely responsible for the ideas expressed in the paper. We thank seminar participants at Cornell University, ETH Zurich, and the Italian Presidency of the Council of Ministers for valuable discussions. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.