The Accuracy of Tax Imputations: Estimating Tax Liabilities and Credits Using Linked Survey and Administrative Data
This paper calculates accurate estimates of income and payroll taxes using a groundbreaking set of linked survey and administrative tax data that are part of the Comprehensive Income Dataset (CID). We compare our estimates to survey imputations produced by the Census Bureau and those generated using the TAXSIM calculator from the National Bureau of Economic Research. The administrative data include two sets of Internal Revenue Service (IRS) data: (1) a limited set of tax information for the population of individual income tax returns covering selected line items from Forms 1040, W-2, and 1099-R; and (2) an extensive set of population tax records processed by the IRS in 2011, covering nearly every line item on Form 1040 and most lines on a series of third-party information returns. We link these IRS records to the Current Population Survey Annual Social and Economic Supplement (CPS) for reference year 2010. We describe how we form tax units and estimate various types of tax liabilities and credits using these linked data, providing a roadmap for constructing accurate measures of taxes while preserving the survey family as the sharing unit for distributional analyses. We find that aggregate estimates of various tax components based on the limited and the extensive tax data are close to each other and much closer to public IRS tabulations than either of the imputations using survey data alone. At the individual level, the absolute errors of survey-only imputations of federal income taxes and total taxes are on average 10 percent and 13 percent, respectively, of adjusted gross income. In contrast, the limited tax data imputations yield mean absolute errors for federal income taxes and total taxes that are about 2 percent and 3 percent of adjusted gross income, respectively. For the Earned Income Tax Credit, the limited tax data imputation is off by less than $20 on average for a typical family (compared to more than $500 using either of the survey-only imputations).
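To make the individual-level accuracy measure concrete, the following minimal Python sketch computes a mean absolute error expressed as a share of adjusted gross income. It rests on assumptions not stated in the abstract: that the error for each tax unit is the gap between an imputed tax amount and a benchmark amount (for example, one derived from the extensive tax data), that the ratio is formed unit by unit and then averaged, and that units with non-positive AGI are dropped. The function name, arguments, and sample values are hypothetical, and the paper's exact definition and weighting may differ.

```python
def mean_abs_error_share_of_agi(imputed, benchmark, agi, weights=None):
    """Weighted mean of |imputed - benchmark| / AGI across tax units.

    Assumption of this sketch: units with non-positive AGI are skipped,
    because the ratio is not meaningful for them.
    """
    if weights is None:
        weights = [1.0] * len(agi)  # unweighted by default
    numerator = 0.0
    denominator = 0.0
    for imp, bench, a, w in zip(imputed, benchmark, agi, weights):
        if a <= 0:
            continue
        numerator += w * abs(imp - bench) / a
        denominator += w
    if denominator == 0:
        raise ValueError("no tax units with positive AGI")
    return numerator / denominator


# Hypothetical values for three tax units (dollars), for illustration only.
imputed = [5000.0, 1200.0, 0.0]      # e.g., a survey-only imputation
benchmark = [4000.0, 1500.0, 500.0]  # e.g., amounts from administrative data
agi = [50000.0, 30000.0, 10000.0]

share = mean_abs_error_share_of_agi(imputed, benchmark, agi)
print(f"Mean absolute error as a share of AGI: {share:.1%}")
```

An alternative reading would divide the mean absolute dollar error by mean AGI; the two definitions generally give different numbers, so the choice should follow the paper's own definition.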
Any opinions and conclusions expressed herein are those of the author(s) and do not necessarily represent the views of the U.S. Census Bureau, the Internal Revenue Service, any other agency of the federal government, or the NBER. All results were approved for release by the U.S. Census Bureau, authorization numbers CBDRB-FY20-ERD002-014 and CBDRB-FY20-ERD002-038. We thank Matthew Stadnicki, Brian Curran, Alexa Grunwaldt, and Angela Wyse for excellent research assistance; Katie Genadek for expertly handling lengthy disclosure requests; Jim Davis and Maggie Jones for helping us access TAXSIM on the Census servers; and Dan Feenberg, Tom Hertz, Jonathan Rothbaum, David Splinter, Alex Yuskavage, and participants at the NBER-CRIW and NTA Conferences for helpful comments and discussions. We also appreciate the financial support of the Alfred P. Sloan Foundation, the Russell Sage Foundation, the Charles Koch Foundation, and the Menard Family Foundation. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.