Income Tax Calculator for SOI-IRS Individual Data
TAXCALC is a program, available in SAS or Stata, for calculating US individual income tax liabilities from the confidential micro-data files of the IRS Statistics of Income (SOI) Division, or from the anonymized public use files made available by that agency. It is intended to facilitate the econometric analysis of tax data, and is not suitable for preparing individual tax returns for filing.
TAXCALC is unrelated (other than by authorship) to the widely used TAXSIM program, also from the NBER. TAXSIM is written in Fortran, calculates federal taxes (1960-2023) and state taxes (1977-2011), and operates on a transformation of the SOI Public Use Files (PUF) from 1960 to the present day, including inflated files out to 2013. TAXCALC, by contrast, is written in native SAS and Stata code and operates directly on the proprietary binary files used by those packages. It covers federal liability only (1993-2013) and does no state tax calculation. Both packages receive annual updates, but TAXCALC is limited to the last year for which data are available for testing.
TAXCALC was written with the following goals in mind:
For the most common tax forms, the calculation starts from the most basic data and works through all worksheets, clawbacks and restrictions. These include:
We are careful to maintain consistency among the various forms. That is, if a form other than the 1040 asks for AGI, capital gains, or earned income, we use the value supplied on the 1040 or Schedule D, even if the taxpayer supplied, and SOI accepted, a different value. This is essential to the correct calculation of marginal tax rates.
A number of features of the tax code are ignored in the interest of simplification. Only "bottom line" values from some forms are used.
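One plausible way to see why this consistency matters for marginal rates is the toy sketch below. It is written in Python purely for illustration and is not TAXCALC code: the liability function, the phase-out parameters, and the field names `agi_1040` and `agi_on_credit_form` are all hypothetical. It compares a finite-difference marginal rate when a subsidiary form sees the same AGI as the 1040 with the rate obtained when that form keeps its own separately reported value.

```python
# Illustrative only: a toy liability function, not TAXCALC's calculation.
# All field names and parameters below are hypothetical.

def toy_tax(agi_1040, agi_for_credit):
    """20% flat tax minus a credit that phases out at 10% of AGI above 30,000."""
    credit = max(0.0, 1000.0 - 0.10 * max(0.0, agi_for_credit - 30000.0))
    return 0.20 * agi_1040 - credit


def marginal_rate(record, consistent, delta=100.0):
    """Finite-difference marginal rate with respect to AGI on the 1040."""
    def liability(agi):
        # Consistent: the credit form sees the same AGI as the 1040.
        # Inconsistent: the credit form keeps the value it originally reported.
        credit_agi = agi if consistent else record["agi_on_credit_form"]
        return toy_tax(agi, credit_agi)

    base = record["agi_1040"]
    return (liability(base + delta) - liability(base)) / delta


record = {"agi_1040": 35000.0, "agi_on_credit_form": 34000.0}
rate_consistent = marginal_rate(record, consistent=True)
rate_inconsistent = marginal_rate(record, consistent=False)
print(round(rate_consistent, 4), round(rate_inconsistent, 4))
```

In this toy setup the consistent calculation picks up the credit phase-out in addition to the flat rate, while the inconsistent one misses it, because the perturbed AGI never reaches the credit form.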
When calculating for a law year other than the file year, some data may be unavailable, and no attempt is made to impute such data. For example, in years when a deduction or adjustment is not available, the amount of the expenditure for that item will not be recorded in the data and will be treated as zero in the calculation of tax liability for some other year. While this might bias aggregate revenue unacceptably, the calculator's chief intended use is in econometric studies where cross-year estimates are likely to be used as instrumental variables, and the bias may not be of consequence in such uses.
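The zero-fill rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not TAXCALC code: the item name `hypothetical_deduction`, the years in `LAW_YEARS_ALLOWING`, and the record layout are all made up for the example.

```python
# Illustrative only: the item name and the years are made up.
# Maps an item to the law years in which it is allowed.
LAW_YEARS_ALLOWING = {"hypothetical_deduction": {2002, 2003}}


def amount_for_law_year(record, item, law_year):
    """Amount of `item` entering the calculation under `law_year`.

    If the law year does not allow the item, it contributes nothing.
    If the law year allows it but the file does not record it (because
    the file year did not allow it), it is treated as zero, not imputed.
    """
    if law_year not in LAW_YEARS_ALLOWING.get(item, set()):
        return 0.0
    value = record.get(item)
    return 0.0 if value is None else float(value)


# A 2001 file has no field for the deduction; a 2002 file does.
record_2001 = {"file_year": 2001, "wages": 50000.0}
record_2002 = {"file_year": 2002, "wages": 50000.0,
               "hypothetical_deduction": 1200.0}

print(amount_for_law_year(record_2001, "hypothetical_deduction", 2002))
print(amount_for_law_year(record_2002, "hypothetical_deduction", 2002))
```

The first call returns zero even though the 2001 taxpayer may well have had the expense, which is the source of the possible bias in aggregate revenue noted above.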
Documentation for TAXCALC is provided chiefly by a graduated series of example programs. Please study them and read the notes - they are brief, but vital.
Variables
Dollar amount variables in the SOI file all begin with an "E" followed by a 5-digit number. Our calculated values use a "c" and the same 5 digits. All of these variables have global scope. Non-SOI variables used only in the calculator all begin with an underscore, to avoid confusion with user-written code in the same do-file or data step. A list of calculated variables and a list of the variables used in the calculation are also available.
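The naming convention can be expressed as a small helper. This is a sketch in Python for illustration only, not part of TAXCALC: the function names are hypothetical, the example name `E12345` is made up, and accepting a lowercase "e" is an assumption added here for convenience.

```python
import re

# "E" plus exactly five digits; lowercase "e" is accepted as an assumption.
SOI_AMOUNT = re.compile(r"^[Ee](\d{5})$")
CALCULATED = re.compile(r"^c\d{5}$")


def calculated_name(soi_name):
    """Map an SOI dollar-amount name to its calculated counterpart."""
    match = SOI_AMOUNT.match(soi_name)
    if match is None:
        raise ValueError(f"not an SOI dollar-amount name: {soi_name!r}")
    return "c" + match.group(1)


def classify(name):
    """Classify a variable name according to the convention described above."""
    if name.startswith("_"):
        return "calculator-internal"
    if SOI_AMOUNT.match(name):
        return "soi-amount"
    if CALCULATED.match(name):
        return "calculated"
    return "other"


print(calculated_name("E12345"))  # made-up variable number
print(classify("_tmp"), classify("c12345"), classify("myvar"))
```

A helper like this could be used to pair each reported amount with its calculated value when comparing the two, or to check that user-written variable names do not collide with the calculator's.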
For copies of the program, and to report bugs or make suggestions, please write or call me. If you write, please include a phone number and a suggested time to call.
All of the SAS calculator code has been written by Inna Shapiro of the NBER. Victoria Bryant of SOI has been a dedicated and patient tester and advisor throughout the process of writing this code. Writing code without access to test data presents real difficulties, and without her help through 98+ turnarounds this effort would not have succeeded. Mike Strudler (SOI), David Joulfaian (OTA) and James Pearce (OTA) have also been very helpful. The Stata version is partly a mechanical translation and partly handwork by Inna and me. We are very interested in reports of use and in bug reports.