Taxsim

Taxsim9 calculates US federal and state income tax liabilities from 22 input variables such as might typically be available in survey data. It covers 1960+ for federal tax and 1977+ for states. The intent is to encourage academic researchers to use after-tax prices and incomes where appropriate in their research.

The R interface

Thanks to Brandon Kaplowitz we now have a nice R interface to Internet Taxsim. The R function taxsim9() takes a matrix or dataframe and returns a dataframe of calculated tax values. As with the SAS, Stata and web versions, the calculation is done on the NBER server and is therefore always up to date. The R function merely packages up the data, uploads (ftp) it to our server, and then fetches the results.

For definitions of variables please see the web interface. There is only one server input format, shared by the http, ftp, Stata and now R clients: all the taxsim9 clients interact with the same server program using the same file formats. So consult the web interface documentation for information about the taxpayer data requirements. I will omit any details that are specified in the web interface to reduce the need for documentation updates here.

Obtaining the package

Download the package from http://www.nber.org/taxsim/R/

There is a dependency on RCurl, which you may need to satisfy with:

    install.packages("RCurl")

The other dependencies, "bitops" and "XML", have not required installation when tested here.
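If you would rather check before installing anything, a short base R sketch along these lines reports which of the three packages named above are not yet present. It only uses requireNamespace() from base R; the variable names are mine, and the install line is left commented out so nothing is downloaded unless you choose to.

```r
# Packages named in the dependency note above.
deps <- c("RCurl", "bitops", "XML")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!installed]

if (length(missing_deps) > 0) {
  message("Not installed: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install from CRAN
} else {
  message("All dependencies are already installed.")
}
```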

Install taxsim9.R

In RStudio you can just install the .zip or .tar.gz file (Tools|Install Packages).

In Linux (without RStudio), save the package in your home directory. It needn't be decompressed, but note that the version number and directory name in the commands below will likely be different for you:

    cd ~
    mkdir -p ./R_libs
    R CMD INSTALL -l ./R_libs ./taxsim9_0.1.tar.gz
    export R_LIBS="/home/your_username/R_libs"

CRAN

We will upload to CRAN after the beta period.

Example with matrix input

In this example we prepare 2 taxpayer records with the 22 column variables in the canonical order, and don't use variable names.

    library(taxsim9)
    dt <- c(1,1979,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0,
            2,2009,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0)
    taxsim9(dt)

The taxsim9 function knows to cut the vector every 22 values, but this requires the user to specify all the values, and in the correct order.
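To see how a flat vector maps onto taxpayer records, the following base R sketch reshapes the same 44 values into a two-row matrix, one row per taxpayer. It does not call taxsim9() or contact the server. The column names are copied from the full variable list given further down this page, and attaching them is only for inspection, on the assumption that that list is the canonical order.

```r
# Variable names in the order listed in the full example on this page
# (assumed here to be the canonical order).
taxsim_names <- c("taxsimid", "year", "state", "mstat", "depx", "agex",
                  "pwages", "swages", "dividends", "otherprop", "pensions",
                  "gssi", "transfers", "rentpaid", "proptax", "otheritem",
                  "childcare", "ui", "depchild", "mortgage", "ltcg", "stcg")

dt <- c(1,1979,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0,
        2,2009,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0)

# A vector whose length is not a multiple of 22 cannot be cut into records.
stopifnot(length(dt) %% length(taxsim_names) == 0)

# byrow = TRUE fills one taxpayer record per row.
records <- matrix(dt, ncol = length(taxsim_names), byrow = TRUE,
                  dimnames = list(NULL, taxsim_names))

# Look at a few columns to confirm each value landed where intended.
print(records[, c("taxsimid", "year", "pwages", "childcare", "ltcg")])
```

Printing the matrix before uploading is a cheap way to catch a value that has slipped one position to the left or right.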

Example with named columns

In this example we only specify 3 variables (the rest will default to zero), and again for two taxpayers.

    library(taxsim9)
    dt <- c('year','mstat','ltcg',1970,2,100000,1990,2,100000)
    taxsim9(dt)

This returns the tax for a joint return in 1970 with $100,000 in long term capital gains and no other income, deductions or credits. See the web page (above) for descriptions of all the variables.

Note that in taxsim there are no missing values - anything not specified is zero except for law year and marital status. The latter two variables must not be missing and must not have any missing values. Missing value codes will always trigger an error on the taxsim server.
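Since missing value codes are rejected by the server, it can save a round trip to check the data locally first. The sketch below is one way to do that in base R. The function check_taxsim_input() is a hypothetical helper of my own naming, not part of the taxsim9 package, and it assumes the data are held in a dataframe with named columns.

```r
# Hypothetical helper (not part of taxsim9): stop before upload if the
# data would trigger a missing-value error on the server.
check_taxsim_input <- function(df) {
  # Law year and marital status must always be supplied.
  required <- c("year", "mstat")
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop("required column(s) not supplied: ", paste(absent, collapse = ", "))
  }
  # No column may contain NA; omitted variables should be zero instead.
  has_na <- vapply(df, anyNA, logical(1))
  if (any(has_na)) {
    stop("missing values in: ", paste(names(df)[has_na], collapse = ", "))
  }
  invisible(TRUE)
}

ok  <- data.frame(year = c(1970, 1990), mstat = c(2, 2), ltcg = c(1e5, 1e5))
bad <- data.frame(year = c(1970, NA),   mstat = c(2, 2))

check_taxsim_input(ok)                              # passes silently
res <- try(check_taxsim_input(bad), silent = TRUE)  # captured as an error
```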

With named columns taxsim9 will zero any omitted variables, and will create the properly ordered datafile for the server. You should run this example and check that the returned federal income tax (FIT) is 16700.04. If it is not, something is wrong and should be reported to me.
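For readers curious what "zero any omitted variables" and "properly ordered" amount to, here is a rough base R illustration of the idea. It is not the package's actual code: fill_taxsim_columns() is a hypothetical helper, the names are copied from the full variable list on this page, and taxsimid is zero-filled like everything else here, whereas the real function may well handle it differently.

```r
# Variable names as listed in the full example on this page
# (assumed to be the canonical order).
taxsim_names <- c("taxsimid", "year", "state", "mstat", "depx", "agex",
                  "pwages", "swages", "dividends", "otherprop", "pensions",
                  "gssi", "transfers", "rentpaid", "proptax", "otheritem",
                  "childcare", "ui", "depchild", "mortgage", "ltcg", "stcg")

# Hypothetical helper: add omitted variables as zero, drop extra columns,
# and return the columns in the order of all_names.
fill_taxsim_columns <- function(df, all_names = taxsim_names) {
  for (nm in setdiff(all_names, names(df))) {
    df[[nm]] <- 0
  }
  df[, all_names, drop = FALSE]
}

partial <- data.frame(ltcg  = c(100000, 100000),   # deliberately out of order
                      year  = c(1970, 1990),
                      mstat = c(2, 2),
                      note  = c("a", "b"))         # extra column, dropped
full <- fill_taxsim_columns(partial)
print(full[, c("year", "mstat", "pwages", "ltcg")])
```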

Here we provide all the columns for a single taxpayer. We provide the names in canonical order, but that isn't required. Extra variables are allowed and will be ignored.

    dt <- c('taxsimid', 'year', 'state', 'mstat', 'depx', 'agex', 'pwages',
            'swages', 'dividends', 'otherprop', 'pensions', 'gssi',
            'transfers', 'rentpaid', 'proptax', 'otheritem', 'childcare',
            'ui', 'depchild', 'mortgage', 'ltcg', 'stcg',
            1,1970,0,2,rep(0,16),100000,0)

Please consult the Internet Taxsim web page (above) for definitions of the variables.

Example with dataframes

[User Boyd reports that the instructions below are incorrect, and provides corrected instructions.]

Here we convert dt to a dataframe and assign the result to another dataframe.

    library(taxsim9)
    dt <- c('year','mstat','ltcg',1970,2,100000,1990,2,100000,2016,2,100000)
    df <- as.data.frame(as.matrix(dt,ncol=22,byrow=TRUE))
    result <- taxsim9(df)
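Since the function is described above as accepting a dataframe, one plausible way to prepare the same three records is to build a dataframe with named columns directly in base R, which keeps the values numeric rather than mixing names and numbers in one character vector. This is a sketch only: it is not the corrected procedure referred to in the note above, and whether taxsim9() accepts a dataframe in exactly this form is an assumption you should verify against the check value given earlier.

```r
# Mixing names and numbers in c() coerces everything to character.
dt <- c('year', 'mstat', 'ltcg', 1970, 2, 100000)
is.character(dt)   # TRUE

# A dataframe with named columns keeps each variable numeric.
df <- data.frame(year  = c(1970, 1990, 2016),
                 mstat = c(2, 2, 2),
                 ltcg  = c(100000, 100000, 100000))
str(df)

# result <- taxsim9(df)   # needs the taxsim9 package and network access
```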

Example with options

    taxsim9(df,detail=2,mtr=86)

These options ask taxsim9 to return additional intermediate values (detail=2) and to calculate marginal tax rates with respect to the secondary earner's wages (mtr=86).

Credits and Cautions

Brandon Kaplowitz did 100% of the coding, to a design of my suggestion. He will be going back to school soon and this function has not been tested extensively yet, so be careful. I don't know R, but suggestions will be taken to heart and implemented if possible and if you can help.

Very large vectors may be slow, but you should be able to do hundreds of thousands of taxpayers in a few minutes. The time depends more on network latency than anything on our server. However, one user has reported that once he goes above 40,000 or so records, only some calculated records are returned. You should experiment and use some care with large datasets.
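Given that report, one cautious approach is to send large datasets in batches well under 40,000 records and confirm that every record comes back. The sketch below shows the idea in base R. run_in_chunks() is a hypothetical helper, not part of the package; the chunk size of 20,000 is an arbitrary choice; and it assumes the function being called returns one row per input record. A stand-in function takes the place of taxsim9() so the sketch runs without the package or the network.

```r
# Hypothetical helper: run `fun` on slices of at most `chunk_size` rows
# and stack the results. `fun` stands in for taxsim9.
run_in_chunks <- function(df, fun, chunk_size = 20000) {
  stopifnot(nrow(df) > 0, chunk_size > 0)
  starts <- seq(1, nrow(df), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1, nrow(df))
    fun(df[rows, , drop = FALSE])
  })
  out <- do.call(rbind, pieces)
  # Warn if the number of returned records differs from the number sent.
  if (nrow(out) != nrow(df)) {
    warning("sent ", nrow(df), " records but received ", nrow(out))
  }
  out
}

# Stand-in for taxsim9(): echoes the id and the size of the chunk it saw.
fake_taxsim <- function(d) {
  data.frame(taxsimid = d$taxsimid, n_in_chunk = nrow(d))
}

input <- data.frame(taxsimid = 1:50000, year = 1990, mstat = 2, ltcg = 100000)
out <- run_in_chunks(input, fake_taxsim)
table(out$n_in_chunk)   # how many records were sent in chunks of each size
```

To use it for real you would pass taxsim9 in place of fake_taxsim, and comparing the returned taxsimid values with the ones sent will show exactly which records, if any, went missing.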

If there is a problem with the tax calculation itself, there is very good troubleshooting advice on the web page noted above. Once Brandon goes back to school we won't actually have anyone here capable of debugging problems with the R function. However, I would be glad to work with you by telephone, observing what happens on our server.

Daniel Feenberg
feenberg@nber.org
617-863-0343


last update: 4 May 2017 by drf