Taxsim
Taxsim9 calculates US federal and state income tax
liabilities from 22 input variables such as might
typically be available in survey data. It covers 1960+
for federal tax and 1977+ for states. The intent is to
encourage academic researchers to use after-tax prices
and incomes where appropriate in their research.
The R interface
Thanks to Brandon Kaplowitz we now have a nice R
interface to Internet Taxsim. The R function taxsim9()
takes a matrix or dataframe and returns a dataframe of
calculated tax values. As with the SAS, Stata and web
versions, the calculation is done on the NBER server and
is therefore always up to date. The R function merely
packages up the data, uploads (ftp) it to our server,
and then fetches the results.
For definitions of variables please see the web
interface. There is only one server input format for
the http, ftp, Stata and now R interfaces. All the taxsim9 clients
interact with the same server program with the same file
formats. So consult the web interface documentation for
information about the taxpayer data requirements. I will
omit any details that are specified in the web interface
to reduce the need for documentation updates here.
Obtaining the package
Download the package from
http://www.nber.org/taxsim/R/
There is a dependency on RCurl, which you may need to satisfy with:
install.packages("RCurl")
The other dependencies, "bitops" and "XML", have not
required installation when tested here.
Install taxsim9.R
In RStudio you can just install the .zip or .tar.gz file
(Tools|Install Packages).
In Linux (without RStudio) save the package in your
home directory. It needn't be decompressed, but note that
the version number and directory name in the commands
below will likely be different for you:
bash
cd ~
R CMD INSTALL -l ./R_libs ./taxsim9_0.1.tar.gz
export R_LIBS="/home/your_username/R_libs"
CRAN
We will upload to CRAN after the beta period.
Example with matrix input
In this example we prepare 2 taxpayer records with the
22 column variables in the canonical
order, and don't use variable names.
library(taxsim9)
dt <- c(1,1979,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0,
2,2009,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0)
taxsim9(dt)
The taxsim9 function knows to cut the vector every 22
values, but this requires the user to specify all the
values, and in the correct order.
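Because a positional vector is easy to get wrong, it can help to look at the records locally before sending them. The following is a minimal base-R sketch, not part of the taxsim9 package: it reshapes the same vector into one row per taxpayer and labels the columns with the variable names listed in the full-column example on this page. Nothing here contacts the server.

```r
# Canonical column order, copied from the full-column example on this page.
taxsim_names <- c(
  "taxsimid", "year", "state", "mstat", "depx", "agex",
  "pwages", "swages", "dividends", "otherprop", "pensions",
  "gssi", "transfers", "rentpaid", "proptax", "otheritem",
  "childcare", "ui", "depchild", "mortgage", "ltcg", "stcg"
)

dt <- c(1,1979,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0,
        2,2009,13,3,1,0,30000,0,0,0,0,0,0,0,0,0,2000,0,1,0,-1000,0)

# A partial record would shift every later value, so check the length first.
stopifnot(length(dt) %% length(taxsim_names) == 0)

# One row per taxpayer, filled row by row.
m <- matrix(dt, ncol = length(taxsim_names), byrow = TRUE,
            dimnames = list(NULL, taxsim_names))

# Inspect a few columns to confirm the values landed where intended.
print(m[, c("taxsimid", "year", "state", "pwages", "ltcg")])
```

If the printed columns look right, the original vector dt can be passed to taxsim9() as shown above.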
Example with named columns
In this example we only specify 3 variables (the rest
will default to zero), and again for two taxpayers.
library(taxsim9)
dt <-c('year','mstat','ltcg',1970,2,100000,1990,2,100000)
taxsim9(dt)
This returns the tax for joint returns in 1970 and 1990, each with
$100,000 in long-term capital gains and no other income,
deductions or credits. See the web page (above) for
descriptions of all the variables.
Note that in taxsim there are no missing values -
anything not specified is zero except for law year and
marital status. The latter two variables must not be
missing and must not have any missing values. Missing
value codes will always trigger an error on the taxsim
server.
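Survey data usually arrives with NA codes, so a local check before upload can save a round trip. The sketch below is one possible approach in base R. The helper name fill_missing_for_taxsim is hypothetical and not part of the taxsim9 package. It stops if year or mstat is absent or contains NA, and sets every other NA to zero, which mirrors the server's treatment of unspecified items. Whether zero is the right substitute for a missing survey response is a research decision that this code does not make for you.

```r
# Hypothetical helper, not part of the taxsim9 package.
fill_missing_for_taxsim <- function(df, required = c("year", "mstat")) {
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop("missing required column(s): ", paste(absent, collapse = ", "))
  }
  for (v in required) {
    if (anyNA(df[[v]])) {
      stop("column '", v, "' contains missing values")
    }
  }
  # Every remaining NA becomes zero, as the server would assume for omitted items.
  df[is.na(df)] <- 0
  df
}

# Made-up illustrative records: the second taxpayer has no ltcg value.
survey <- data.frame(year = c(1970, 1990), mstat = c(2, 2), ltcg = c(100000, NA))
clean <- fill_missing_for_taxsim(survey)
print(clean)
```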
With named columns taxsim9 will zero any omitted
variables, and will create the properly ordered datafile
for the server. You should run this example and check
that the returned federal income tax (FIT) is
16700.04. If not, something is wrong and should be
reported to me.
Here we provide all the columns for a single
taxpayer. We provide the names in canonical order, but
that isn't required. Extra variables are allowed and
will be ignored.
dt <-c(
'taxsimid', 'year', 'state', 'mstat', 'depx', 'agex',
'pwages', 'swages', 'dividends', 'otherprop', 'pensions',
'gssi', 'transfers', 'rentpaid', 'proptax', 'otheritem',
'childcare', 'ui', 'depchild', 'mortgage', 'ltcg',
'stcg',
# one value per name: taxsimid, year, state, mstat, 16 zeros, ltcg, stcg
1,1970,0,2,rep(0,16),100000,0)
taxsim9(dt)
Please consult the Internet Taxsim web page (above) for
definitions of the variables.
Example with dataframes
[User Boyd reports that the instructions below are incorrect, and provides
corrected instructions.]
Here we convert dt to a dataframe and assign the result to another dataframe.
library(taxsim9)
dt <-c('year','mstat','ltcg',1970,2,100000,1990,2,100000,2016,2,100000)
df <-as.data.frame(as.matrix(dt,ncol=22,byrow=TRUE))
result<-taxsim9(df)
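For readers who want a data frame with one named numeric column per variable, here is one base-R way to build it from the same name-then-values vector. This is a sketch only. It is not necessarily the correction Boyd supplied, and it assumes that taxsim9() accepts a data frame laid out this way, which is consistent with the description above but has not been verified here. The names vars and vals are introduced for illustration.

```r
dt <- c('year','mstat','ltcg',1970,2,100000,1990,2,100000,2016,2,100000)

# Mixing names and numbers in c() yields a character vector,
# so split off the names and convert the values back to numbers.
vars <- dt[1:3]
vals <- as.numeric(dt[-(1:3)])

# One row per taxpayer, one named column per variable.
df <- as.data.frame(
  matrix(vals, ncol = length(vars), byrow = TRUE,
         dimnames = list(NULL, vars))
)
str(df)

# Assumption: taxsim9() accepts this layout. Uncomment with the package loaded.
# result <- taxsim9(df)
```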
Example with options
taxsim9(df,detail=2,mtr=86)
These options ask taxsim9 to return additional
intermediate values and to calculate marginal tax rates
with respect to the secondary earner's wages.
Credits and Cautions
Brandon Kaplowitz did 100% of the coding, to a design of
my suggestion. He will be going back to school soon and
this function has not been tested extensively yet, so be
careful. I don't know R, but suggestions will be taken
to heart and implemented if possible and if you can
help.
Very large vectors may be slow, but you should be
able to do hundreds of thousands of taxpayers in a few
minutes. The time depends more on network latency than
anything on our server. However, one user has reported
that once he goes above 40,000 or so records, only some
calculated records are returned. You should experiment
and use some care with large datasets.
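One way to exercise that care is to send the data in smaller pieces and confirm that each piece comes back complete. The sketch below is a hypothetical helper, not part of the taxsim9 package. It splits a data frame into chunks, applies a calculation function to each, and stops if any chunk returns a different number of records than were sent. The chunk size of 20000 is an arbitrary choice below the reported trouble point, and fake_calc is a stand-in so that the example runs without a network connection.

```r
# Hypothetical helper, not part of the taxsim9 package.
# `calc` is any function that maps a data frame to a data frame
# with one output row per input row.
run_in_chunks <- function(df, calc, chunk_size = 20000) {
  stopifnot(nrow(df) > 0, chunk_size > 0)
  starts <- seq(1, nrow(df), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1, nrow(df))
    out <- calc(df[rows, , drop = FALSE])
    # Catch the partial-return problem as soon as it happens.
    if (nrow(out) != length(rows)) {
      stop("chunk starting at row ", s, " returned ",
           nrow(out), " of ", length(rows), " records")
    }
    out
  })
  do.call(rbind, pieces)
}

# Offline stand-in for the real calculation.
fake_calc <- function(d) data.frame(taxsimid = d$taxsimid, tax = 0)

big <- data.frame(taxsimid = 1:50000, year = 1990, mstat = 2)
res <- run_in_chunks(big, fake_calc)
print(nrow(res))
```

With the package loaded, passing taxsim9 in place of fake_calc should work in the same way, assuming it returns one row per taxpayer.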
If there is a problem with the tax calculation itself,
there is very good troubleshooting advice on the web
page noted above. Once Brandon goes back to school we
won't actually have anyone here capable of debugging
problems with the R function. However, I would be glad to
work with you by telephone, observing what happens on our
server.
Daniel Feenberg
feenberg@nber.org
617-863-0343
last update: 4 May 2017 by drf