Guidelines for non-NBER users of -taxpuf9.ado

These guidelines are for researchers sending me Stata .do files to run against the SOI PUF files kept here at NBER. I'd like to keep my role quick and mechanical so that turnaround for users is minimal.

I will send each such user a .zip file with programs and test data. Once you are satisfied with the tests, send me the .do file and I will run it against the full sample, including more recent files, and return the results to you. Turnaround should be a day or less, but no promises.

Data

The taxpuf9.zip file includes a 2% subsample of the SOI micro sample for the years 1960 through 1991. The full samples for these years are available from the National Archives in SOI format. Later years are available only from SOI, and we are not allowed to redistribute them.

The original SOI files cannot be used as input for taxsim, which requires more uniform files through time.

SOI files did not name the variables, and data elements were inconsistent through time. The NBER has created .dta files with a highly consistent naming (actually numbering) convention through time for a subset of the original variables. Wages are always "data11". Full long-term gains are calculated by dividing the SOI-supplied amount by 1, .5, or .4 and stored as "data70". Similar calculations are done for various items subject to a floor or ceiling. A list of the variable names is contained in the file "taxpuf9-variables.txt".

From within Stata, install taxpuf9 with:

    net from "http://www.nber.org/stata"
    net install taxpuf9
    net get taxpuf9

Submissions

Your program should be named "taxsimN.do", where N is an integer that increments with each submission. For example, "taxsim1.do" would be your first attempt.

Your program should start with a log command:

    log using taxsimN, text replace

This will help me keep track of what is happening. I will be keeping the do files, but not the output. The "text replace" options are important to me.

Programs may read only from /home/data/soi/taxsim/dta. Use a macro to specify the filename so that changing to the NBER system requires minimal edits:

    local prefix D:\Data\US_macro\taxsim\s
    if c(username)=="feenberg" local prefix /home/data/soi/taxsim/dta/s
    ...
    use `prefix'2008

With this specification, the directory for SOI data will be set correctly without my editing the program. It is also simple for me to change from the subsets to the full dataset.

The included test files are named "s1960.dta" through "s1991.dta", but there are no files for 1961 or 1963. At the NBER the full files have the same names, but with an "x" instead of an "s".

You can read multiple years in a loop:

    use `prefix'1960
    append using `prefix'1962
    forvalues i=1965/1991 {
        append using `prefix'`i'
    }

data103 always gives the file year. This will not overload the memory at NBER.
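If you would rather not hard-code which years exist, one possible variant checks for each file before appending it. This is only a sketch, not part of the service's requirements: it assumes the local macro prefix has already been defined as described above, and it silently skips any year whose file is absent.

```stata
* Sketch only: assumes the local macro `prefix' is already defined
* and that the yearly files carry the .dta extension.
clear
forvalues i = 1960/1991 {
    * confirm file sets _rc to a nonzero value when the file is missing
    capture confirm file "`prefix'`i'.dta"
    if _rc == 0 {
        append using "`prefix'`i'"
    }
}

* data103 holds the file year, so this lists the years actually loaded
tabulate data103
```

The closing -tabulate- leaves a record in the log of which years were read, which makes a skipped year easy to spot.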

Programs may write only to the current directory or /tmp (Stata temp files are ok).

If you want .dta files returned, please place a -summarize- command after the -save- command so that I can look in the log to see what is happening. Also, include a command to zip up the files you want:

    ! zip taxsim1.zip file1.dta file2.dta

Before sending me a program, look it over carefully for signs of code that is specific to your system, such as directory names, system commands, etc.
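Putting the rules in this section together, a first submission might look roughly like the sketch below. The local directory, the output name "file1", and the choice of statistic are hypothetical placeholders. The mean is unweighted purely for illustration; check "taxpuf9-variables.txt" for the variables a real analysis would need.

```stata
* taxsim1.do -- hypothetical first submission (illustrative sketch only)
log using taxsim1, text replace

* Location of the test files on the user's machine (hypothetical path),
* replaced by the NBER location when the program is run there.
local prefix D:\Data\US_macro\taxsim\s
if c(username)=="feenberg" local prefix /home/data/soi/taxsim/dta/s

* Read a single year; 1991 is one of the years in the test data.
use `prefix'1991, clear

* Unweighted mean of wages (data11) by file year (data103).
collapse (mean) meanwage=data11, by(data103)

* Write only to the current directory; file1 is a placeholder name.
save file1, replace
summarize

* Bundle the files to be returned.
! zip taxsim1.zip file1.dta

log close
```

Apart from the prefix line for the user's own machine, nothing here refers to a local directory, so the same file should run at the NBER without edits.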

This is a new service, so expect changes in the guidelines as I gain experience.

Daniel Feenberg
617-863-0343