1. Requirements
The programs have been run on both Linux and Windows, under Stata-MP or Stata-SE. "Big" Stata is needed because some of the raw data sets come in a single large file. It is suggested that you read the README included with the pack before continuing.
2. Download the harmonization Stata scripts and raw data
Users must apply for and download the raw survey data from the granting institutions; the NBER only provides the Stata scripts for researchers to use.
Note that users have a choice: download the scripts that harmonize the minimum number of variables, which requires downloading several additional data sets (outlined in the README), or the full harmonization list, which is a little more involved to set up.
For the purposes of this website, let's suppose that you are interested in setting up the full list of harmonized variables. You have gone to the website, downloaded the compressed files, and unzipped them into the following directory. Note that I am using a Unix directory structure here; if working in Windows the directory structure will be different (C:\\ ).
~/pvwsurvey
If you're working on the NBER servers, you can use git to clone the most recent version of the scripts by typing the following command in the terminal. If you are interested in downloading the full list of variables, then type in the following, where 'dev_aging_script' refers to the full list of variables.
nber> git clone /homes/nber/daltonm/pvwrepo/dev_aging_script ~/your/directory/here
If you want the subset of variables found in the PVW index only then use the following repo and git code to copy the code:
nber> git clone /homes/nber/daltonm/pvwrepo/AgingSurveyScript ~/your/directory/here
The git clone command copies all of the files to the directory you specify in the last part of the command. The added benefit of using git is that you also get the repository's change history, so it's easier to track any changes made to the code.
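As a purely illustrative sketch of what a clone gives you, the following builds a tiny throwaway repository under /tmp and clones it; in practice you would substitute the NBER repository path shown above.

```shell
# Illustrative only: build a tiny throwaway repo in /tmp, then clone it.
# In practice the source would be the NBER repo path shown above.
set -e
rm -rf /tmp/pvw_demo_src /tmp/pvw_demo_clone
mkdir -p /tmp/pvw_demo_src
cd /tmp/pvw_demo_src
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "global hrs /path/to/raw" > define_directory.do
git add define_directory.do
git commit -qm "initial directory definitions"
# the clone copies the working files AND the full change history
git clone -q /tmp/pvw_demo_src /tmp/pvw_demo_clone
ls /tmp/pvw_demo_clone                      # the working files
git -C /tmp/pvw_demo_clone log --oneline    # the history travels too
```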
3. Define directories for program
Once the user has downloaded the scripts and the raw data, the only file a user should need to edit is the define_directory.do file, found here:
~/pvwsurvey |--- define_directory.do
It is a Stata file, read by all of the other Stata files, that defines the locations of the raw files and output files. You can think of it as a centralized record of all of the directories; hence you only need to edit this file.
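As a sketch of how this centralization works (the path below is illustrative; point it at your own copy), every downstream script runs this file once and then refers to data only through the globals it defines:

```stata
* Sketch only -- the path below is illustrative; use your own location.
do "~/pvwsurvey/define_directory.do"

* After that, any script can open raw files through the globals, e.g.:
use "$hrsrand", clear    // the RAND harmonized HRS file defined above
```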
The first thing to define is the location of the define_directory.do file itself, placed in the global macro found on line 4.
1: /////////////////////////////////////////////////////////////////////
2: ///LOCATION OF THIS FILE - 'define_directories.do' - ////////////////
3: ////////////////////////////////////////////////////////////////////
4: global locationofprogms /homes/nber/daltonm/bulk/Output/dev_aging_script
The following examples are taken from the define_directory.do file, where some lines have been excluded to condense the section.
3.1 Raw HRS directories
The next section defines where the raw variables are found. Because some decompression utilities place files in a folder while others do not, I felt it would be easier if each raw data set gets its own global macro. The first macro, on line 6, serves as a root directory where most of the HRS variables will be found. For the most part users will not need to change the raw data set names, unless they are updated before the programs are, or unless an earlier version of the data is being used.
The second block of directories, starting on line 8 and ending on line 12, corresponds to the raw HRS files with a .da file type. Note that updated dictionaries have been included with the programs, which makes it easier to load the files. Users should make sure they are using the same versions of these files as indicated in the README, since the dictionaries correspond to the files used by the program. Note that this second block is only needed by users looking to make use of the full harmonization.
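As a hedged sketch of how a .da file is read with a dictionary (the dictionary folder and file name below are assumptions for illustration, not the program's actual paths -- see the dictionaries shipped with the programs):

```stata
* Sketch only: .da files are fixed-width ASCII, read via a dictionary.
* The dictionary path and name below are hypothetical.
infile using "$locationofprogms/dictionaries/H04LB_R.dct", using("$hrs2004LB") clear
```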
The rest of the HRS directories link directly to the Stata data file name after defining a root, as can be seen on lines 15 and 16.
1: ///#############################################################
2: //// HRS ::: 3 FILE TYPE (1) RAw (2) RAND FAT (3) RAND HARMONIZD
3: ///#############################################################
4: /// (1) RAW FILES - IF USING
5: /// -- directory where preprocessed file be saved
6: global hrs /homes/nber/daltonm/bulk/AgeSurDTA/HRS/raw/
7:
8: global hrs2004LB $hrs/2004/H04LB_R.da
9: ...
10: ... (excluded for brevity)
11: ...
12: global hrstracker $hrs/tracker/TRK2010TR_R.da
13:
14: /// (2) RAW FAT FILES
15: global hrsfat /homes/nber/daltonm/bulk/AgeSurDTA/HRS/HRSFATRAND/
16: global hrsfat92 $hrsfat/h92f1b.dta
17: ...
18: .... (excluded for brevity)
19: ...
20: global hrsfat10 $hrsfat/h10f3a.dta
21:
22: /// (3) RAND HARMONIZED FILES - THIS LINKS DIRECTLY TO THE HRSRAND HARMONIZATION FILE
23: global hrsrand /homes/nber/daltonm/bulk/AgeSurDTA/HRS/HRSRAND/statase/rndhrs_l.dta
3.2 Raw SHARE directories
The next block of directories corresponds to the SHARE files and directories. It follows the HRS setup in that a SHARE root is defined first, see line 5. However, SHARE differs from the other surveys in that some files are decompressed into a folder while others are not. To account for this, and to avoid writing out all of the file names, only the stub name is given, without the section suffix or .dta ending. For example, line 7 corresponds to the location where the Stata data files are written, '$share/stata_sharew4_rel1-1-1_all_capi_modules/', plus the stub name 'sharew4_rel1-1-1_'. This line would load in, for example, 'sharew4_rel1-1-1_ac.dta' found in the folder '$share/stata_sharew4_rel1-1-1_all_capi_modules/'. Again, users should not need to change much besides the SHARE stub name and maybe the second folder, depending on what their decompression software does.
Note that other global macros link directly to a single Stata data file; for example, see lines 11 and 12. This is generally denoted by the global macro ending with '.dta'. Also note that the 2008, 2006, and 2004 blocks are not shown but are the same, in terms of setup, as the 2010 block.
1: ///##################################################################
2: //// SHARE ::: RAw files not in HRS RAND ////////////////////////////
3: ///##################################################################
4: /// stub directory used below of where the following files are used./
5: global share /homes/nber/daltonm/bulk/AgeSurDTA/SHARE////////////////
6: /////////// 2010 ////////////////////////////////////////////////////
7: global share2010_capi $share/stata_sharew4_rel1-1-1_all_capi_modules/sharew4_rel1-1-1_ //
8: /// stata_sharew4_rel1-1-1_gv_biomarker.dta (do not include gv_biomarker.dta here)
9: global share2010_gen $share/stata_sharew4_rel1-1-1_all_generated_modules/sharew4_rel1-1-1_
10: /// DIRECT LINK TO SINGLE FILE
11: global share2010_dropoff $share/sharew4_rel1-1-1_dropoff.dta
12: global share2010_cv_r $share/sharew4_rel1-1-1_cv_r.dta
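To make the stub mechanics concrete, here is a sketch of how a stub macro combines with a module suffix; 'ac' is just an illustrative module name, not necessarily one you will load:

```stata
* Sketch: stub macro + module suffix = full file path.
* "${share2010_capi}ac.dta" expands to
*   $share/stata_sharew4_rel1-1-1_all_capi_modules/sharew4_rel1-1-1_ac.dta
use "${share2010_capi}ac.dta", clear
```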
The following block corresponds to the ELSA code and is not as complicated to set up. For the most part the ELSA files get decompressed into a single folder. There are few enough files that each file gets its own global macro on lines 6 through 15. A new user should only need to change the ELSA root global macro found on line 4. Note that the files listed on lines 18 through 23 are not used.
1: ///########################################################
2: //// ELSA :::
3: ///########################################################
4: global elsa /homes/nber/daltonm/bulk/AgeSurDTA/ELSA/UKDA-5050-stata11_se/stata11_se
5:
6: global elsaindx $elsa/index_file_wave_0-wave_5_v2.dta
7: global elsarand $elsa/rand_elsa-hrs_harmonised.dta
8: ...
9: .... (excluded for brevity)
10: ...
11: ///elsa wave 5
12: global elsa10core $elsa/wave_5_elsa_data_v3.dta
13: global elsa10fin $elsa/wave_5_financial_derived_variables.dta
14: global elsa10ifin $elsa/wave_5_ifs_derived_variables.dta
15: global elsa10pen $elsa/wave_5_pension_wealth.dta
16:
17: ///elsa files currently not used
18: ///wave_1_pension_grid.dta
19: ///wave_2_pension_grid_v3.dta
20: ///wave_3_mortgage_grid.dta
21: ///wave_3_pension_grid_v3.dta
22: ///wave_4_pension_grid_v1.dta
23: ///wave_5_pension_grid_v2.dta
3.3 Raw JSTAR directories
The JSTAR setup is similar to the ELSA in that the root location is defined by a global macro on line 4, which in turn is used to define the raw data set names. Finally, JSTAR includes some labels, and users can point to those do-files using the global macro found on line 13.
1: ///########################################################
2: //// JSTAR :::
3: ///########################################################
4: global jstar /homes/nber/daltonm/bulk/AgeSurDTA/jstar.work/20140915
5: /// following global directories link directly to the file
6: global jstar2007 $jstar/2007_5city_public.dta
7: global jstar2009_5city $jstar/2009_5city_public.dta
8: global jstar2009_2city $jstar/2009_2city_public.dta
9: global jstar2011_3city $jstar/2011_3city_public.dta
10: global jstar2011_7city $jstar/2011_7city_public.dta
11:
12: /// place the directory for the labels which come with jstar data
13: global jstarlabel /homes/nber/daltonm/bulk/AgeSurDTA/jstar.work/20140915
3.4 Final output directories
One of the last blocks defines where the preprocessed raw files should be placed, in order to avoid overwriting the raw files, or even writing to the same location as them, which can result in confusion down the road. This is the location that the final Step2 harmonization scripts will read from and the Step1 scripts will write to.
1: ///////////////////////////////////////////////////////////////////////
2: // LOCATION WHERE PRE PROCCESSED RAW FILES WILL BE PLACED (AFTER STEP1 PROCESSING)
3: ///////////////////////////////////////////////////////////////////////
4: global hrs_rawout /homes/nber/daltonm/bulk/AgeSurDTA/HRS/raw/4harm
5: global share_rawout /homes/nber/daltonm/bulk/AgeSurDTA/SHARE/4harm
6: global jstar_rawout /homes/nber/daltonm/bulk/AgeSurDTA/jstar.work/4harm
7: global elsa_rawout /homes/nber/daltonm/bulk/AgeSurDTA/ELSA/4harm
8:
9: ///generates directories
10: cap mkdir "$hrs_rawout"
11: cap mkdir "$share_rawout"
12: cap mkdir "$jstar_rawout"
13: cap mkdir "$elsa_rawout"
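The `cap` prefix suppresses the error `mkdir` throws when a directory already exists, so the file can be rerun safely. A Step1 script would then write its preprocessed output into these directories, along the lines of the following sketch (the output file name is hypothetical):

```stata
* Sketch only -- the output file name below is hypothetical.
save "$hrs_rawout/hrs2004lb_preprocessed.dta", replace
```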
The last directory to define is where the final harmonized output should be placed.
1: //##################################################################
2: //## USER OTHER DIRECTORIES DEFINED RECURSIVELY - NO NEED TO CHANGE
3: //###################################################################
4: ///THIS DIRECTORY SHOULD MATCH THAT USED IN generatemasterLBfile.do;
5: global outfiles /homes/nber/daltonm/bulk/Output/FinalUserDocs/data/
Author: Maurice Dalton