-------------------------------------------------------------------------------
       log:  /disk/nber10/SCCS/morg/sources/match.log
  log type:  text
 opened on:  12 Oct 2004, 14:13:33
. set more off
. set memory 500m
(512000k)
.
. program define aef2
  1. use /homes/data/morg/annual/morg`2'
  2. local latest_year=2003
  3. **Keep above this line
.
. * Person Match
. display "Person Match"
  4. display "time is $S_TIME"
  5. *mym is the year and month of the file where matching observations would b
> e
. display "Generate mym for minsamp 4"
  6. generate int mym = `1' * 10 + 18 if minsamp==4
  7. display "Generate mym for minsamp 8"
  8. replace mym =`1' * 10 -6 if minsamp==8
  9. label var mym "Match year and month-in-sample"
 10. display "Generate id"
 11. gen id = _n
 12. display "Generate Sorting variables"
 13. display "Sort vars"
 14. local sort mym intmonth state hhid hhnum
 15. local sortnf `sort' lineno
 16. local sortf `sort' famnum lineno
 17. local sort94 `sort' famnum lineno serial
 18. if ( `1' <= 1984 ) {
 19. local sort `sortnf'
 20. }
 21. if ( `1' > 1984 & `1' <= 1994 ) {
 22. local sort `sortf'
 23. }
 24. if ( `1' > 1994 ) {
 25. local sort `sort94'
 26. }
 27. display "sort `sort'"
 28. sort `sort' id
 29.
. display "Household/family-level match"
 30. ** WARNING: This merge will generate extra observations
. ** if obs with duplicate merge variables aren't eliminated
.
. ** See http://www.stata.com/support/faqs/data/merge.html
. local prevyear=`1'-1
 31. ** Matching is not possible due to sample redesigns between
. ** Jul to Dec 1984 4s to their 1985 8s
. ** Jan to Sep 1985 4s to their 1986 8s
. ** Jun to Dec 1994 4s to their 1995 8s
. ** Jan to Aug 1995 4s to their 1996 8s
. display "Match minsamp 8 households to minsamp 4s"
 32. **( 79:8s don't have matches )
. if (`1' > 1979 ) {
 33. capture describe
 34. scalar obs_pre = r(N)
 35. display obs_pre " observations prior to the household match"
 36. *** match files are created using matchYYYY.do
. display "Merging `1' 8s to 4s using match`prevyear'4.dta "
 37. merge `sort' using /homes/data/morg/match/match`prevyear'4.dta
 38. display "Drop unmatched obs from using data"
 39. drop if _merge==2
 40. capture describe
 41. scalar obs_post = r(N)
 42. display obs_post " observations after household match "
 43. sort `sort' id
 44. by `sort' id : gen dup4 = _n
 45. tab dup4
 46. assert obs_pre == obs_post
 47. drop _merge
 48. }
 49. display "Match minsamp 4 households to minsamp 8s "
 50. ** No matches yet for most recent year's minsamp 4s
. if ( `1' < `latest_year' ) {
 51. local nextyear = `1' + 1
 52. display "sort `sort' id"
 53. sort `sort' id
 54. capture describe
 55. scalar obs_pre = r(N)
 56. display obs_pre " observations prior to the household match"
 57. display "Merging `1' 4s to 8s using match`nextyear'8.dta "
 58. merge `sort' using /homes/data/morg/match/match`nextyear'8.dta
 59. display "Drop unmatched obs from the using data"
 60. drop if _merge==2
 61. capture describe
 62. scalar obs_post = r(N)
 63. display obs_post " observations after household match "
 64. sort `sort' id
 65. by `sort' : gen dup8 = _n
 66. tab dup8
 67. tab dup4 dup8
 68. *assert obs_pre == obs_post
. drop _merge
 69. }
 70. ** Create dummy variables to verify that sex, race, and age match.
. ** A value of 1 indicates a match. Zero means no match.
. gen byte sexdif = sex == msex
 71. gen byte racedif = race == mrace
 72. ** Fix race coding scheme for 88:4 and 89:8
. if ( `1'==1988 & minsamp == 4 ) {
 73. replace racedif=1 if race==3 & mrace>3 & mrace<.
 74. }
 75. else if ( `1'==1989 & minsamp == 8 ) {
 76. replace racedif=1 if race>3 & race<. & mrace==3
 77. }
 78. ** In 2003, greatly expanded race categories were used.
. ** Over 98% of 2003:8 chose one race category.
. tab mrace if `1'==2003 & minsamp==8 & race > 3 & race < .
 79.
. gen byte age_mage=age-mage
 80. gen byte agedif=(age_mage>=-1 & age_mage==3 )
 81. gen byte match=0
 82. replace match=1 if sexdif+racedif+agedif==3
 83.
. ** Make long personid string
. tostring intmonth, gen( pid_intmonth ) format( %02.0f )
 84. format hhid %015s
 85. tostring lineno, gen( pid_lineno ) format( %02.0f )
 86. ** Race and sex are the same, but the matching age should be included
. tostring mage, gen( pid_mage ) format( %02.0f )
 87. replace pid_mage = "99" if mage==.
 88.
. ** Match variables
. ** Create match variables for each time period
. ** A '_428' variable is used to match minsamp 4 to minsamp 8
. ** An '_824' variable is used to match minsamp 4 to minsamp 8
. local match_428 year minsamp pid_intmonth state hhid hhnum
 89. local match_824 mym pid_intmonth state hhid hhnum
 90. local mid79_428 `match_428' pid_lineno sex race age pid_mag
> e
 91. local mid79_824 `match_824' pid_lineno sex race pid_mage ag
> e
 92. local mid85_428 `match_428' famnum pid_lineno sex race pid_age pid
> _mage
 93. local mid85_824 `match_824' famnum pid_lineno sex race pid_mage ag
> e
 94. local mid89_824 `match_824' famnum pid_lineno sex pid_race pid_mag
> e age
 95. local mid94_428 `match_428' famnum pid_lineno pid_ser sex race age pid_mag
> e
 96. local mid94_824 `match_824' famnum pid_lineno pid_ser sex race pid_mage ag
> e
 97.
. ** Pre-1984 matches ( includes 84:8 which matches to 83:4 )
. if (`1' < 1984 | ( `1' == 1984 & minsamp==8 ) ) {
 98. local match_428 `match_428' `mid79_428'
 99. display "`2' match_428 is " `"`match_428'"'
100. local match_824 `match_824' `mid79_824'
101. display "`2' match_824 is " `"`match_824'"'
102. }
103. if (`1' == 1984 & minsamp==4 ) {
104. local match_428 `mid85_428'
105. }
106. if ((`1'>1984 & `1'<1989)|( `1'==1989 & minsamp==8 )|(`1'>1989 & `1'<=1994
> )) {
107. local match_428 `match_428' `mid85_428'
108. local match_824 `match_824' `mid85_824'
109. }
110. ** 88:4's have a single "3 = Other" race code. Their matches,
. ** 89:8's, split into three race codes, American Indian, API, and Other
. if (`1' == 1989 & minsamp==8 ) {
111. gen byte pid_race = race
112. replace pid_race=3 if minsamp==8 & race>3 & race<. & mrace==3
113. local match_824 `match_824' `mid79_824'
114. }
115. ** Serial suffix (extra unit id) became available in 1994 for matching
. if (`1' > 1994 ) {
116. ** Make uppercase and remove leading and trailing blanks
. replace serial = upper(trim( serial ))
117. ** Make numeric version of serial
. egen pid_serial = group( serial )
118. ** A=2 rather than 1 b/c the missing value -1 becomes 1. So, make A=1,
> etc.
. assert pid_ser == 2 if serial == "A"
119. replace pid_ser = pid_ser - 1
120. local match_428 `match_428' `mid94_428'
121. local match_824 `match_824' `mid94_824`
122. }
123. egen str35 match_428 = concat(`match_428')
124. egen str35 match_824= concat(`match_824')
125. gen matchid = match_428
126. replace matchid = match_824 if minsamp == 8
127. drop pid_*
128. display "time is $S_TIME"
129.
. end
.
. *aef2 1979 79 79_83
. *aef2 1980 80 79_83
. *aef2 1981 81 79_83
. *aef2 1982 82 79_83
. aef2 1983 83 79_83
Person Match
time is 14:13:49
Generate mym for minsamp 4
(174042 missing values generated)
Generate mym for minsamp 8
(174042 real changes made)
Generate id
Generate Sorting variables
Sort vars
sort mym intmonth state hhid hhnum lineno
Household/family-level match
Match minsamp 8 households to minsamp 4s
348521 observations prior to the household match
Merging 1983 8s to 4s using match19824.dta
Drop unmatched obs from using data
(40293 observations deleted)
348521 observations after household match

       dup4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    348,521      100.00      100.00
------------+-----------------------------------
      Total |    348,521      100.00

Match minsamp 4 households to minsamp 8s
sort mym intmonth state hhid hhnum lineno id
348521 observations prior to the household match
Merging 1983 4s to 8s using match19848.dta
Drop unmatched obs from the using data
(41209 observations deleted)
348561 observations after household match

       dup8 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    347,782       99.78       99.78
          2 |        778        0.22      100.00
          3 |          1        0.00      100.00
------------+-----------------------------------
      Total |    348,561      100.00

           |               dup8
      dup4 |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |   347,782        778          1 |   348,561
-----------+---------------------------------+----------
     Total |   347,782        778          1 |   348,561

no observations
(214080 missing values generated)
(232 real changes made)
pid_intmonth generated as str2
pid_lineno generated as str2
pid_mage generated as str2
(214080 real changes made)
83 match_428 is year minsamp pid_intmonth state hhid hhnum year minsamp pid_int
> month state hhid hhnum pid_lineno sex race age pid_mage
83 match_824 is mym pid_intmonth state hhid hhnum mym pid_intmonth state hhid h
> hnum pid_lineno sex race pid_mage age
(174042 real changes made)
time is 14:18:41
. *aef2 1984 84 84_88
. *aef2 1985 85 84_88
. *aef2 1986 86 84_88
. *aef2 1987 87 84_88
. *aef2 1988 88 84_88
. *aef2 1989 89 89_93
. *aef2 1990 90 89_93
. *aef2 1991 91 89_93
. *aef2 1992 92 89_93
. *aef2 1993 93 89_93
. *aef2 1994 94 94_97
. *aef2 1995 95 94_97
. *aef2 1996 96 94_97
. *aef2 1997 97 94_97
. *aef2 1998 98 98
. *aef2 1999 99 98
. *aef2 2000 00 98
. *aef2 2001 01 98
. *aef2 2002 02 98
.
.
end of do-file
. exit,clear