-------------------------------------------------------------------------------
       log:  /disk/nber10/SCCS/morg/sources/match_one_year.log
  log type:  text
 opened on:  12 Oct 2004, 15:03:11

. set more off

. set memory 500m
(512000k)

. 
. program define aef2
  1.         use /homes/data/morg/annual/morg`2'
  2.         local latest_year=2003
  3. **Keep above this line
. 
. * Person Match
.         display "Person Match"
  4.         display "time is $S_TIME"
  5. *mym is the year and month of the file where matching observations would b
> e
.         display "Generate mym for minsamp 4"
  6.         generate int mym = `1' * 10 + 18 if minsamp==4
  7.         display "Generate mym for minsamp 8"
  8.         replace mym =`1' * 10 -6 if minsamp==8
  9.         label var mym "Match year and month-in-sample"
 10.         display "Generate id"
 11.         gen id = _n
 12.         display "Generate Sorting variables"
 13.         display "Sort vars"
 14.         local sort mym intmonth state hhid hhnum
 15.         local sortnf `sort' lineno
 16.         local sortf `sort' famnum lineno
 17.         local sort94 `sort' famnum lineno serial
 18.         if ( `1' <= 1984 ) {
 19.                 local sort `sortnf'
 20.         }
 21.         if ( `1' > 1984 & `1' <= 1994 ) {
 22.                 local sort `sortf'
 23.         }
 24.         if ( `1' > 1994 ) {
 25.                 local sort `sort94'
 26.         }
 27.         display "sort `sort'"
 28.         sort `sort' id
 29. 
.         display "Household/family-level match"
 30. ** WARNING: This merge will generate extra observations
. ** if obs with duplicate merge variables aren't eliminated
. 
. ** See http://www.stata.com/support/faqs/data/merge.html
.         local prevyear=`1'-1
 31. ** Matching is not possible due to sample redesigns between
. ** Jul to Dec 1984 4s to their 1985 8s
. ** Jan to Sep 1985 4s to their 1986 8s
. ** Jun to Dec 1994 4s to their 1995 8s
. ** Jan to Aug 1995 4s to their 1996 8s
.         display "Match minsamp 8 households to minsamp 4s"
 32. **( 79:8s don't have matches )
.         if (`1' > 1979 ) {
 33.                 capture describe
 34.                 scalar obs_pre = r(N)
 35.                 display obs_pre " observations prior to the household match"
 36. *** match files are created using matchYYYY.do
.                 display "Merging `1' 8s to 4s using match`prevyear'4.dta "
 37.                 by `sort': gen dup = _n
 38.                 tab dup
 39.                 merge `sort' using /homes/data/morg/match/match`1'.dta
 40.                 display "Drop unmatched obs from using data"
 41.                 drop if _merge==2
 42.                 capture describe
 43.                 sort `sort' id
 44.                 by `sort' id : gen dup4 = _n
 45.                 tab dup4
 46.                 drop if dup4 > 1
 47.                 tab dup dup4
 48.                 scalar obs_post = r(N)
 49.                 display obs_post " observations after household match "
 50.                 assert obs_pre == obs_post
 51.                 drop _merge
 52.         }
 53. ** Create dummy variables to verify that sex, race, and age match.
. ** A value of 1 indicates a match. Zero means no match.
.         gen byte sexdif = sex == msex
 54.         gen byte racedif = race == mrace
 55. ** Fix race coding scheme for 88:4 and 89:8
.         if ( `1'==1988 & minsamp == 4 ) {
 56.                 replace racedif=1 if race==3 & mrace>3 & mrace<.
 57.         }
 58.         else if ( `1'==1989 & minsamp == 8 ) {
 59.                 replace racedif=1 if race>3 & race<. & mrace==3
 60.         }
 61. ** In 2003, greatly expanded race categories were used.
. ** Over 98% of 2003:8 chose one race category.
.         tab mrace if `1'==2003 & minsamp==8 & race > 3 & race < .
 62. 
.         gen byte age_mage=age-mage
 63.         gen byte agedif=(age_mage>=-1 & age_mage==3 )
 64.         gen byte match=0
 65.         replace match=1 if sexdif+racedif+agedif==3
 66. 
. ** Make long personid string
.         tostring intmonth, gen( pid_intmonth ) format( %02.0f )
 67.         format hhid %015s
 68.         tostring lineno, gen( pid_lineno ) format( %02.0f )
 69. ** Race and sex are the same, but the matching age should be included
.         tostring mage, gen( pid_mage ) format( %02.0f )
 70.         replace pid_mage = "99" if mage==.
 71. 
. ** Match variables
. ** Create match variables for each time period
. ** A '_428' variable is used to match minsamp 4 to minsamp 8
. ** An '_824' variable is used to match minsamp 4 to minsamp 8
.         local match_428 year minsamp pid_intmonth state hhid hhnum
 72.         local match_824 mym pid_intmonth state hhid hhnum
 73.         local mid79_428 `match_428' pid_lineno sex race age pid_mag
> e
 74.         local mid79_824 `match_824' pid_lineno sex race pid_mage ag
> e
 75.         local mid85_428 `match_428' famnum pid_lineno sex race pid_age pid
> _mage
 76.         local mid85_824 `match_824' famnum pid_lineno sex race pid_mage ag
> e
 77.         local mid89_824 `match_824' famnum pid_lineno sex pid_race pid_mag
> e age
 78.         local mid94_428 `match_428' famnum pid_lineno pid_ser sex race age pid_mag
> e
 79.         local mid94_824 `match_824' famnum pid_lineno pid_ser sex race pid_mage ag
> e
 80. 
. ** Pre-1984 matches ( includes 84:8 which matches to 83:4 )
.         if (`1' < 1984 | ( `1' == 1984 & minsamp==8 ) ) {
 81.                 local match_428 `match_428' `mid79_428'
 82.                 display "`2' match_428 is " `"`match_428'"'
 83.                 local match_824 `match_824' `mid79_824'
 84.                 display "`2' match_824 is " `"`match_824'"'
 85.         }
 86.         if (`1' == 1984 & minsamp==4 ) {
 87.                 local match_428 `mid85_428'
 88.         }
 89.         if ((`1'>1984 & `1'<1989)|( `1'==1989 & minsamp==8 )|(`1'>1989 & `1'<=1994
> )) {
 90.                 local match_428 `match_428' `mid85_428'
 91.                 local match_824 `match_824' `mid85_824'
 92.         }
 93. ** 88:4's have a single "3 = Other" race code. Their matches,
. ** 89:8's, split into three race codes, American Indian, API, and Other
.         if (`1' == 1989 & minsamp==8 ) {
 94.                 gen byte pid_race = race
 95.                 replace pid_race=3 if minsamp==8 & race>3 & race<. & mrace==3
 96.                 local match_824 `match_824' `mid79_824'
 97.         }
 98. ** Serial suffix (extra unit id) became available in 1994 for matching
.         if (`1' > 1994 ) {
 99. ** Make uppercase and remove leading and trailing blanks
.                 replace serial = upper(trim( serial ))
100. ** Make numeric version of serial
.                 egen pid_serial = group( serial )
101. ** A=2 rather than 1 b/c the missing value -1 becomes 1. So, make A=1,
> etc.
.                 assert pid_ser == 2 if serial == "A"
102.                 replace pid_ser = pid_ser - 1
103.                 local match_428 `match_428' `mid94_428'
104.                 local match_824 `match_824' `mid94_824`
105.         }
106.         egen str35 match_428 = concat(`match_428')
107.         egen str35 match_824= concat(`match_824')
108.         gen matchid = match_428
109.         replace matchid = match_824 if minsamp == 8
110.         drop pid_*
111.         display "time is $S_TIME"
112. 
. end

. 
. *aef2 1979 79 79_83
. *aef2 1980 80 79_83
. *aef2 1981 81 79_83
. *aef2 1982 82 79_83
. aef2 1983 83 79_83
Person Match
time is 15:03:25
Generate mym for minsamp 4
(174042 missing values generated)
Generate mym for minsamp 8
(174042 real changes made)
Generate id
Generate Sorting variables
Sort vars
sort mym intmonth state hhid hhnum lineno
Household/family-level match
Match minsamp 8 households to minsamp 4s
348521 observations prior to the household match
Merging 1983 8s to 4s using match19824.dta 

        dup |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    347,782       99.79       99.79
          2 |        738        0.21      100.00
          3 |          1        0.00      100.00
------------+-----------------------------------
      Total |    348,521      100.00
Drop unmatched obs from using data
(81502 observations deleted)

       dup4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    348,521       99.99       99.99
          2 |         40        0.01      100.00
------------+-----------------------------------
      Total |    348,561      100.00
(40 observations deleted)

           |   dup4
       dup |         1 |     Total
-----------+-----------+----------
         1 |   347,782 |   347,782
         2 |       738 |       738
         3 |         1 |         1
-----------+-----------+----------
     Total |   348,521 |   348,521
348521 observations after household match 
no observations
(81852 missing values generated)
(342 real changes made)
pid_intmonth generated as str2
pid_lineno generated as str2
pid_mage generated as str2
(81852 real changes made)
83 match_428 is year minsamp pid_intmonth state hhid hhnum year minsamp pid_int
> month state hhid hhnum pid_lineno sex race age pid_mage
83 match_824 is mym pid_intmonth state hhid hhnum mym pid_intmonth state hhid h
> hnum pid_lineno sex race pid_mage age
(174042 real changes made)
time is 15:07:57

. *aef2 1984 84 84_88
. *aef2 1985 85 84_88
. *aef2 1986 86 84_88
. *aef2 1987 87 84_88
. *aef2 1988 88 84_88
. *aef2 1989 89 89_93
. *aef2 1990 90 89_93
. *aef2 1991 91 89_93
. *aef2 1992 92 89_93
. *aef2 1993 93 89_93
. *aef2 1994 94 94_97
. *aef2 1995 95 94_97
. *aef2 1996 96 94_97
. *aef2 1997 97 94_97
. *aef2 1998 98 98
. *aef2 1999 99 98
. *aef2 2000 00 98
. *aef2 2001 01 98
. *aef2 2002 02 98
. 
. 
end of do-file

. exit,clear