-------------------------------------------------------------------------------
       log:  /disk/nber10/SCCS/morg/sources/small_match.log
  log type:  text
 opened on:   8 Oct 2004, 15:43:02

. set more off

. set memory 500m
(512000k)

.
. program define aef2
  1. use /homes/data/morg/raw/small/small`2'
  2. local latest_year=2003
  3. **Keep above this line
.
. * Person Match
. display "Person Match"
  4. display "time is $S_TIME"
  5. *mym is the year and month of the file where matching observations would be
. display "Generate mym for minsamp 4"
  6. generate int mym = `1' * 10 + 18 if minsamp==4
  7. display "Generate mym for minsamp 8"
  8. replace mym =`1' * 10 -6 if minsamp==8
  9. label var mym "Match year and month-in-sample"
 10. display "Generate id"
 11. gen id = _n
 12. display "Generate Sorting variables"
 13. display "Sort vars"
 14. local sort mym intmonth state hhid hhnum
 15. local sortnf `sort' lineno
 16. local sortf `sort' famnum lineno
 17. local sort94 `sort' famnum lineno serial
 18. if ( `1' <= 1984 ) {
 19. local sort `sortnf'
 20. }
 21. if ( `1' > 1984 & `1' <= 1994 ) {
 22. local sort `sortf'
 23. }
 24. if ( `1' > 1994 ) {
 25. local sort `sort94'
 26. }
 27. display "sort `sort'"
 28. sort `sort' id
 29.
. display "Household/family-level match"
 30. ** WARNING: This merge will generate extra observations
. ** if obs with duplicate merge variables aren't eliminated
.
. ** See http://www.stata.com/support/faqs/data/merge.html
. local prevyear=`1'-1
 31. ** Matching is not possible due to sample redesigns between
. ** Jul to Dec 1984 4s to their 1985 8s
. ** Jan to Sep 1985 4s to their 1986 8s
. ** Jun to Dec 1994 4s to their 1995 8s
. ** Jan to Aug 1995 4s to their 1996 8s
. display "Match minsamp 8 households to minsamp 4s"
 32. **( 79:8s don't have matches )
. if (`1' > 1979 ) {
 33. capture describe
 34. scalar obs_pre = r(N)
 35. *** match files are created using matchYYYY.do
. display "Merging `1' 8s to 4s using match`prevyear'4.dta "
 36. merge `sort' using /homes/data/morg/match/match`prevyear'4.dta
 37. display "Drop unmatched obs from using data"
 38. drop if _merge==2
 39. capture describe
 40. scalar obs_post = r(N)
 41. assert obs_pre == obs_post
 42. drop _merge
 43. }
 44. display "Match minsamp 4 households to minsamp 8s "
 45. ** No matches yet for most recent year's minsamp 4s
. if ( `1' < `latest_year' ) {
 46. local nextyear = `1' + 1
 47. display "sort `sort' id"
 48. sort `sort' id
 49. capture describe
 50. scalar obs_pre = r(N)
 51. display "Merging `1' 4s to 8s using match`nextyear'8.dta "
 52. merge `sort' using /homes/data/morg/match/match`nextyear'8.dta
 53. display "Drop unmatched obs from the using data"
 54. drop if _merge==2
 55. capture describe
 56. scalar obs_post = r(N)
 57. assert obs_pre == obs_post
 58. drop _merge
 59. }
 60. ** Create dummy variables to verify that sex, race, and age match.
. ** A value of 1 indicates a match. Zero means no match.
. gen byte sexdif = sex == msex
 61. gen byte racedif = race == mrace
 62. ** Fix race coding scheme for 88:4 and 89:8
. if ( `1'==1988 & minsamp == 4 ) {
 63. replace racedif=1 if race==3 & mrace>3 & mrace<.
 64. }
 65. else if ( `1'==1989 & minsamp == 8 ) {
 66. replace racedif=1 if race>3 & race<. & mrace==3
 67. }
 68. ** In 2003, greatly expanded race categories were used.
. ** Over 98% of 2003:8 chose one race category.
. tab mrace if `1'==2003 & minsamp==8 & race > 3 & race < .
 69.
. gen byte age_mage=age-mage
 70. gen byte agedif=(age_mage>=-1 & age_mage==3 )
 71. gen byte match=0
 72. replace match=1 if sexdif+racedif+agedif==3
 73.
. ** Make long personid string
. tostring intmonth, gen( pid_intmonth ) format( %02.0f )
 74. format hhid %015s
 75. tostring lineno, gen( pid_lineno ) format( %02.0f )
 76. ** Race and sex are the same, but the matching age should be included
. tostring mage, gen( pid_mage ) format( %02.0f )
 77. replace pid_mage = "99" if mage==.
 78.
. ** Match variables
. ** Create match variables for each time period
. ** A '_428' variable is used to match minsamp 4 to minsamp 8
. ** An '_824' variable is used to match minsamp 8 to minsamp 4
. local match_428 year minsamp pid_intmonth state hhid hhnum
 79. local match_824 mym pid_intmonth state hhid hhnum
 80. local mid79_428 `match_428' pid_lineno sex race age pid_mage
 81. local mid79_824 `match_824' pid_lineno sex race pid_mage age
 82. local mid85_428 `match_428' famnum pid_lineno sex race age pid_mage
 83. local mid85_824 `match_824' famnum pid_lineno sex race pid_mage age
 84. local mid89_824 `match_824' famnum pid_lineno sex pid_race pid_mage age
 85. local mid94_428 `match_428' famnum pid_lineno pid_ser sex race age pid_mage
 86. local mid94_824 `match_824' famnum pid_lineno pid_ser sex race pid_mage age
 87.
. ** Pre-1984 matches ( includes 84:8 which matches to 83:4 )
. if (`1' < 1984 | ( `1' == 1984 & minsamp==8 ) ) {
 88. local match_428 `match_428' `mid79_428'
 89. display "`2' match_428 is " `"`match_428'"'
 90. local match_824 `match_824' `mid79_824'
 91. display "`2' match_824 is " `"`match_824'"'
 92. }
 93. if (`1' == 1984 & minsamp==4 ) {
 94. local match_428 `mid85_428'
 95. }
 96. if ((`1'>1984 & `1'<1989)|( `1'==1989 & minsamp==8 )|(`1'>1989 & `1'<=1994 )) {
 97. local match_428 `match_428' `mid85_428'
 98. local match_824 `match_824' `mid85_824'
 99. }
100. ** 88:4's have a single "3 = Other" race code. Their matches,
. ** 89:8's, split into three race codes, American Indian, API, and Other
. if (`1' == 1989 & minsamp==8 ) {
101. gen byte pid_race = race
102. replace pid_race=3 if minsamp==8 & race>3 & race<. & mrace==3
103. local match_824 `match_824' `mid79_824'
104. }
105. ** Serial suffix (extra unit id) became available in 1994 for matching
. if (`1' > 1994 ) {
106. ** Make uppercase and remove leading and trailing blanks
. replace serial = upper(trim( serial ))
107. ** Make numeric version of serial
. egen pid_serial = group( serial )
108. ** A=2 rather than 1 b/c the missing value -1 becomes 1. So, make A=1, etc.
. assert pid_ser == 2 if serial == "A"
109. replace pid_ser = pid_ser - 1
110. local match_428 `match_428' `mid94_428'
111. local match_824 `match_824' `mid94_824'
112. }
113. egen str35 match_428 = concat(`match_428')
114. egen str35 match_824= concat(`match_824')
115. gen matchid = match_428
116. replace matchid = match_824 if minsamp == 8
117. drop pid_*
118. display "time is $S_TIME"
119.
. end

.
. *aef2 1979 79 79_83
. *aef2 1980 80 79_83
. *aef2 1981 81 79_83
. *aef2 1982 82 79_83
. aef2 1983 83 79_83
Person Match
time is 15:43:03
Generate mym for minsamp 4
(521 missing values generated)
Generate mym for minsamp 8
(521 real changes made)
Generate id
Generate Sorting variables
Sort vars
sort mym intmonth state hhid hhnum lineno
Household/family-level match
Match minsamp 8 households to minsamp 4s
Merging 1983 8s to 4s using match19824.dta
Drop unmatched obs from using data
(174121 observations deleted)
Match minsamp 4 households to minsamp 8s
sort mym intmonth state hhid hhnum lineno id
Merging 1983 4s to 8s using match19848.dta
Drop unmatched obs from the using data
(172798 observations deleted)
no observations
(634 missing values generated)
(1 real change made)
pid_intmonth generated as str2
pid_lineno generated as str2
pid_mage generated as str2
(634 real changes made)
83 match_428 is year minsamp pid_intmonth state hhid hhnum year minsamp pid_intmonth state hhid hhnum pid_lineno sex race age pid_mage
83 match_824 is mym pid_intmonth state hhid hhnum mym pid_intmonth state hhid hhnum pid_lineno sex race pid_mage age
(521 real changes made)
time is 15:43:28

. *aef2 1984 84 84_88
. *aef2 1985 85 84_88
. *aef2 1986 86 84_88
. *aef2 1987 87 84_88
. *aef2 1988 88 84_88
. *aef2 1989 89 89_93
. *aef2 1990 90 89_93
. *aef2 1991 91 89_93
. *aef2 1992 92 89_93
. *aef2 1993 93 89_93
. *aef2 1994 94 94_97
. *aef2 1995 95 94_97
. *aef2 1996 96 94_97
. *aef2 1997 97 94_97
. *aef2 1998 98 98
. *aef2 1999 99 98
. *aef2 2000 00 98
. *aef2 2001 01 98
. *aef2 2002 02 98

.
end of do-file

. des

Contains data from /homes/data/morg/raw/small/small83.dta
  obs:         1,000
 vars:            70                          8 Oct 2004 13:29
 size:       257,000 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
minsamp         byte   %8.0g                  Month in sample (4 & 8 are departing)
intmonth        byte   %8.0g                  Interview month
hhid            str12  %015s                  Household ID (12 digits)
state           byte   %8.0g                  State
smsarank        byte   %8.0g                  SMSA ranking
hhnum           byte   %8.0g                  Household number
activlwr        byte   %8.0g                  (r) Major Activity last week
hourslw         byte   %8.0g                  How many hours last week?
reasonlw        byte   %8.0g                  Reason <= 35 hours last week
absentlw        byte   %8.0g                  Why absent from work last week?
classer         byte   %8.0g                  (e&r) Class of worker
ind70           int    %8.0g                  3-digit industry code (1970)
occ80           int    %8.0g                  3-digit occupation code (1980)
lineno          byte   %8.0g                  Persons line number in HH
relahh          byte   %8.0g                  Relationship to household head
age             byte   %8.0g                  Age
marital         byte   %8.0g                  Marital status
race            byte   %8.0g                  Race
sex             byte   %8.0g                  Sex
veteran         byte   %8.0g                  Veteran status
gradeat         byte   %8.0g                  Highest grade attended
gradecp         byte   %8.0g                  Whether completed highest grade
esr             byte   %8.0g                  Employment status recode
weight          float  %9.0g                  Final weight x100
ind80           int    %8.0g                  3-digit industry code (1980)
unioncov        byte   %8.0g                  Covered by a Union Contract (1983-)
smsastat        byte   %8.0g                  SMSA status code
centcity        byte   %8.0g                  Central city status
ethnic          byte   %8.0g                  Ethnicity
ptstat          byte   %8.0g                  Part-time status
ftpt79          byte   %8.0g                  Full-time or part-time status
doinglw         byte   %8.0g                  What was doing most of last week
hourslwa        byte   %8.0g                  How many hours last week all jobs
uhours35        byte   %8.0g                  Usually works >= 35 hrs at this job
why35lw         byte   %8.0g                  Why not at least 35 hours last week
class           byte   %8.0g                  Class of worker
uhours          byte   %8.0g                  (dp) Usual hours
paidhr          byte   %8.0g                  (dp) Paid by the hour
earnhr          int    %8.0g                  (dp) Earnings per hour
uearnwk         int    %8.0g                  (dp) Usual earnings per week
earnwt          float  %9.0g                  (dp) Earnings weight for all races x100
eligible        byte   %8.0g                  Eligibility flag
uhourse         byte   %8.0g                  (e&dp) Usual hours
paidhre         byte   %8.0g                  (e&dp) Paid by the hour
earnhre         int    %8.0g                  (e&dp) Earnings per hour
earnwke         int    %8.0g                  (e&dp) Earnings per week
unionmme        byte   %8.0g                  (e) Union member ('83 on)
I25a            byte   %8.0g                  (dp) Usual hours (I25a) allocation flag
I25b            byte   %8.0g                  (dp) Paid by hour (I25b) allocation flag
I25c            byte   %8.0g                  (dp) Earnings/hr (I25c) allocation flag
I25d            byte   %8.0g                  (dp) Usl Earn/hr (I25d) allocation flag
uearnwke        int    %8.0g                  (e&dp) Usual weekly earnings
year            int    %8.0g
smsa70          byte   %8.0g
docc80          byte   %8.0g
dind            byte   %9.0g
mym             int    %8.0g                  Match year and month-in-sample
id              float  %9.0g
mage            byte   %8.0g                  Age
mrace           byte   %8.0g                  Race
msex            byte   %8.0g                  Sex
famnum          byte   %8.0g                  Family number
sexdif          byte   %8.0g
racedif         byte   %8.0g
age_mage        byte   %8.0g
agedif          byte   %8.0g
match           byte   %8.0g
match_428       str52  %52s
match_824       str52  %52s
matchid         str52  %52s
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. sum match

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       match |      1000        .001    .0316228          0          1

. list match in 1

     +-------+
     | match |
     |-------|
  1. |     0 |
     +-------+

. list mat* in 1

     +--------------------------------------------------------------+
  1. | match |                                            match_428 |
     |     0 | 1983801930030012315071198380193003001231507101113433 |
     |--------------------------------------------------------------|
     |                                                    match_824 |
     |         1982401930030012315071198240193003001231507101113334 |
     |--------------------------------------------------------------|
     |                                                      matchid |
     |         1982401930030012315071198240193003001231507101113334 |
     +--------------------------------------------------------------+

. list mat* minsamp in 1

     +----------------------------------------------------------------+
  1. | match |                                              match_428 |
     |     0 |   1983801930030012315071198380193003001231507101113433 |
     |----------------------------------------------------------------|
     |                                                      match_824 |
     |           1982401930030012315071198240193003001231507101113334 |
     |----------------------------------------------------------------|
     |                                              matchid | minsamp |
     | 1982401930030012315071198240193003001231507101113334 |       8 |
     +----------------------------------------------------------------+

. sum *age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      1000      40.563    17.73326         16         94
        mage |       366    42.18033    17.40242         16         83
    age_mage |       366    .8825137    1.918867        -24         11

. exit,clear
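The `mym` key and the year-dependent sort list in `aef2` can be checked with a little arithmetic. The sketch below is a Python re-expression, not part of the original do-file; `match_year_month`, `sort_keys` and `BASE_KEYS` are hypothetical names. For the 1983 file, a month-in-sample 4 household reappears in month-in-sample 8 in 1984, giving 1984*10 + 8 = 19848 = 1983*10 + 18, and a month-in-sample 8 household was in month-in-sample 4 in 1982, giving 1982*10 + 4 = 19824 = 1983*10 - 6. Those are the match19848.dta and match19824.dta files named in the log.

```python
from typing import List, Optional

# Variables shared by every period's sort key (from `local sort ...` in aef2)
BASE_KEYS = ["mym", "intmonth", "state", "hhid", "hhnum"]


def match_year_month(year: int, minsamp: int) -> Optional[int]:
    """year*10 + month-in-sample of the file holding the matching record.

    Mirrors:
        generate int mym = `1' * 10 + 18 if minsamp==4
        replace mym = `1' * 10 - 6 if minsamp==8
    """
    if minsamp == 4:
        # Next year's month-in-sample 8: (year + 1) * 10 + 8
        return year * 10 + 18
    if minsamp == 8:
        # Previous year's month-in-sample 4: (year - 1) * 10 + 4
        return year * 10 - 6
    # Stata leaves mym missing for any other rotation group
    return None


def sort_keys(year: int) -> List[str]:
    """Sort/merge key that aef2 selects for a given survey year."""
    if year <= 1984:
        return BASE_KEYS + ["lineno"]
    if year <= 1994:
        return BASE_KEYS + ["famnum", "lineno"]
    return BASE_KEYS + ["famnum", "lineno", "serial"]


if __name__ == "__main__":
    print(match_year_month(1983, 4))  # 19848
    print(match_year_month(1983, 8))  # 19824
    print(" ".join(sort_keys(1983)))  # mym intmonth state hhid hhnum lineno
```

The last line reproduces the `sort mym intmonth state hhid hhnum lineno` message printed by the 1983 run.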
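Each `merge` in `aef2` is followed by `drop if _merge==2` and `assert obs_pre == obs_post`, which enforces what the WARNING comment asks for: a master observation may pick up at most one using observation, otherwise the merge adds rows. A minimal Python sketch of that guard, assuming both datasets are lists of dicts; `merge_keep_master` and `key_of` are hypothetical names, and the function is a one-to-one lookup with a duplicate-key check rather than a reimplementation of Stata's `merge`. It rejects any duplicate key in the using data, which is stricter than the row-count assertion in the log.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def key_of(rec: Record, keys: Sequence[str]) -> Tuple[Any, ...]:
    """Tuple of merge-variable values for one record."""
    return tuple(rec[k] for k in keys)


def merge_keep_master(master: List[Record], using: List[Record],
                      keys: Sequence[str]) -> List[Record]:
    """Attach matching using records to master records without adding rows."""
    index: Dict[Tuple[Any, ...], Record] = {}
    for rec in using:
        k = key_of(rec, keys)
        if k in index:
            # The case the WARNING comment describes: duplicate merge keys
            raise ValueError(f"duplicate merge key in using data: {k}")
        index[k] = rec

    merged: List[Record] = []
    for rec in master:
        out = dict(rec)
        match = index.get(key_of(rec, keys))
        if match is not None:
            for name, value in match.items():
                # Master values win for shared variables, as with Stata's merge
                out.setdefault(name, value)
        merged.append(out)

    # Unmatched using records are never added (drop if _merge==2),
    # so the row count is unchanged (assert obs_pre == obs_post)
    assert len(merged) == len(master)
    return merged
```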
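The person-level check sets `match=1` only when `sexdif`, `racedif` and `agedif` are all 1 (despite the names, 1 means agreement). As logged, `agedif=(age_mage>=-1 & age_mage==3 )` reduces to `age_mage==3`, so a person who aged exactly one year between interviews is not counted. That is consistent with the summary showing a mean of .001 for `match` (1 of 1,000 observations) even though 366 observations have a matched age. The sketch below contrasts the logged test with a tolerance window; the bounds -1 and 3 are an assumption about intent, not something the log states, and all function names are hypothetical. Python `None` stands in for Stata's missing value.

```python
from typing import Callable, Optional


def agedif_as_logged(age: int, mage: Optional[int]) -> int:
    """(age_mage >= -1 & age_mage == 3), which is true only when age_mage == 3."""
    if mage is None:  # Stata: missing == 3 is false
        return 0
    age_mage = age - mage
    return int(age_mage >= -1 and age_mage == 3)


def agedif_window(age: int, mage: Optional[int], lo: int = -1, hi: int = 3) -> int:
    """Assumed intent: lo <= age - mage <= hi (bounds are a guess, not from the log)."""
    if mage is None:  # Stata: missing <= hi is false
        return 0
    return int(lo <= age - mage <= hi)


def person_match(sex: int, race: int, age: int,
                 msex: Optional[int], mrace: Optional[int], mage: Optional[int],
                 agedif: Callable[[int, Optional[int]], int] = agedif_window) -> int:
    """match = 1 if sexdif + racedif + agedif == 3."""
    sexdif = int(sex == msex)      # 1 means sex agrees
    racedif = int(race == mrace)   # 1 means race agrees
    return int(sexdif + racedif + agedif(age, mage) == 3)


if __name__ == "__main__":
    # Hypothetical minsamp 8 person aged 34 whose matched record says 33
    print(person_match(1, 1, 34, 1, 1, 33))                    # 1
    print(person_match(1, 1, 34, 1, 1, 33, agedif_as_logged))  # 0
```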
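The person identifiers are plain string concatenations: two-digit zero-padded month, line number and matched age (`tostring ..., format(%02.0f)`), "99" when the matched age is missing, and `egen concat` with no separator. In the 1983 run both `match_428` and `match_824` carry the household fields twice, because `mid79_428` and `mid79_824` already begin with the `match_428`/`match_824` fields and are then appended to them; the displayed macros and the listed keys both show the doubled prefix. A minimal Python sketch, assuming records are dicts with the Stata variable names; `pid2`, `concat` and `person_keys` are hypothetical names. The field values in the test are inferred by splitting the key listed for observation 1 at the field widths (with `hhid` as the 12-character string shown by `describe`), so treat them as illustrative.

```python
from typing import Any, Dict, Optional, Tuple


def pid2(value: Optional[int]) -> str:
    """tostring x, format(%02.0f); missing becomes "99", as done for pid_mage."""
    return "99" if value is None else f"{value:02d}"


def concat(*parts: Any) -> str:
    """egen ... = concat(varlist): values as text, joined with no separator."""
    return "".join(str(p) for p in parts)


def person_keys(r: Dict[str, Any], doubled_prefix: bool = False) -> Tuple[str, str, str]:
    """Return (match_428, match_824, matchid) for one pre-1984 record."""
    hh_428 = concat(r["year"], r["minsamp"], pid2(r["intmonth"]),
                    r["state"], r["hhid"], r["hhnum"])
    hh_824 = concat(r["mym"], pid2(r["intmonth"]), r["state"], r["hhid"], r["hhnum"])
    if doubled_prefix:
        # Reproduces the repeated household fields seen in the 1983 log
        hh_428, hh_824 = hh_428 * 2, hh_824 * 2
    lineno = pid2(r["lineno"])
    key_428 = hh_428 + concat(lineno, r["sex"], r["race"], r["age"], pid2(r["mage"]))
    key_824 = hh_824 + concat(lineno, r["sex"], r["race"], pid2(r["mage"]), r["age"])
    # matchid = match_428, replaced by match_824 for minsamp 8
    matchid = key_824 if r["minsamp"] == 8 else key_428
    return key_428, key_824, matchid
```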