. capture set mem 1000m
.
. /*------------------------------------------------
>   This program reads the 1967 NCHS Multiple Cause of Death Data File
>   by Jean Roth Wed Oct 29 16:12:12 EDT 2008
>   Please report errors to jroth@nber.org
>   NOTE:  This program is distributed under the GNU GPL.
>   See end of this file and http://www.gnu.org/licenses/ for details.
>   Run with  do mort1967
> ----------------------------------------------- */
.
. /* The following line should contain
>    the complete path and name of the raw data file.
>    On a PC, use backslashes in paths as in C:\  */
.
. local dat_name "/tmp/mort1967.dat"
.
. /* The following line should contain the path to your output '.dta' file */
.
. local dta_name "mort1967"
.
. /* The following line should contain the path to the data dictionary file */
.
. local dct_name "mort1967.dct"
.
. /* The following line should contain the complete path and name of the compressed data file. */
.
. local compressed "/homes/data/mortality/1959-1967/mort1967.zip"
.
. ** Removing dat_name
. capture rm `dat_name'
.
. ** Uncompressing the raw data file to /tmp
. ** Note that /tmp must have enough space to write the uncompressed file
. ! unzip -p `compressed' > "`dat_name'"
.
. /* The line below does NOT need to be changed */
.
. infile using "`dct_name'", using("`dat_name'") clear

infile dictionary {
*This program reads the 1967 NCHS Multiple Cause of Death Data File
*by Jean Roth Wed Oct 29 16:12:12 EDT 2008
*Please report errors to jroth@nber.org
*See the .do file for directions and run with do mort1967
*NOTE:  This program is distributed under the GNU GPL.
*See end of this file and http://www.gnu.org/licenses/ for details.
** NOTE: The 1967 data file does not match the PDF documentation;
**       This file was created by comparing means and frequencies
**       to 1967 VSUS tables and/or the 1966 file;
**       So, you may want to check for a surprising result ;
**       Report any errors to Jean Roth , jroth@nber.org ;
**
** Columns 3,4,5,6,7,8,61,62,63,69,70 are all blank/do not have data;
**
** The variables that do not seem to be in the 1967 file are
**   region and division of residence and occurrence, expanded state
**   of residence, city of residence, ucr60, ucr33 ;
**
** The region and division variables can be coded from the state variables
**   as can most of expanded state of residence; * ucr60 can be coded
**   from ccr60; * ucr33 can be coded from ucod per the PDF documentation;
** There does not seem to be a way to reconstruct city of residence;
** In 1966, about 25% of records had a value for city of residence;
**
****************************************************;
**
** The only columns in the 1967 file not identified are 16,28,29;
** Column 16 appears to be a geographic variable; * The frequency
**   of the value '9' is identical to the frequency of 9 in popsize,
**   where 9 means balance of county, rural;
**
**                                   Cumulative    Cumulative
**b16       Frequency     Percent     Frequency      Percent
**--------------------------------------------------------
**  0          972765       52.54        972765        52.54
**  1           71540        3.86       1044305        56.41
**  9          807018       43.59       1851323       100.00
**;
**                                   Cumulative    Cumulative
**popsize   Frequency     Percent     Frequency      Percent
**-------------------------------------------------------
**  9          807018       43.59       1851323       100.00
**;
** Columns b28 and b29 appear to be separate one-character variables;
** Perhaps b28 is an indicator variable: 1=living in a (coded) city ;
**   Compare to 1966 city of residence variable, Columns 34-36;
**   Or 1968 city of residence variable, Columns 18-20;
** Or, perhaps it is some sort of a delayed shipment indicator like Column 99 in 1961 data ;
**;
**  b28     Frequency     Percent     Frequency      Percent
** --------------------------------------------------------;
**  0         1368754       73.94       1368754        73.94
**  1          482048       26.04       1850802        99.97
**  9             473        0.03       1851275       100.00 ;
**;
** Column 29 could be some kind of compressed month variable;
** Perhaps 'Principal Month of Occurrence' ? ;
** Compare to Column 98 in 1961 data;
**                                   Cumulative    Cumulative
**  b29     Frequency     Percent     Frequency      Percent
** --------------------------------------------------------;
**  /               1        0.00             1         0.00
**  0          139092        7.51        139093         7.51
**  1          300001       16.20        439094        23.72 ;
**  2          301354       16.28        740448        40.00
**  3          196154       10.60        936602        50.59
**  4          155586        8.40       1092188        59.00 ;
**  5          151828        8.20       1244016        67.20
**  6          148892        8.04       1392908        75.24
**  7          153857        8.31       1546765        83.55 ;
**  8          144846        7.82       1691611        91.37
**  9          159688        8.63       1851299       100.00
**  S               6        0.00       1851305       100.00 ;
**  T               3        0.00       1851308       100.00
**  Y               5        0.00       1851313       100.00
**  Z               4        0.00       1851317       100.00 ;
_column(1 )   byte datayear      %1f
> _column(2 )   str1 reparea       %1s
> _column(9 )   byte rectype       %1f
> _column(10 )  byte restatus      %1f
> _column(11 )  byte stateoc       %2f
> _column(11 )  str5 countyoc      %5s
> _column(16 )  byte pop_unknown   %1f
> _column(17 )  byte exstatoc      %2f
> _column(19 )  byte popsize       %1f
> _column(23 )  byte metro         %1f
> _column(20 )  str3 smsares       %3s
> _column(24 )  byte monthdth      %2f
> _column(26 )  byte daydth        %2f
> _column(28 )  str1 b28           %1s
> _column(29 )  str1 b29           %1s
> _column(30 )  byte racer2        %1f
> _column(31 )  byte racer3        %1f
> _column(32 )  byte race          %1f
> _column(33 )  byte sex           %1f
> _column(34 )  int  age           %3f
> _column(37 )  byte ager12        %2f
> _column(39 )  byte ager27        %2f
> _column(41 )  byte ager22        %2f
> _column(43 )  str4 ucod          %4s
> _column(47 )  str1 ucod4         %1s
> _column(48 )  int  ucr258        %3f
> _column(51 )  str1 ucr258s       %1s
> _column(52 )  byte ucr60         %2f
> _column(54 )  byte ccr55         %2f
> _column(56 )  byte whoflag       %1f
> _column(57 )  byte cdcflag       %2f
> _column(59 )  byte accident      %1f
> _column(60 )  byte sex60         %1f
> _column(64 )  byte staters       %2f
> _column(64 )  str5 countyrs      %5s
> }

(1851323 observations read)
.
.
.
. ** Removing the temp file when finished
. ! rm -f "`dat_name'"
.
. note: by Jean Roth, jroth@nber.org Wed Oct 29 16:12:12 EDT 2008
. ** NOTE: The 1967 data file does not match the PDF documentation;
. **       This file was created by comparing means and frequencies
. **       to 1967 VSUS tables and/or the 1966 file;
. **       So, you may want to check for a surprising result ;
. **       Report any errors to Jean Roth , jroth@nber.org ;
. **
. ** Columns 3,4,5,6,7,8,61,62,63,69,70 are all blank/do not have data;
. **
. ** The variables that do not seem to be in the 1967 file are
. **   region and division of residence and occurrence, expanded state
. **   of residence, city of residence, ucr60, ucr33 ;
. **
. ** The region and division variables can be coded from the state variables
. **   as can most of expanded state of residence; * ucr60 can be coded
. **   from ccr60; * ucr33 can be coded from ucod per the PDF documentation;
. ** There does not seem to be a way to reconstruct city of residence;
. ** In 1966, about 25% of records had a value for city of residence;
. **
. ****************************************************;
. **
. ** The only columns in the 1967 file not identified are 16,28,29;
. ** Column 16 appears to be a geographic variable; * The frequency
. **   of the value '9' is identical to the frequency of 9 in popsize,
. **   where 9 means balance of county, rural;
. **
. **                                   Cumulative    Cumulative
. **b16       Frequency     Percent     Frequency      Percent
. **--------------------------------------------------------
. **  0          972765       52.54        972765        52.54
. **  1           71540        3.86       1044305        56.41
. **  9          807018       43.59       1851323       100.00
. **;
. **                                   Cumulative    Cumulative
. **popsize   Frequency     Percent     Frequency      Percent
. **-------------------------------------------------------
. **  9          807018       43.59       1851323       100.00
. **;
. ** Columns b28 and b29 appear to be separate one-character variables;
. ** Perhaps b28 is an indicator variable: 1=living in a (coded) city ;
. **   Compare to 1966 city of residence variable, Columns 34-36;
. **   Or 1968 city of residence variable, Columns 18-20;
. ** Or, perhaps it is some sort of a delayed shipment indicator like Column 99 in 1961 data ;
. **;
. **  b28     Frequency     Percent     Frequency      Percent
. ** --------------------------------------------------------;
. **  0         1368754       73.94       1368754        73.94
. **  1          482048       26.04       1850802        99.97
. **  9             473        0.03       1851275       100.00 ;
. **;
. ** Column 29 could be some kind of compressed month variable;
. ** Perhaps 'Principal Month of Occurrence' ? ;
. ** Compare to Column 98 in 1961 data;
. **                                   Cumulative    Cumulative
. **  b29     Frequency     Percent     Frequency      Percent
. ** --------------------------------------------------------;
. **  /               1        0.00             1         0.00
. **  0          139092        7.51        139093         7.51
. **  1          300001       16.20        439094        23.72 ;
. **  2          301354       16.28        740448        40.00
. **  3          196154       10.60        936602        50.59
. **  4          155586        8.40       1092188        59.00 ;
. **  5          151828        8.20       1244016        67.20
. **  6          148892        8.04       1392908        75.24
. **  7          153857        8.31       1546765        83.55 ;
. **  8          144846        7.82       1691611        91.37
. **  9          159688        8.63       1851299       100.00
. **  S               6        0.00       1851305       100.00 ;
. **  T               3        0.00       1851308       100.00
. **  Y               5        0.00       1851313       100.00
. **  Z               4        0.00       1851317       100.00 ;
.
. compress
. save mort1967,replace
(note: file mort1967.dta not found)
file mort1967.dta saved
. ** Run commands from directory with data
. capture erase mort1967.dta.zip
. ! zip mort1967.dta.zip mort1967.dta
. capture erase mort1967.dta
. exit

end of do-file
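
* Follow-up sketch, not part of the original run and not the author's method.
* The notes above say the frequency of '9' in column 16 is identical to the
* frequency of 9 in popsize. The commands below are one possible way to
* re-check that claim. Assumptions: mort1967.dta.zip from the run above is in
* the working directory, unzip is available on the path, and column 16 is the
* variable the dictionary reads as pop_unknown. No output is shown because
* these commands were not run here.
! unzip -o mort1967.dta.zip
use mort1967, clear

* Equal marginal counts do not show that the same records carry both codes,
* so cross-tabulate the two variables and count the overlap directly.
tabulate pop_unknown popsize, missing
count if pop_unknown == 9
count if popsize == 9
count if pop_unknown == 9 & popsize == 9

* In the dictionary, stateoc (columns 11-12) overlaps countyoc (columns 11-15),
* so the first two characters of countyoc should reproduce stateoc.
* Assumption: countyoc holds all five characters exactly as read, including
* any leading zeros or blanks. A nonzero count would point to a layout problem.
count if real(substr(countyoc, 1, 2)) != stateoc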