Chapter 1 Introduction

The Linked Birth/Infant Death Data Set contains three separate data files. The first file includes linked records of live births and infant deaths for the 1988 birth cohort -- also referred to as the numerator file. The second file contains the live birth file for 1988, with a few minor modifications (described below) -- referred to as the denominator-plus file. The files are offered as a numerator/denominator data set to give users the means to compute infant mortality rates. The third file contains information from the death certificate for all infant death records which could not be linked to their corresponding birth certificates -- referred to as the unlinked death file.

The 1988 linked file comprises deaths to infants born in 1988 who died in 1988 or 1989 before their first birthday. Infant death records were extracted from the 1988 and 1989 National Center for Health Statistics (NCHS) mortality statistical files. Linked birth records were extracted from a denominator file that contained the 1988 NCHS natality statistical file and a small number of late-filed birth certificates. Because of these additions, the denominator file is not identical to the NCHS natality statistical file. Refer to the Methodology section for a more detailed explanation of records added to the statistical file.

The linked file of live births and infant deaths includes linked records for births and deaths that occurred in the United States to U.S. residents and to U.S. nonresidents. Excluded are deaths that occurred outside the United States to infants born in the U.S.; deaths that occurred in the United States to foreign-born infants; and births and deaths that occurred outside the United States to U.S. residents.
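The numerator/denominator design described above supports a rate calculation along the following lines. This is a minimal sketch, assuming the conventional definition of the infant mortality rate as infant deaths per 1,000 live births; the function name and the counts are hypothetical and are not taken from the data set.

```python
def infant_mortality_rate(infant_deaths: int, live_births: int) -> float:
    """Return infant deaths per 1,000 live births.

    infant_deaths would come from the numerator (linked) file and
    live_births from the denominator file, tabulated for the same group.
    """
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000.0 * infant_deaths / live_births


# Hypothetical counts, for illustration only.
rate = infant_mortality_rate(infant_deaths=50, live_births=5000)
print(f"{rate:.1f} deaths per 1,000 live births")
```

Both counts should be tabulated on the same basis (for example, both by place of residence) so that the numerator and denominator describe the same population.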
Sources for denominator data and for birth records included in the numerator file are described in detail in the 1988 Technical Appendix from the Natality Annual Volume; sources for death records included in the numerator file are described in detail in the 1988 and 1989 Technical Appendices, from the Mortality Annual Volumes. Excerpts of these Technical Appendices are included in this tape documentation.

Because of confidentiality concerns, only those counties of 250,000 or more population and only those cities of 250,000 or more population are identified in this data set. The population counts are based on the results of the 1980 census. Users should refer to the geographic code outline in this document for the list of available areas and codes.

In tabulations of linked data and denominator data, events occurring in the United States to U.S. nonresidents are included in tabulations that are by place of occurrence, and excluded from tabulations by place of residence. For linked data, these exclusions are based on the usual place of residence item of the mother. This item is contained in both the denominator file and the birth section of the numerator (linked) file. U.S. nonresidents are identified by a code 4 in location 10 of these files.

intro.doc - Page 1

Enhancements to the CD-ROM version

A number of changes have been made to the CD-ROM version of the 1988 linked birth/infant death data set, compared to the version previously offered on public-use data tape. These include adding selected variables from the numerator file to the denominator file, adding identification numbers for each infant death, and providing a separate file of unlinked infant death records.

Selected variables from the numerator file have been added to the denominator file to facilitate processing. These variables are age at death (and recodes), underlying cause of death (and the 61-cause recode), autopsy, and place of accident.
These variables are the most widely used variables from the numerator file. With the previous file format, it was sometimes necessary to combine the numerator and denominator files when performing certain multivariate statistical techniques. In fact, NCHS received several calls each year asking how best to combine the numerator and denominator files while eliminating duplicate records. Now, when the number of variables required from the numerator file is limited, the denominator file may be used by itself for ease of programming. It is hoped that this small alteration in file structure will make the linked birth/infant death data set more convenient to use. Because the purpose of this change is to facilitate processing once the data have been exported, these variables are not indexed and are variables for export only. In the denominator-plus file, names for these variables contain an added e at the end of the variable name indicating that these variables are available for export only, and cannot be used for tabulation. To create tables using these variables, please use the numerator file. Infant death identification numbers have been added to both the numerator and the denominator files, so that the same infant can be uniquely identified and matched between the two files. These numbers bear no relationship to birth or death certificate numbers, but are sequential numbers created solely for the purpose of identifying records for the same infant between the numerator and the denominator files. This innovation will enhance processing of the file, as additional data from the numerator file can now be directly matched and imported into the denominator file. Finally, a separate file of infant death records which could not be linked to their corresponding birth records has been added to provide additional information on unlinked records. 
For the 1988 birth cohort, a total of 1,074 infant death records, or 2.8 percent of infant deaths in the U.S., could not be linked to their corresponding birth records. Although the overall percent unlinked is 2.8%, the percent unlinked does vary by certain variables in the file such as race, age at death, and State. See the section on Percentage of Records Linked for further information. This unlinked file has been added to provide additional information on unmatched records so that data users who wish to make adjustments to the data (such as weighting) can do so. Documentation Table 6 provides further information on the characteristics of unlinked records.

The unlinked record file uses the same tape layout as the numerator file of linked birth and infant death records. However, except as noted below, tape locations 1-88, reserved for information from the matching birth certificate, are blank since no matching birth certificate could be found for these records. Both race and sex of child (tape locations 36 and 38, respectively) contain information as reported on the death certificate, rather than the information as reported on the birth certificate as is the case with the linked record file. Also, date of birth as reported on the death certificate is used to generate age at death. This information is used in place of date of birth from the birth certificate, which is not available.

Methodology

The methodology used to create the national file of linked birth and infant death records takes advantage of two existing data sources:

1. State linked files for the identification of linked birth and infant death certificates; and
2. NCHS natality and mortality computerized statistical files, the source of computer records for the two linked certificates.

Virtually all States routinely link infant death certificates to their corresponding birth certificates for legal and statistical purposes.
When the birth and death of an infant occur in different States, linking the two records that are filed in different jurisdictions requires State cooperation for the exchange of records. In accordance with the terms of the "Association for Vital Records and Health Statistics Agreement for Administering the Vital Records Exchange System," copies of the records are exchanged by the State of death and State of birth in order to effect a link. In addition, if a third State is identified as the State of residence at the time of birth or death, that State is also sent a copy of the appropriate certificate by the State where the birth or death occurred. The NCHS natality and mortality files, produced annually, include statistical data from birth and death certificates that are provided to NCHS by States under the Vital Statistics Cooperative Program (VSCP). The data have been coded according to uniform coding specifications, have passed rigid quality control standards, have been edited and reviewed, and are the basis for official U.S. birth and death statistics. To initiate processing, NCHS obtained computerized linked files from States that had them and extracted only the birth and death certificate numbers for linked records and State and year of occurrence. The States of Alaska, Arizona, Delaware, Indiana, and Nevada provided linkage information by posting birth certificate numbers on a computer-generated list of infant death certificate numbers that was provided by NCHS. A file that contained only State-provided identifiers for linked certificates was then matched to the NCHS mortality and natality statistical files. Individual birth and death records were selected from their respective files and linked into a single statistical record, thereby establishing a national linked record file. After the initial linkage, NCHS returned to the States of death copies or computer lists of unlinked infant death certificates for followup linking. 
If the birth occurred in a State different from the State of death, the State of birth identified on the death certificate was contacted to obtain the linking birth certificate. If the linking birth certificate from another State had been renumbered, the State of death requested the original certificate number from the State of birth. If the linked birth certificate had been filed after NCHS closed its statistical files, States provided NCHS a copy of the late-filed birth certificate. These certificates were coded, keyed, processed, added to the denominator file and then linked to the infant death record. Approximately 300 late-filed records were added to the denominator.

The birth record in the denominator file includes an item in tape location 1 that identifies whether or not the record is linked to an infant death. This item is included in the denominator record for users who want to identify, for individual records, whether the infant died in the first year of life or survived.

Percentage of Records Linked

The 1988 birth cohort linked file includes 37,599 linked records representing 97.2 percent of the infant deaths to the 1988 birth cohort. After followup, records for some 1,074 infant deaths, or 2.8 percent of the deaths to the birth cohort, remained unlinked and are not included in the linked file data set.

Documentation table 6 presents summary information about the unlinked death records not included in the linked file because they were not linked with their corresponding birth certificates. It is included for users who may want information about the total birth cohort of infant deaths. The table shows counts of unlinked records by race and age at death for each State of residence.
The user is cautioned in using table 6 that the race and residence items are based on information reported at the time of death, whereas tables 2-5 present data from the linked file in which the race and residence items are based on information reported at the time of birth. For more information, see discussions about race and residence on pages 3-4 of the Natality Technical Appendix and about infant deaths on pages 11-12 of the Mortality Technical Appendix in this documentation.

While the overall percent linked for infant deaths in the 1988 birth cohort is 97.2%, there are differences in percent linked by certain variables. These differences have important implications for how the data are analyzed. Table 1 shows the percent of infant deaths linked by State of residence. While most States link a high percentage of infant deaths, linkage rates for some States are well below the national average. Note in particular the percent linked for the District of Columbia (88.3%) and for Louisiana (91.2%). Where many deaths remain unlinked, infant mortality rates computed from the linked file for these States are underestimated. Thus, caution must be used in comparing infant mortality rates by State from the linked file.

The percent of infant deaths linked by race and age at death is shown in Table 2. The percent linked for black infants is 96.3%, which is 1.2 percentage points lower than the percent linked for white infants (97.5%). In general, a higher percentage of postneonatal deaths (97.9%) than neonatal deaths (96.8%) are linked, and the percentage for early neonatal deaths (96.6%) is lower still. Again, the lower the percentage linked the more likely that infant mortality rates computed for these groups will be slightly underestimated.
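The effect of incomplete linkage on a count or rate can be sketched as follows. The national counts (37,599 linked and 1,074 unlinked records) are taken from this documentation. The adjustment shown, dividing a linked count by the proportion linked, is only one simple form of weighting and is an assumption for illustration, not a method prescribed by NCHS. The function names and the State-level count are hypothetical.

```python
def percent_linked(linked: int, unlinked: int) -> float:
    """Percent of infant deaths that were linked to a birth record."""
    total = linked + unlinked
    if total <= 0:
        raise ValueError("no death records")
    return 100.0 * linked / total


def adjusted_deaths(linked_deaths: float, pct_linked: float) -> float:
    """Inflate a linked death count by the inverse of the proportion linked.

    Assumption: unlinked deaths resemble linked deaths within the group
    being adjusted. This is a simplification made for illustration.
    """
    if not 0.0 < pct_linked <= 100.0:
        raise ValueError("pct_linked must be in (0, 100]")
    return linked_deaths * 100.0 / pct_linked


# National figures quoted in this documentation.
print(round(percent_linked(37_599, 1_074), 1))  # 97.2

# Hypothetical group with 912 linked deaths and 91.2 percent linked.
print(round(adjusted_deaths(912, 91.2)))
```

Because the percent linked differs by State, race, and age at death, an adjustment of this kind would normally be applied within those groups rather than with the single national figure.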
Also, since most early neonatal deaths are likely to be very low birthweight infants, and since black infants are more likely to be born at very low birthweights, the patterns in percentage linked provide indirect evidence of lower linkage rates for very low birthweight infants. This hypothesis is supported by relatively low infant mortality rates for infants with birthweights under 500 grams for a few States (data not shown). So, although the data are generally of good quality, the percentage linked should be kept in mind, particularly when investigating infant mortality rates for particular States, race groups, age, or birthweight categories.

Table 1. Percent of infant deaths linked by State of residence
(For linked infant deaths, State of residence is at the time of birth. For unlinked infant deaths, State of residence is at the time of death.)

United States            97.2%    Montana            97.1%
Alabama                  99.7%    Nebraska           98.6%
Alaska                   99.2%    Nevada            100.0%
Arizona                  99.0%    New Hampshire      99.3%
Arkansas                 97.7%    New Jersey         95.7%
California               96.3%    New Mexico         98.8%
Colorado                 99.4%    New York           96.5%
Connecticut              97.1%      Upstate          97.5%
Delaware                 95.7%      City             95.7%
District of Columbia     88.3%    North Carolina     98.9%
Florida                  99.6%    North Dakota      100.0%
Georgia                 100.0%    Ohio               93.9%
Hawaii                   98.7%    Oklahoma           95.3%
Idaho                    96.8%    Oregon             97.9%
Illinois                 98.7%    Pennsylvania       94.7%
Indiana                  96.0%    Rhode Island       99.2%
Iowa                    100.0%    South Carolina     99.9%
Kansas                   99.3%    South Dakota      100.0%
Kentucky                 97.5%    Tennessee          99.5%
Louisiana                91.2%    Texas              95.6%
Maine                   100.0%    Utah               99.6%
Maryland                 91.5%    Vermont            98.0%
Massachusetts            94.8%    Virginia           97.6%
Michigan                 99.6%    Washington         99.2%
Minnesota               100.0%    West Virginia      98.5%
Mississippi              99.8%    Wisconsin          98.7%
Missouri                 98.3%    Wyoming            96.9%

Imputed Race of Mother Added in 1988

For the 1988 birth cohort, a field for imputed race of mother was added to facilitate tabulations of data by race of mother (position 37).
The imputation was performed as follows: if race of mother was not stated, it was assigned the race of the father, if known; otherwise it was assigned the specific race group of the mother from the preceding record. In 1988, race of both parents was missing from only 0.2 percent of birth records.

Table 2. Percent of infant deaths linked by race and age at death
(Infant deaths are under 1 year. Neonatal deaths are under 28 days; early neonatal, 0-6 days; late neonatal, 7-27 days; and postneonatal, 28 days through 11 months.)

                    All races    White    Black
Infant                97.2%      97.5%    96.3%
Total Neonatal        96.8%      97.3%    95.7%
Early Neonatal        96.6%      97.1%    95.4%
Late Neonatal         98.0%      98.0%    97.5%
Postneonatal          97.9%      98.0%    98.5%

Demographic and Medical Classification

The documents listed below describe in detail the procedures employed for demographic classification on both the birth and death records and medical classification on death records. While not absolutely essential to the proper interpretation of the data for a number of general applications, these documents should nevertheless be studied carefully prior to any detailed analysis of demographic or medical (especially multiple cause) data variables. In particular, there are a number of exceptions to the ICD rules in multiple cause-of-death coding which, if not treated properly, may result in faulty analysis of the data.

A. Manual of the International Statistical Classification of Diseases, Injuries, and Causes of Death, Ninth Revision (ICD-9) Volumes 1 and 2.
B. NCHS Instruction Manual Data Preparation Part 2a, Vital Statistics Instructions for Classifying the Underlying Cause of Death, 1988.
C. NCHS Instruction Manual Data Preparation, Part 2b, Vital Statistics Instructions for Classifying Multiple Cause of Death, 1988.
D. NCHS Instruction Manual Data Preparation, Part 2c, Vital Statistics ICD-9 ACME Decision Tables for Classifying Underlying Causes of Death, 1988.
E. NCHS Instruction Manual Data Preparation, Part 2d, Vital Statistics NCHS Procedures for Mortality Medical Data System File Preparation and Maintenance, Effective 1985.
F. NCHS Instruction Manual Data Tabulation, Part 2f, Vital Statistics ICD-9 TRANSAX Disease Reference Tables for Classifying Multiple Causes of Death, 1982-85.
G. NCHS Instruction Manual Data Preparation, Part 3a, Vital Statistics Classification and Coding Instructions for Live Birth Records, 1988.
H. NCHS Instruction Manual Data Preparation, Part 4, Vital Statistics Demographic Classification and Coding Instructions for Death Records, 1988.
I. NCHS Instruction Manual Tabulation, Part 11, Vital Statistics Computer Edits for Mortality Data, Effective 1979, Revised 1988.

Volumes 1 and 2 of the ICD-9 may be purchased from World Health Organization Publication Center USA, 49 Sheridan Avenue, Albany, New York, 12210. The remaining documents may be requested from the Chief, Data Preparation Branch, Division of Data Processing, National Center for Health Statistics, P.O. Box 12214, Research Triangle Park, North Carolina 27709.

In addition, the user should refer to the Technical Appendices of the Vital Statistics of the United States for information on the source of data, coding procedures, quality of the data, etc. Excerpts from the 1988 Natality Technical Appendix and the 1988 and 1989 Mortality Technical Appendices are included in this documentation package.

Cause-of-Death Data

Mortality data are traditionally analyzed and published in terms of the underlying cause of death. Underlying cause-of-death data are coded and classified as described in the 1988 and 1989 Mortality Technical Appendices. NCHS has augmented underlying cause-of-death data with data on multiple causes reported on the death certificate. The linked file includes both underlying and multiple cause-of-death data.

The multiple cause of death codes were developed with two objectives in mind.
First, to facilitate etiological studies of the relationships among conditions, it was necessary to reflect accurately in coded form each condition and its location on the certification in the exact manner given by the certifier. Second, coding needed to be carried out in a manner by which the underlying cause of death could be assigned through computer applications. The approach was to suspend the linkage provisions of the ICD for the purpose of condition coding and code each entity with minimum regard to other conditions present on the certification. This general approach is hereafter called entity coding.

Unfortunately, the set of multiple cause codes produced by entity coding is not conducive to a third objective -- the generation of person-based multiple cause statistics. Person-based analysis requires that each condition be coded within the context of every other condition on the same certificate and modified or linked to such conditions as provided by ICD-9. By definition, the entity data cannot meet this requirement since the linkage provisions distort the character and placement of the information originally recorded by the certifying physician.

Since the two objectives are incompatible, NCHS has chosen to create from the original set of entity codes a new code set called record axis multiple cause data. Essentially, the axis of classification has been converted from an entity basis to a record (or person) basis. The record axis codes are assigned in terms of the set of codes that best describe the overall medical certification portion of the death certificate.

This translation is accomplished by a computer system called TRANSAX (TRANSLATION OF AXIS) through selective use of traditional linkage and modification rules for mortality coding. Underlying cause linkages which simply prefer one code over another for purposes of underlying cause selection are not included.
Each entity code on the record is examined and modified or deleted as necessary to create a set of codes which are free of contradictions and are the most precise within the constraints of ICD-9 and medical information on the record. Repetitive codes are deleted. The process may (1) combine two entity axis categories together to a new category thereby eliminating a contradiction or standardizing the data; or (2) eliminate one category in favor of another to promote specificity of the data or resolve contradictions. The following examples from ICD-9 illustrate the effect of this translation:

Case 1: When reported on the same record as separate entities, cirrhosis of liver and alcoholism are coded to 5715 (cirrhosis of liver without mention of alcohol) and 303 (alcohol dependence syndrome). Tabulation of records with 5715 would on the surface falsely imply that such records had no mention of alcohol. A preferable codification would be 5712 (alcoholic cirrhosis of liver) in lieu of both 5715 and 303.

Case 2: If "gastric ulcer" and "bleeding gastric ulcer" are reported on a record they are coded to 5319 (gastric ulcer, unspecified as acute or chronic, without mention of hemorrhage or perforation) and 5314 (gastric ulcer, chronic or unspecified, with hemorrhage). A more concise codification would be to code 5314 only since the 5314 shows both the gastric ulcer and the bleeding.

A. Entity Axis Codes

The original conditions coded for selection of the underlying cause-of-death are reformatted and edited prior to creating the public-use tape. The following paragraphs describe the format and application of entity axis data.

FORMAT: Each entity-axis code is displayed as an overall seven byte code with subcomponents as follows:

1. Line indicator: The first byte represents the line of the certificate on which the code appears. Six lines (1-6) are allowable with the fourth and fifth denoting one or two written in "due to"s beyond the three lines provided in Part I of the U.S.
standard death certificate. Line "6" represents Part II of the certificate.
2. Position indicator: The next byte indicates the position of the code on the line, i.e., it is the first (1), second (2), third (3),... eighth (8) code on the line.
3. Cause category: The next four bytes represent the ICD-9 cause code.
4. Nature of injury flag: ICD-9 uses the same series of numbers (800-999) to indicate nature of injury (N codes) and external cause codes (E codes). This flag distinguishes between the two with a one (1) representing nature of injury codes and a zero (0) representing all other cause codes.

A maximum of 20 of these seven byte codes are captured on a record for multiple cause purposes. This may consist of a maximum of 8 codes on any given line with up to 20 codes distributed across three or more lines depending on where the subject conditions are located on the certificate. Codes may be omitted from one or more lines, e.g., line 1 with one or more codes, line 2 with no codes, line 3 with one or more codes. In writing out these codes, they are ordered as follows: line 1 first code, line 1 second code, etc. ----- line 2 first code, line 2 second code, etc. ----- line 3 ----- line 4 ----- line 5 ----- line 6. Any space remaining in the field is left blank. The specifics of locations are contained in the record layout given later in this document.

EDIT: The original conditions are edited to remove invalid codes, reverify the coding of certain rare causes of death, and assure age/cause and sex/cause compatibility. Detailed information relating to the edit criteria and the sets of cause codes which are valid to underlying cause coding and multiple cause coding are provided in Part 11 of the NCHS Vital Statistics Instruction Manual Series.
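The seven byte layout described under FORMAT can be unpacked roughly as follows. This is a minimal sketch, not NCHS software. It assumes the caller has already sliced the entity axis field out of the record (the tape locations are given in the record layout, not here) and that three-digit cause codes are left-justified and blank-padded within the four cause bytes. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

ENTITY_CODE_WIDTH = 7   # line (1) + position (1) + cause (4) + flag (1)
MAX_CODES = 20          # maximum number of codes captured on a record


@dataclass(frozen=True)
class EntityCode:
    line: int               # 1-5 = Part I lines and extra "due to"s, 6 = Part II
    position: int           # 1-8, order of the code on its line
    cause: str              # ICD-9 cause code, e.g. "5715"
    nature_of_injury: bool  # True when the flag byte is "1"


def parse_entity_axis(field: str) -> List[EntityCode]:
    """Split an entity axis field into its seven byte codes."""
    codes = []
    limit = min(len(field), ENTITY_CODE_WIDTH * MAX_CODES)
    for start in range(0, limit, ENTITY_CODE_WIDTH):
        chunk = field[start:start + ENTITY_CODE_WIDTH]
        if len(chunk) < ENTITY_CODE_WIDTH or not chunk.strip():
            break  # unused space at the end of the field is left blank
        codes.append(EntityCode(
            line=int(chunk[0]),
            position=int(chunk[1]),
            cause=chunk[2:6].strip(),  # assumed blank-padded when 3 digits
            nature_of_injury=(chunk[6] == "1"),
        ))
    return codes


# Hypothetical field: 5715 as the first code on line 1, 303 as the first on line 2.
field = ("1157150" + "21303 0").ljust(ENTITY_CODE_WIDTH * MAX_CODES)
for code in parse_entity_axis(field):
    print(code)
```

Keeping the line and position with each code preserves the placement information that distinguishes entity axis data from record axis data.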
ENTITY AXIS APPLICATIONS: The entity axis multiple cause data are appropriate to analyses which require that each condition be coded as a stand-alone entity without linkage to other conditions and/or require information on the placement of such conditions in the certificate. Within this framework, the entity data are appropriate to the examination of etiological relationships among conditions, accuracy of certification reporting, and the validity of traditional assumptions in underlying cause selection. Additionally, the entity data provide in certain categories a more detailed code assignment which is linked out in the creation of record axis data. Where such detail is needed for a study, the user should selectively employ entity data. Finally, the researcher may not wish to be bound by the assumptions used in the axis translation process, preferring rather to investigate hypotheses of his or her own.

By definition, the main limitation of entity axis data is that an entity code does not necessarily reflect the best code for a condition when considered within the context of the medical certification as a whole. As a result certain entity codes can be misleading or even contradict other codes in the record. For example, category 5750 is titled "Acute cholecystitis without mention of calculus". Within the framework of entity codes this is interpreted to mean that the codable entity itself contained no mention of calculus rather than that calculus was not mentioned anywhere on the record. Tabulation of records with a "5750" as a count of persons having acute cholecystitis without mention of calculus would therefore be erroneous. This illustrates the fact that under entity coding the ICD-9 titles cannot be taken literally. The user must study the rules for entity coding as they relate to his/her research prior to utilization of entity data.
The user is further cautioned that the inclusion notes in ICD-9 which relate to modifying and combining categories are seldom applicable to entity coding (except where provided in Part 2b of the Vital Statistics Instruction Manual Series).

In tabulating the entity axis data, one may count codes with the resultant tabulation of an individual code representing the number of times the disease(s) represented by the code appears in the file. In this kind of tabulation of morbid condition prevalence, the counts among categories may be added together to produce counts for groups of codes. Alternatively, subject to the limitations given above, one may count persons having mention of the disease represented by a code or codes. In this instance it is not correct to add counts for individual codes to create person counts for groups of codes. Since more than one code in the researcher's interest may appear together on the certificate, totaling must account for higher order interactions among codes. Up to 20 codes may be assigned on a record; therefore, a 20-way interaction is theoretically possible. All totaling must be based on mention of one or more of the categories under investigation.

B. Record Axis Codes

The following paragraphs describe the format and application of record-axis data. Part 2f of the Vital Statistics Instruction Manual Series describes the TRANSAX process for creating record axis data from entity axis data.

FORMAT: Each record (or person) axis code is displayed in five bytes. Location information is not relevant. The code consists of the following components:

1. Cause category: The first four bytes represent the ICD-9 cause code.
2. Nature of injury flag: The last byte contains a 0 or 1 with the 1 indicating that the cause is a nature of injury category.

Again, a maximum of 20 codes are captured on a record for multiple cause purposes.
The codes are written in a 100-byte field (20 codes of 5 bytes each) in ascending code number order with any unused bytes left blank.

EDIT: The record axis codes are edited for rare causes and age/cause and sex/cause compatibility. Likewise, individual code validity is checked. The valid code set for record axis coding is the same as that for entity coding.

RECORD AXIS APPLICATIONS: The record axis multiple cause data set is the basis for NCHS core multiple cause tabulations. Location of codes is not relevant to this data set and conditions have been linked into the most meaningful categories for the certification. The most immediate consequence for the user is that the codes on the record already represent mention of a disease assignable to that particular ICD-9 category. This is in contrast to the entity code which is assigned each time such a disease is reported on two different lines of the certification. Secondly, the linkage implies that within the constraints of ICD-9 the most meaningful code has been assigned. The translation process creates for the user a data set which is edited for contradictions, duplicate codes, and imprecisions. In contrast to entity axis data, record axis data are classified in a manner comparable to underlying cause of death classification thereby facilitating joint analysis of these variables. Likewise, they are comparable to general morbidity coding where the linkage provisions of ICD-9 are usually utilized.

A potential disadvantage of record axis data is that some detail is sacrificed in a number of the linkages. The user can take the record axis codes as literally representing the information conveyed in ICD-9 category titles. While knowledge of the rules for combining and linking and coding conditions is useful, it is not a prerequisite to meaningful analysis of the data as long as one is willing to accept the assumptions of the axis translation process.
The user is cautioned, however, that due to special rules in mortality coding, not all linkage notes in ICD-9 are utilized. (See Part 2f of the Vital Statistics Instruction Manual Series.) The user should proceed with caution in using record axis data to count conditions as opposed to people with conditions since linkages have been invoked and duplicate codes have been eliminated. As with entity data, person based tabulations which combine individual cause categories must take into account the possible interaction of up to 20 codes on a single certificate.

In using the NCHS multiple cause data, the user is urged to review the information in this document and its references. The instructional material does change from year to year and revision to revision. The user is cautioned that coding of specific ICD-9 categories should be checked in the appropriate instruction manual. What may appear on the surface to be the correct code by ICD-9 may in fact not be correct as given in the instruction manuals.

If on the surface it is not obvious whether entity axis or record axis data should be employed in a given application, detailed examination of Part 2f of the Vital Statistics Instruction Manual Series and its attachments will probably provide the necessary information to make a decision. It allows the user to determine the extent of the trade-offs between the two sets of data in terms of specific categories and the assumptions of axis translation. In certain situations, a combination of entity and record axis data may be the more appropriate alternative.
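The record axis format and the caution about person based tabulations can be illustrated with a short sketch. It assumes the record axis field has already been sliced out of the record and that three-digit cause codes are blank-padded within the four cause bytes. The function names and the three example records are hypothetical, and the matching below ignores the nature of injury flag, which would matter for codes in the 800-999 range.

```python
from typing import List, Set, Tuple

RECORD_CODE_WIDTH = 5   # cause (4) + nature of injury flag (1)
MAX_CODES = 20          # maximum number of codes captured on a record

RecordCode = Tuple[str, bool]  # (ICD-9 cause code, nature of injury flag)


def parse_record_axis(field: str) -> List[RecordCode]:
    """Split a record axis field into its five byte codes."""
    codes = []
    limit = min(len(field), RECORD_CODE_WIDTH * MAX_CODES)
    for start in range(0, limit, RECORD_CODE_WIDTH):
        chunk = field[start:start + RECORD_CODE_WIDTH]
        if len(chunk) < RECORD_CODE_WIDTH or not chunk.strip():
            break  # unused bytes are left blank
        codes.append((chunk[:4].strip(), chunk[4] == "1"))
    return codes


def count_mentions(records: List[List[RecordCode]], categories: Set[str]) -> int:
    """Count codes: every occurrence of a category of interest."""
    return sum(1 for codes in records for cause, _ in codes if cause in categories)


def count_persons(records: List[List[RecordCode]], categories: Set[str]) -> int:
    """Count persons: each record once if it mentions one or more categories."""
    return sum(1 for codes in records
               if any(cause in categories for cause, _ in codes))


# Three hypothetical records; the second mentions both categories of interest.
fields = ["53140", "5314057120", "57500"]
records = [parse_record_axis(f.ljust(RECORD_CODE_WIDTH * MAX_CODES)) for f in fields]
categories = {"5314", "5712"}

print(count_mentions(records, categories))  # 3
print(count_persons(records, categories))   # 2
```

Adding the per-code counts gives 3, while only 2 persons have a mention of either category, which is why person counts for groups of codes must be based on mention of one or more of the categories rather than on summed code counts.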