Chapter 1
Introduction

The Linked Birth/Infant Death Data Set contains three separate data files. The first file, referred to as the numerator file, includes linked records of live births and infant deaths for the 1986 birth cohort. The second file, referred to as the denominator-plus file, contains the live birth file for 1986 with a few minor modifications (described below). The files are offered as a numerator/denominator data set to give users the means to compute infant mortality rates. The third file, referred to as the unlinked death file, contains information from the death certificate for all infant death records that could not be linked to their corresponding birth certificates.

The 1986 linked file comprises deaths to infants born in 1986 who died in 1986 or 1987 before their first birthday. Infant death records were extracted from the 1986 and 1987 National Center for Health Statistics (NCHS) mortality statistical files. Linked birth records were extracted from a denominator file that contained the 1986 NCHS natality statistical file and a small number of late-filed birth certificates. Refer to the Methodology section for a more detailed explanation of records added to the statistical file. The denominator file is not identical to the NCHS natality statistical file.

The linked file of live births and infant deaths includes linked records for births and deaths that occurred in the United States to U.S. residents and to U.S. nonresidents. Excluded are deaths that occurred outside the United States to infants born in the U.S.; deaths that occurred in the United States to foreign-born infants; and births and deaths that occurred outside the United States to U.S. residents.
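As an illustration of the numerator/denominator design, the following minimal sketch computes an infant mortality rate from a count of infant deaths and a count of live births. The function name and the counts are hypothetical, and the rate is expressed per 1,000 live births, which is the conventional definition and is assumed here rather than stated in this documentation.

```python
def infant_mortality_rate(infant_deaths, live_births, per=1000):
    """Return infant deaths per `per` live births (assumed convention: per 1,000)."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return infant_deaths / live_births * per


# Hypothetical counts, not taken from the data set.
rate = infant_mortality_rate(infant_deaths=101, live_births=10_000)
print(f"{rate:.1f}")  # prints 10.1
```

In practice the death count would come from the numerator file and the birth count from the denominator-plus file, tabulated for the same population.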
Sources for denominator data and for birth records included in the numerator file are described in detail in the 1986 Technical Appendix from the Natality Annual Volume; sources for death records included in the numerator file are described in detail in the 1986 and 1987 Technical Appendices from the Mortality Annual Volumes. Excerpts of these Technical Appendices are included in this CD-ROM documentation.

Because of confidentiality concerns, only those counties of 250,000 or more population and only those cities of 250,000 or more population are identified in this data set. The population counts are based on the results of the 1980 census. Users should refer to the geographic code outline in this document for the list of available areas and codes.

In tabulations of linked data and denominator data, events occurring in the United States to U.S. nonresidents are included in tabulations by place of occurrence and excluded from tabulations by place of residence. For linked data, these exclusions are based on the usual place of residence item of the mother. This item is contained in both the denominator file and the birth section of the numerator (linked) file. U.S. nonresidents are identified by a code 4 in location 10 of these files.

intro.doc - Page 1

Enhancements to the CD-ROM version

A number of changes have been made to the CD-ROM version of the 1986 linked birth/infant death data set, compared to the version previously offered on public-use data tape. These include adding selected variables from the numerator file to the denominator file, adding identification numbers for each infant death, and providing a separate file of unlinked infant death records.

Selected variables from the numerator file have been added to the denominator file to facilitate processing. These variables are age at death (and recodes), underlying cause of death (and the 61-cause recode), autopsy, and place of accident.
These variables are the most widely used variables from the numerator file. With the previous file format, it was sometimes necessary to combine the numerator and denominator files when performing certain multivariate statistical techniques. In fact, NCHS received several calls each year asking how best to combine the numerator and denominator files while eliminating duplicate records. Now, when the number of variables required from the numerator file is limited, the denominator file may be used by itself for ease of programming. It is hoped that this small alteration in file structure will make the linked birth/infant death data set more convenient to use.

Because the purpose of this change is to facilitate processing once the data have been exported, these variables are not indexed and are available for export only. In the denominator-plus file, the names of these variables end with an added "e", indicating that they are available for export only and cannot be used for tabulation. To create tables using these variables, please use the numerator file.

Infant death identification numbers have been added to both the numerator and the denominator files, so that the same infant can be uniquely identified and matched between the two files. These numbers bear no relationship to birth or death certificate numbers; they are sequential numbers created solely for the purpose of identifying records for the same infant in the numerator and the denominator files. This innovation will enhance processing of the file, as additional data from the numerator file can now be directly matched and imported into the denominator file.

Finally, a separate file of infant death records which could not be linked to their corresponding birth records has been added to provide additional information on unlinked records.
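The matching made possible by the infant death identification numbers can be sketched as a simple keyed lookup. This is a minimal illustration only: the function name, the field names (`infant_death_id`, `birth_weight`, `multiple_cause_count`), and the assumption that records for surviving infants carry no identification number are all hypothetical and are not taken from the record layout.

```python
def import_numerator_fields(denominator_rows, numerator_rows, fields,
                            id_key="infant_death_id"):
    """Copy `fields` from numerator records onto matching denominator records."""
    # Index the numerator (linked) records by identification number.
    numerator_by_id = {row[id_key]: row for row in numerator_rows}
    merged = []
    for row in denominator_rows:
        combined = dict(row)  # leave the caller's record unchanged
        match = numerator_by_id.get(row.get(id_key))
        if match is not None:
            for name in fields:
                combined[name] = match[name]
        merged.append(combined)
    return merged


# Hypothetical records; field names and values are illustrative only.
denominator = [
    {"infant_death_id": None, "birth_weight": 3400},  # no infant death
    {"infant_death_id": 17, "birth_weight": 900},     # linked to a death
]
numerator = [{"infant_death_id": 17, "multiple_cause_count": 3}]

result = import_numerator_fields(denominator, numerator, ["multiple_cause_count"])
print(result[1]["multiple_cause_count"])  # prints 3
```

Because each identification number identifies one infant, a dictionary keyed on that number is sufficient; denominator records without a match are passed through unchanged.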
For the 1986 birth cohort, a total of 779 infant death records, or 2.0 percent of infant deaths in the U.S., could not be linked to their corresponding birth records. However, unlinked records are not distributed evenly among the States. Altogether, 3/4 of the 779 unlinked records excluded from the 1986 birth cohort are from just 8 States (Table 1). Some of these represent large States with relatively good match rates (California, New York). The percent unlinked is substantially higher in other States. This variation in match rates by State leads to differential underestimation of State-specific infant mortality rates in the linked data set, when these rates are compared to those derived from the annual mortality data file (Table 2). Thus, a separate file containing the unlinked infant death records has been added to provide additional information on unmatched records so that data users who wish to make adjustments to the data (such as weighting) can do so. Documentation Table 6 provides further information on the characteristics of unlinked records.

The unlinked record file uses the same tape layout as the numerator file of linked birth and infant death records. However, except as noted below, tape locations 1-88, reserved for information from the matching birth certificate, are blank since no matching birth certificate could be found for these records. Both race and sex of child (tape locations 36 and 38, respectively) contain information as reported on the death certificate, rather than the information as reported on the birth certificate as is the case with the linked record file. Also, date of birth as reported on the death certificate is used to generate age at death. This information is used in place of date of birth from the birth certificate, which is not available.

Table 1. Number and percent of unlinked infant death records, by State of residence at death, United States and selected States, 1986 birth cohort

                             Unlinked infant deaths
Area of infant's death       Number     Percent of total events in each area
United States                   779        2.0
Texas                           130        4.5
California                      118        2.8
Maryland                         75        9.2
Louisiana                        61        6.6
New York                         57        2.0
New Jersey                       52        4.9
Virginia                         46        4.8
Ohio                             45        2.7

Table 2. Infant mortality rates from the linked data set and from the annual files, and the ratio of these rates, by residence, United States and selected States, 1986

                             Infant mortality rate
Area                         Linked data set    Annual mortality file    Ratio (Col. 2/Col. 3)
United States                     10.1                 10.4                   .97
Texas                              8.9                  9.5                   .94
California                         8.7                  8.9                   .98
Maryland                          10.6                 11.7                   .91
Louisiana                         11.1                 11.9                   .93
New York                          10.4                 10.7                   .97
New Jersey                         9.4                  9.8                   .96
Virginia                          10.5                 11.1                   .95
Ohio                              10.3                 10.6                   .97

Methodology

The methodology used to create the national file of linked birth and infant death records takes advantage of two existing data sources:

1. State linked files for the identification of linked birth and infant death certificates; and
2. NCHS natality and mortality computerized statistical files, the source of computer records for the two linked certificates.

Virtually all States routinely link infant death certificates to their corresponding birth certificates for legal and statistical purposes. When the birth and death of an infant occur in different States, linking the two records that are filed in different jurisdictions requires State cooperation for the exchange of records. In accordance with the terms of the "Association for Vital Records and Health Statistics Agreement for Administering the Vital Records Exchange System," copies of the records are exchanged by the State of death and State of birth in order to effect a link.
In addition, if a third State is identified as the State of residence at the time of birth or death, that State is also sent a copy of the appropriate certificate by the State where the birth or death occurred.

The NCHS natality and mortality files, produced annually, include statistical data from birth and death certificates that are provided to NCHS by States under the Vital Statistics Cooperative Program (VSCP). The data have been coded according to uniform coding specifications, have passed rigid quality control standards, have been edited and reviewed, and are the basis for official U.S. birth and death statistics.

To initiate processing, NCHS obtained computerized linked files from States that had them and extracted only the birth and death certificate numbers for linked records and State and year of occurrence. The States of Alaska, Arizona, Delaware, Indiana, and Nevada provided linkage information by posting birth certificate numbers on a computer-generated list of infant death certificate numbers that was provided by NCHS. A file that contained only State-provided identifiers for linked certificates was then matched to the NCHS mortality and natality statistical files. Individual birth and death records were selected from their respective files and linked into a single statistical record, thereby establishing a national linked record file.

After the initial linkage, NCHS returned to the States of death copies or computer lists of unlinked infant death certificates for followup linking. If the birth occurred in a State different from the State of death, the State of birth identified on the death certificate was contacted to obtain the linking birth certificate. If the linking birth certificate from another State had been renumbered, the State of death requested the original certificate number from the State of birth.
If the linked birth certificate had been filed after NCHS closed its statistical files, States provided NCHS a copy of the late-filed birth certificate. These certificates were coded, keyed, processed, added to the denominator file and then linked to the infant death record. Approximately 300 late-filed records were added to the denominator.

The birth record in the denominator file includes an item in tape location 1 that identifies whether or not the record is linked to an infant death. This item is included in the denominator record for users who would want to identify individual records for which the infant died in the first year of life, or survived.

The 1986 birth cohort linked file includes 37,966 linked records representing 98.0 percent of the infant deaths to the 1986 birth cohort. After followup, records for some 779 infant deaths, or 2.0 percent of the deaths to the birth cohort, remained unlinked and are not included in the linked file data set.

Documentation table 6 presents summary information about the unlinked death records not included in the linked file because they were not linked with their corresponding birth certificates. It is included for users who may want information about the total birth cohort of infant deaths. The table shows counts of unlinked records by race and age at death for each State of residence. The user is cautioned in using table 6 that the race and residence items are based on information reported at the time of death, whereas tables 2-5 present data from the linked file in which the race and residence items are based on information reported at the time of birth.
For more information, see discussions about race and residence in the Natality Technical Appendix (under the major heading Classification of Data and the subheadings Classification by occurrence and residence, Geographical classification, and Race or national origin) and about infant deaths in the Mortality Technical Appendix in this documentation (under the major heading Classification of Data and the subheading Infant deaths).

Demographic and Medical Classification

The documents listed below describe in detail the procedures employed for demographic classification on both the birth and death records and medical classification on death records. While not absolutely essential to the proper interpretation of the data for a number of general applications, these documents should nevertheless be studied carefully prior to any detailed analysis of demographic or medical (especially multiple cause) data variables. In particular, there are a number of exceptions to the ICD rules in multiple cause-of-death coding which, if not treated properly, may result in faulty analysis of the data.

A. Manual of the International Statistical Classification of Diseases, Injuries, and Causes of Death, Ninth Revision (ICD-9) Volumes 1 and 2.
B. NCHS Instruction Manual Data Preparation, Part 2a, Vital Statistics Instructions for Classifying the Underlying Cause of Death, 1986.
C. NCHS Instruction Manual Data Preparation, Part 2b, Vital Statistics Instructions for Classifying Multiple Cause of Death, 1986.
D. NCHS Instruction Manual Data Preparation, Part 2c, Vital Statistics ICD-9 ACME Decision Tables for Classifying Underlying Causes of Death, 1986.
E. NCHS Instruction Manual Data Preparation, Part 2d, Vital Statistics NCHS Procedures for Mortality Medical Data System File Preparation and Maintenance, Effective 1979.
F. NCHS Instruction Manual Data Tabulation, Part 2f, Vital Statistics ICD-9 TRANSAX Disease Reference Tables for Classifying Multiple Causes of Death, 1982-86.
G. NCHS Instruction Manual Data Preparation, Part 3a, Vital Statistics Classification and Coding Instructions for Live Birth Records, 1986.
H. NCHS Instruction Manual Data Preparation, Part 4, Vital Statistics Demographic Classification and Coding Instructions for Death Records, 1986.
I. NCHS Instruction Manual Tabulation, Part 11, Vital Statistics Computer Edits for Mortality Data, Effective 1979.

Volumes 1 and 2 of the ICD-9 may be purchased from World Health Organization Publication Center USA, 49 Sheridan Avenue, Albany, New York, 12210. The remaining documents may be requested from the Chief, Data Preparation Branch, Division of Data Processing, National Center for Health Statistics, P.O. Box 12214, Research Triangle Park, North Carolina 27709.

In addition, the user should refer to the Technical Appendices of the Vital Statistics of the United States for information on the source of data, coding procedures, quality of the data, etc. Excerpts from the 1986 Natality Technical Appendix and from the 1986 and 1987 Mortality Technical Appendices are included in this documentation package.

Cause-of-Death Data

Mortality data are traditionally analyzed and published in terms of the underlying cause of death. Underlying cause-of-death data are coded and classified as described in the 1986 and 1987 Mortality Technical Appendices. NCHS has augmented underlying cause-of-death data with data on multiple causes reported on the death certificate. The linked file includes both underlying and multiple cause-of-death data.

The multiple cause of death codes were developed with two objectives in mind. First, to facilitate etiological studies of the relationships among conditions, it was necessary to reflect accurately in coded form each condition and its location on the certification in the exact manner given by the certifier.
Secondly, coding needed to be carried out in a manner by which the underlying cause of death could be assigned through computer applications. The approach was to suspend the linkage provisions of the ICD for the purpose of condition coding and code each entity with minimum regard to other conditions present on the certification. This general approach is hereafter called entity coding.

Unfortunately, the set of multiple cause codes produced by entity coding is not conducive to a third objective: the generation of person-based multiple cause statistics. Person-based analysis requires that each condition be coded within the context of every other condition on the same certificate and modified or linked to such conditions as provided by ICD-9. By definition, the entity data cannot meet this requirement, since applying the linkage provisions would distort the character and placement of the information originally recorded by the certifying physician.

Since these objectives are incompatible, NCHS has chosen to create from the original set of entity codes a new code set called record axis multiple cause data. Essentially, the axis of classification has been converted from an entity basis to a record (or person) basis. The record axis codes are assigned in terms of the set of codes that best describe the overall medical certification portion of the death certificate.

This translation is accomplished by a computer system called TRANSAX (TRANSLATION OF AXIS) through selective use of traditional linkage and modification rules for mortality coding. Underlying cause linkages which simply prefer one code over another for purposes of underlying cause selection are not included. Each entity code on the record is examined and modified or deleted as necessary to create a set of codes which are free of contradictions and are the most precise within the constraints of ICD-9 and medical information on the record. Repetitive codes are deleted.
The process may (1) combine two entity axis categories into a new category, thereby eliminating a contradiction or standardizing the data; or (2) eliminate one category in favor of another to promote specificity of the data or resolve contradictions. The following examples from ICD-9 illustrate the effect of this translation:

Case 1: When reported on the same record as separate entities, cirrhosis of liver and alcoholism are coded to 5715 (cirrhosis of liver without mention of alcohol) and 303 (alcohol dependence syndrome). Tabulation of records with 5715 would on the surface falsely imply that such records had no mention of alcohol. A preferable codification would be 5712 (alcoholic cirrhosis of liver) in lieu of both 5715 and 303.

Case 2: If "gastric ulcer" and "bleeding gastric ulcer" are reported on a record they are coded to 5319 (gastric ulcer, unspecified as acute or chronic, without mention of hemorrhage or perforation) and 5314 (gastric ulcer, chronic or unspecified, with hemorrhage). A more concise codification would be to code 5314 only, since 5314 shows both the gastric ulcer and the bleeding.

A. Entity Axis Codes

The original conditions coded for selection of the underlying cause-of-death are reformatted and edited prior to creating the public-use tape. The following paragraphs describe the format and application of entity axis data.

FORMAT: Each entity-axis code is displayed as an overall seven byte code with subcomponents as follows:

1. Line indicator: The first byte represents the line of the certificate on which the code appears. Six lines (1-6) are allowable with the fourth and fifth denoting one or two written in "due to"s beyond the three lines provided in Part I of the U.S. standard death certificate. Line "6" represents Part II of the certificate.
2. Position indicator: The next byte indicates the position of the code on the line, i.e., it is the first (1), second (2), third (3),... eighth (8) code on the line.
3. Cause category: The next four bytes represent the ICD-9 cause code.
4. Nature of injury flag: ICD-9 uses the same series of numbers (800-999) to indicate nature of injury (N codes) and external cause codes (E codes). This flag distinguishes between the two with a one (1) representing nature of injury codes and a zero (0) representing all other cause codes.

A maximum of 20 of these seven byte codes are captured on a record for multiple cause purposes. This may consist of a maximum of 8 codes on any given line with up to 20 codes distributed across three or more lines depending on where the subject conditions are located on the certificate. Codes may be omitted from one or more lines, e.g., line 1 with one or more codes, line 2 with no codes, line 3 with one or more codes.

In writing out these codes, they are ordered as follows: line 1 first code, line 1 second code, etc.; line 2 first code, line 2 second code, etc.; line 3; line 4; line 5; line 6. Any space remaining in the field is left blank. The specifics of locations are contained in the record layout given later in this document.

EDIT: The original conditions are edited to remove invalid codes, reverify the coding of certain rare causes of death, and assure age/cause and sex/cause compatibility. Detailed information relating to the edit criteria and the sets of cause codes which are valid to underlying cause coding and multiple cause coding are provided in Part 11 of the NCHS Vital Statistics Instruction Manual Series.

ENTITY AXIS APPLICATIONS: The entity axis multiple cause data are appropriate to analyses which require that each condition be coded as a stand alone entity without linkage to other conditions and/or require information on the placement of such conditions in the certificate.
Within this framework, the entity data are appropriate to the examination of etiological relationships among conditions, accuracy of certification reporting, and the validity of traditional assumptions in underlying cause selection. Additionally, the entity data provide in certain categories a more detailed code assignment which is linked out in the creation of record axis data. Where such detail is needed for a study, the user should selectively employ entity data. Finally, the researcher may not wish to be bound by the assumptions used in the axis translation process, preferring instead to investigate hypotheses of his or her own choosing.

By definition, the main limitation of entity axis data is that an entity code does not necessarily reflect the best code for a condition when considered within the context of the medical certification as a whole. As a result, certain entity codes can be misleading or even contradict other codes in the record. For example, category 5750 is titled "Acute cholecystitis without mention of calculus". Within the framework of entity codes this is interpreted to mean that the codable entity itself contained no mention of calculus rather than that calculus was not mentioned anywhere on the record. Tabulation of records with a "5750" as a count of persons having acute cholecystitis without mention of calculus would therefore be erroneous. This illustrates the fact that under entity coding the ICD-9 titles cannot be taken literally. The user must study the rules for entity coding as they relate to his or her research prior to utilization of entity data. The user is further cautioned that the inclusion notes in ICD-9 which relate to modifying and combining categories are seldom applicable to entity coding (except where provided in Part 2b of the Vital Statistics Instruction Manual Series).
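The seven-byte entity-axis layout described under FORMAT above can be unpacked with a short routine such as the following sketch. The names `EntityCode` and `parse_entity_axis` are hypothetical, the input is assumed to be the entity-axis field already cut out of the record (tape locations are given in the record layout and are not assumed here), and three-digit causes such as 303 are assumed to be left-justified within the four-byte cause category.

```python
from typing import List, NamedTuple


class EntityCode(NamedTuple):
    line: int               # certificate line, 1-6 (6 = Part II)
    position: int           # position of the code on that line, 1-8
    cause: str              # ICD-9 cause code, decimal point omitted
    nature_of_injury: bool  # True when the flag byte is "1"


def parse_entity_axis(field: str) -> List[EntityCode]:
    """Split an entity-axis field into its seven-byte codes."""
    codes = []
    for start in range(0, len(field), 7):
        chunk = field[start:start + 7]
        if not chunk.strip():
            continue  # unused space in the field is left blank
        if len(chunk) < 7:
            raise ValueError("incomplete seven-byte entity code")
        codes.append(EntityCode(
            line=int(chunk[0]),
            position=int(chunk[1]),
            cause=chunk[2:6].strip(),
            nature_of_injury=(chunk[6] == "1"),
        ))
    return codes


# Illustrative field built from the Case 1 codes; the placement on
# lines 1 and 2 is hypothetical.
field = ("1157150" + "21303 0").ljust(140)
for code in parse_entity_axis(field):
    print(code.line, code.position, code.cause, code.nature_of_injury)
```

Keeping the line and position indicators alongside each cause preserves the placement information that distinguishes entity-axis data from record-axis data.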
In tabulating the entity axis data, one may count codes, with the resultant tabulation of an individual code representing the number of times the disease(s) represented by the code appears in the file. In this kind of tabulation of morbid condition prevalence, the counts among categories may be added together to produce counts for groups of codes. Alternatively, subject to the limitations given above, one may count persons having mention of the disease represented by a code or codes. In this instance it is not correct to add counts for individual codes to create person counts for groups of codes. Since more than one code in the researcher's interest may appear together on the certificate, totaling must account for higher order interactions among codes. Up to 20 codes may be assigned on a record; therefore, a 20-way interaction is theoretically possible. All totaling must be based on mention of one or more of the categories under investigation.

B. Record Axis Codes

The following paragraphs describe the format and application of record-axis data. Part 2f of the Vital Statistics Instruction Manual Series describes the TRANSAX process for creating record axis data from entity axis data.

FORMAT: Each record (or person) axis code is displayed in five bytes. Location information is not relevant. The code consists of the following components:

1. Cause category: The first four bytes represent the ICD-9 cause code.
2. Nature of injury flag: The last byte contains a 0 or 1 with the 1 indicating that the cause is a nature of injury category.

Again, a maximum of 20 codes are captured on a record for multiple cause purposes. The codes are written in a 100-byte field in ascending code number (5 bytes) order with any unused bytes left blank.

EDIT: The record axis codes are edited for rare causes and age/cause and sex/cause compatibility. Likewise, individual code validity is checked. The valid code set for record axis coding is the same as that for entity coding.
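The five-byte record-axis layout described under FORMAT above can be read in the same spirit. The sketch below is illustrative only: the function name `parse_record_axis` is hypothetical, the input is assumed to be the record-axis field already extracted from the record, and the routine simply walks the field five bytes at a time rather than assuming a particular field width.

```python
def parse_record_axis(field):
    """Return (cause, nature_of_injury) pairs from a record-axis field."""
    codes = []
    for start in range(0, len(field), 5):
        chunk = field[start:start + 5]
        if not chunk.strip():
            continue  # unused bytes are left blank
        if len(chunk) < 5:
            raise ValueError("incomplete five-byte record code")
        cause = chunk[:4].strip()         # ICD-9 cause code
        nature_of_injury = chunk[4] == "1"
        codes.append((cause, nature_of_injury))
    return codes


# Illustrative field holding the two codes from Cases 1 and 2,
# written in ascending code order and padded with blanks.
field = ("53140" + "57120").ljust(100)
print(parse_record_axis(field))  # [('5314', False), ('5712', False)]
```

No line or position is returned because location information is not part of the record-axis code.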
RECORD AXIS APPLICATIONS: The record axis multiple cause data set is the basis for NCHS core multiple cause tabulations. Location of codes is not relevant to this data set and conditions have been linked into the most meaningful categories for the certification. The most immediate consequence for the user is that the codes on the record already represent mention of a disease assignable to that particular ICD-9 category. This is in contrast to the entity code which is assigned each time such a disease is reported on two different lines of the certification. Secondly, the linkage implies that within the constraints of ICD-9 the most meaningful code has been assigned. The translation process creates for the user a data set which is edited for contradictions, duplicate codes, and imprecisions.

In contrast to entity axis data, record axis data are classified in a manner comparable to underlying cause of death classification, thereby facilitating joint analysis of these variables. Likewise, they are comparable to general morbidity coding where the linkage provisions of ICD-9 are usually utilized. A potential disadvantage of record axis data is that some detail is sacrificed in a number of the linkages.

The user can take the record axis codes as literally representing the information conveyed in ICD-9 category titles. While knowledge of the rules for combining and linking and coding conditions is useful, it is not a prerequisite to meaningful analysis of the data as long as one is willing to accept the assumptions of the axis translation process. The user is cautioned, however, that due to special rules in mortality coding, not all linkage notes in ICD-9 are utilized. (See Part 2f of the Vital Statistics Instruction Manual Series.) The user should proceed with caution in using record axis data to count conditions as opposed to people with conditions since linkages have been invoked and duplicate codes have been eliminated.
As with entity data, person based tabulations which combine individual cause categories must take into account the possible interaction of up to 20 codes on a single certificate.

In using the NCHS multiple cause data, the user is urged to review the information in this document and its references. The instructional material does change from year to year and revision to revision. The user is cautioned that coding of specific ICD-9 categories should be checked in the appropriate instruction manual. What may appear on the surface to be the correct code by ICD-9 may in fact not be correct as given in the instruction manuals.

If on the surface it is not obvious whether entity axis or record axis data should be employed in a given application, detailed examination of Part 2f of the Vital Statistics Instruction Manual Series and its attachments will probably provide the necessary information to make a decision. It allows the user to determine the extent of the trade-offs between the two sets of data in terms of specific categories and the assumptions of axis translation. In certain situations, a combination of entity and record axis data may be the more appropriate alternative.
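The distinction drawn above between counting conditions and counting persons can be made concrete with a small sketch. The function names and the three records are hypothetical; each record is represented simply as the list of cause codes appearing on one certificate.

```python
def count_mentions(records, codes):
    """Count every occurrence of any code in `codes` (a condition count)."""
    return sum(code in codes for record in records for code in record)


def count_persons(records, codes):
    """Count records with mention of one or more codes in `codes`."""
    return sum(any(code in codes for code in record) for record in records)


# Hypothetical certificates; the codes are illustrative only.
records = [
    ["5715", "303"],  # both codes of interest on one certificate
    ["303"],
    ["4280"],
]
group = {"5715", "303"}

# Adding the person counts for each code separately counts the first
# certificate twice.
summed = sum(count_persons(records, {code}) for code in group)
persons = count_persons(records, group)

print(count_mentions(records, group))  # 3 mentions of the conditions
print(summed)                          # 3, overstating the person count
print(persons)                         # 2 persons with one or more mentions
```

Basing the person count on mention of one or more categories, as `count_persons` does, handles interactions among any number of codes on a certificate without enumerating them.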