Chapter 7
QUALITY OF DATA
Although vital statistics data are useful for a variety of
administrative and scientific purposes, they cannot be correctly interpreted
unless various qualifying factors and methods of classification are taken
into account. The factors to be considered depend on the specific purposes
for which the data are to be used. It is not feasible to discuss all the
pertinent factors in the use of vital statistics tabulations, but some of
the more important ones should be mentioned.
Most of the factors limiting the use of data arise from imperfections
in the original records or from the impracticability of tabulating these
data in very detailed categories. These limitations should not be ignored,
but their existence does not vitiate the value of the data for most general
purposes.
Completeness of registration
An estimated 99.2 percent of all births occurring in the United States
in 1989 were registered; for white births registration was 99.4 percent
complete and for all other births, 98.5 percent complete. These estimates
are based on the results of the 1964-68 test of birth-registration
completeness according to place of delivery (in or out of hospital) and race
and on the 1989 proportions of births in these categories. The primary
purpose of the test was to obtain current measures of registration
completeness for births in and out of hospital by race on a national basis.
Data for States were not available as they had been from the previous
birth-registration tests in 1940 and 1950. A detailed discussion of the
method and results of the 1964-68 birth-registration test is available
(13).
The 1964-68 test has provided an opportunity to revise the estimates
of birth-registration completeness for the years since the previous test in
1950 to reflect the improvement in registration. This has been done using
registration completeness figures from the two tests by place of delivery
and race. Estimates of registration completeness for four groups (based on
place of delivery and race) for 1951-65 were computed by interpolation
between the test results. (It was assumed that the data from the more recent
test are for 1966, the midpoint of the test period.) The results of the
1964-68 test are assumed to prevail for 1966 and later years. These
estimates were used with the proportions of births registered in these
categories to obtain revised numbers of births adjusted for
underregistration for each year. The overall percent of birth-registration
completeness by race was then computed. The figures for 1951-68 shown in
table 1-3 differ slightly from those shown in annual reports for years prior
to 1969.
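The interpolation and adjustment steps described above can be sketched as
follows. This is only an illustration: linear interpolation is assumed (the
text says only "interpolation"), the completeness figures and birth counts
are hypothetical placeholders rather than test results, and the function
names are invented for the example.

```python
def interpolate_completeness(year, c_1950, c_1966):
    """Completeness (as a fraction) for one place-of-delivery/race group.

    Assumption: linear interpolation between the 1950 test and 1966, the
    midpoint of the 1964-68 test. The later test result is held constant
    for 1966 and later years.
    """
    if year >= 1966:
        return c_1966
    if year <= 1950:
        return c_1950
    return c_1950 + (c_1966 - c_1950) * (year - 1950) / (1966 - 1950)


def overall_completeness(registered, completeness):
    """Overall completeness implied by registered births in each group."""
    # Births adjusted for underregistration: registered / completeness.
    adjusted = {g: registered[g] / completeness[g] for g in registered}
    return sum(registered.values()) / sum(adjusted.values())


# Hypothetical figures for the two place-of-delivery groups of one race.
c_1950 = {"in_hospital": 0.990, "out_of_hospital": 0.900}
c_1966 = {"in_hospital": 0.998, "out_of_hospital": 0.940}
registered_1958 = {"in_hospital": 950_000, "out_of_hospital": 50_000}

c_1958 = {g: interpolate_completeness(1958, c_1950[g], c_1966[g])
          for g in c_1950}
print(c_1958)
print(round(100 * overall_completeness(registered_1958, c_1958), 2))
```

Because the overall figure is the ratio of registered births to adjusted
births, it changes from year to year as the share of births delivered out
of hospital changes, even when the group-level estimates are held fixed.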
Data adjusted for underregistration for 1951-59 shown in tables 1-1,
1-4, 1-5, 1-9, 1-10, and 1-11 have been revised to be consistent with the
1964-68 test results and differ slightly from data shown in annual reports
for years before 1969. For these years the published number of births and
birth rates for both racial groups have been revised slightly downward
because the 1964-68 test indicated that previous adjustments to registered
births were slightly inflated. Because registration completeness figures by
age of mother and by live-birth order are not available from the 1964-68
quanat89.doc - Page 1
test, it must be assumed that the relationships among these variables have
not changed since 1950.
Discontinuation of adjustment for underregistration, 1960-- Adjustment
for underregistration of births was discontinued in 1960 when birth
registration for the United States was estimated to be 99.1 percent
complete. This removed a bias introduced into age-specific rates when
adjusted births classified by age were used. Age-specific rates are
calculated by dividing the number of births to an age group of mothers by
the population of women in that age group. Tests have shown that population
figures are likely to be understated through census undercounts; these
errors compensate for underregistration of births. Adjustment for
underregistration of births, therefore, removes the compensating effect of
underenumeration, biasing the age-specific rates more than when uncorrected
birth and population data are used. (For further details see page 4-11 in
the Technical Appendix of volume I, Vital Statistics of the United States,
1963.)
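The compensating effect described above can be illustrated with a small
calculation. The figures below are hypothetical and chosen only to show the
direction of the bias; the function and variable names are invented for the
example.

```python
def age_specific_rate(births, women, per=1000):
    """Births to mothers in an age group per 1,000 women in that group."""
    return per * births / women


# Hypothetical "true" values for one age group of mothers.
true_births = 100_000
true_women = 1_000_000

# Hypothetical coverage: 99 percent of births registered,
# 98 percent of women counted in the census.
registration_completeness = 0.99
census_coverage = 0.98

registered_births = true_births * registration_completeness
enumerated_women = true_women * census_coverage

true_rate = age_specific_rate(true_births, true_women)

# Uncorrected births over uncorrected population: the errors partly cancel.
unadjusted_rate = age_specific_rate(registered_births, enumerated_women)

# Births adjusted for underregistration over uncorrected population:
# the compensating effect is removed and the rate is biased further upward.
births_adjusted_rate = age_specific_rate(
    registered_births / registration_completeness, enumerated_women
)

print(round(true_rate, 2), round(unadjusted_rate, 2),
      round(births_adjusted_rate, 2))
```

Under these assumed figures the rate based on uncorrected births and
population lies closer to the true rate than the rate based on adjusted
births and uncorrected population.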
The age-specific rates used in the cohort fertility tables (tables 1-15
through 1-22) are an exception to the above statement. These rates are
computed from births corrected for underregistration and population
estimates adjusted for underenumeration and misstatement of age. Adjusted
birth and population estimates are used for the cohort rates because they
are an integral part of a series of rates, estimated with a consistent
methodology. It was considered desirable to maintain consistency with
respect to the cohort rates, even though it means that they will not be
precisely comparable with other rates shown for 5-year age groups.
Completeness of reporting
Interpretation of these data must include evaluation of item
completeness. The percent "not stated" is one measure of the quality of the
data. Completeness of reporting varies among items and States. See table A
for the percent of birth records on which specified items were not stated.
Quality control procedures
States in the Vital Statistics Cooperative Program are required to have
an error rate of less than 2.0 percent for each item for 3 consecutive data
months during the initial qualifying period. Once a State is qualified,
NCHS monitors the quality of data received through independent verification
of a sample of records to ensure that the item error rate is not more than
approximately 4 percent. In addition, there is verification at the State
level before NCHS is sent the data.
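The qualifying rule stated above can be expressed as a simple check. This is
a sketch only: the data layout, item names, and function names are
hypothetical, and it is not a description of the procedure actually used by
NCHS.

```python
QUALIFYING_LIMIT = 2.0   # percent; each item must be below this
REQUIRED_MONTHS = 3      # consecutive data months


def item_qualifies(monthly_error_rates):
    """True if the item has 3 consecutive months with error rate < 2.0."""
    run = 0
    for rate in monthly_error_rates:
        run = run + 1 if rate < QUALIFYING_LIMIT else 0
        if run >= REQUIRED_MONTHS:
            return True
    return False


def state_qualifies(error_rates_by_item):
    """True if every item meets the rule (one reading of 'for each item')."""
    return all(item_qualifies(rates) for rates in error_rates_by_item.values())


# Hypothetical monthly item error rates, in percent.
example = {
    "age_of_mother": [2.5, 1.9, 1.0, 0.5],
    "birth_weight": [1.2, 1.1, 1.8, 2.4],
}
print(state_qualifies(example))
```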
After the coding is completed, counts of the taped records are balanced
against control totals for each shipment of records from a registration
area. Impossible codes are detected during computer editing and are either
corrected by reference to the source record or adjusted by arbitrary code
assignment. All subsequent operations involved in
tabulation and table preparation are verified during computer processing or
by statistical clerks.
Small frequencies
The numbers of births reported for an area represent complete counts.
As such, they are not subject to sampling error, although they are subject
to errors in the registration process. However, when the figures are used
for analytical purposes, such as the comparison of rates over a period of
time or for different areas, the number of events that actually occurred may
be considered as one of a large series of possible results that could have
arisen under the same circumstances. The probable range of values may be
estimated from the actual figures according to certain statistical
assumptions.
In general, distributions of vital events may be assumed to follow the
binomial distribution. Estimates of standard errors and tests of
significance under this assumption are described in most standard statistics
texts. When the number of events is large, the relative standard error,
expressed as a percent of the number or rate, is usually small.
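The relation between the number of events and the relative standard error
under the binomial assumption can be sketched as follows. The population and
event counts are hypothetical, the population is treated as a fixed number
of trials, and the function name is invented for the example.

```python
import math


def relative_standard_error(events, population):
    """Relative standard error, in percent, under a binomial assumption.

    The same value applies to the count and to the rate, because the rate
    is the count divided by a population treated as fixed.
    """
    p = events / population
    se_count = math.sqrt(population * p * (1 - p))
    return 100 * se_count / events


# Hypothetical areas with the same population but different event counts.
for events in (100, 1_000, 10_000):
    print(events, round(relative_standard_error(events, 1_000_000), 2))
```

When the probability of the event is small, the result is close to 100
divided by the square root of the number of events, so the relative
standard error falls as the number of events rises.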
When the number of events is small (fewer than 100) and the probability
of such an event is small, considerable caution must be observed in
interpreting the conditions described by the figures. Events of rare nature
may be assumed to follow a Poisson probability distribution. For this
distribution, a simple approximation may be used to estimate the error as
follows:
If N is the number of births and R is the corresponding rate, the chances
are 19 in 20 that
1. The "true" number of events lies between
N - 2√N and N + 2√N
2. The "true" rate lies between
R - 2(R/√N) and R + 2(R/√N)
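The two approximate ranges above can be computed directly. The following is
a minimal sketch; the function names and the input figures (64 events and a
rate of 12.0) are hypothetical and were chosen only to give round numbers.

```python
import math


def count_interval(n):
    """Approximate 19-in-20 range for the 'true' number of events."""
    half_width = 2 * math.sqrt(n)
    return n - half_width, n + half_width


def rate_interval(rate, n):
    """Approximate 19-in-20 range for the 'true' rate based on n events."""
    half_width = 2 * rate / math.sqrt(n)
    return rate - half_width, rate + half_width


# Hypothetical area: 64 events and a rate of 12.0 per 1,000 population.
print(count_interval(64))
print(rate_interval(12.0, 64))
```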
If the rate R1 corresponding to N1 events is compared with the rate R2
corresponding to N2 events, the difference between the two rates may be
regarded as statistically significant at the 0.05 level if it exceeds
2 x √(R1 squared/N1 + R2 squared/N2)
For example, suppose that the observed birth rate for area A was 15.0 per
1000 population and that this rate was based on 50 recorded births. Given
prevailing conditions, the chances are 19 in 20 that the "true" or underlying
birth rate for that area lies between 10.8 and 19.2 per 1000 population.
Suppose further that the birth rate of 15.0 per 1000 population for area A is
being compared with a rate of 20.0 per 1000 population for area B, which is
based on 40 recorded births. Although the difference between the rates for the
two areas is 5.0, this difference is less than twice the standard error of the
difference of the two rates,
2 x √(15.0 squared/50 + 20.0 squared/40)
which is computed to be 7.6. From this, it is concluded that
the difference between the rates for the two areas is not statistically
significant.
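The comparison in this example can be checked with a short calculation. This
is a sketch using the figures given in the text (15.0 based on 50 births for
area A, 20.0 based on 40 births for area B); the function names are invented
for the example.

```python
import math


def significance_threshold(r1, n1, r2, n2):
    """Twice the approximate standard error of the difference of two rates."""
    return 2 * math.sqrt(r1 ** 2 / n1 + r2 ** 2 / n2)


def is_significant(r1, n1, r2, n2):
    """True if the rates differ by more than the threshold (0.05 level)."""
    return abs(r1 - r2) > significance_threshold(r1, n1, r2, n2)


# Area A: 15.0 per 1,000 on 50 births. Area B: 20.0 per 1,000 on 40 births.
threshold = significance_threshold(15.0, 50, 20.0, 40)
print(round(threshold, 1))
print(is_significant(15.0, 50, 20.0, 40))
```

The threshold rounds to 7.6, which exceeds the observed difference of 5.0,
in agreement with the conclusion stated above.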