QUALITY OF DATA
Although vital statistics data are useful for a variety of administrative
and scientific purposes, they cannot be correctly interpreted unless various
qualifying factors and methods of classification are taken into account. The
factors to be considered depend on the specific purposes for which the data
are to be used. It is not feasible to discuss all the pertinent factors in
the use of vital statistics tabulations, but some of the more important ones
should be mentioned.
Most of the factors limiting the use of the data arise from imperfections in
the original records or from the impracticability of tabulating these data in
very detailed categories. These limitations should not be ignored, but their
existence does not vitiate the value of the data for most general purposes.
Completeness of registration
An estimated 99.3 percent of all births occurring in the United States in
1985 were registered; for white births registration was 99.4 percent complete
and for all other births, 98.6 percent complete. These estimates are based on
the results of the 1964-68 test of birth-registration completeness according
to place of delivery (in or out of hospital) and race and on the 1985
proportions of births in these categories. The primary purpose of the test
was to obtain current measures of registration completeness for births in and
out of hospital by race on a national basis. Unlike the previous
birth-registration tests in 1940 and 1950, this test did not provide data for
individual States. A detailed discussion of the method and results of the 1964-68
birth registration test is available (U.S. Bureau of the Census, 1973).
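The national figure combines completeness estimates for groups defined by place of delivery and race with the distribution of births among those groups. The following is a minimal sketch of that kind of weighting. It assumes that registered counts per group are known and that each group's count is divided by its completeness rate to estimate the births that actually occurred. The function name, group labels, and all numbers are hypothetical and are not the 1964-68 test results or the 1985 counts.

```python
# Hypothetical illustration: group labels and figures are invented,
# not the published 1964-68 test results or 1985 birth counts.

def overall_completeness(registered, completeness):
    """Return overall registration completeness as a fraction.

    registered   -- dict mapping group -> number of registered births
    completeness -- dict mapping group -> estimated completeness (0 to 1)
    Each group's registered count is divided by its completeness to
    estimate the births that actually occurred in that group.
    """
    estimated_true = sum(registered[g] / completeness[g] for g in registered)
    return sum(registered.values()) / estimated_true


registered = {
    ("in hospital", "white"): 900_000,
    ("out of hospital", "white"): 10_000,
    ("in hospital", "all other"): 85_000,
    ("out of hospital", "all other"): 5_000,
}
completeness = {
    ("in hospital", "white"): 0.995,
    ("out of hospital", "white"): 0.90,
    ("in hospital", "all other"): 0.99,
    ("out of hospital", "all other"): 0.85,
}

print(f"{overall_completeness(registered, completeness):.3%}")
```

Because out-of-hospital births in this made-up example are both rare and less completely registered, they pull the overall figure only slightly below the in-hospital rates.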
The 1964-68 test has provided an opportunity to revise the estimates of
birth-registration completeness for the years since the previous test in 1950
to reflect the improvement in registration. This has been done using
registration completeness figures from the two tests by place of delivery and
race. Estimates of registration completeness for four groups (based on place
of delivery and race) for 1951-65 were computed by interpolation between the
test results. (It was assumed that the data from the more recent test are for
1966, the midpoint of the test period.) The results of the 1964-68 test are
assumed to prevail for 1966 and later years. These estimates were used with
the proportions of births registered in these categories to obtain revised
numbers of births adjusted for underregistration for each year. The overall
percent of birth-registration completeness by race then was computed. The
figures for 1951-68 shown in table 1-21 differ slightly from those shown in
annual reports for years prior to 1969.
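The interpolation step can be sketched for a single group as follows. The source says only that the 1951-65 figures were obtained "by interpolation," so the straight-line form used here is an assumption. The function name and the completeness values passed to it are hypothetical. As in the text, the 1964-68 results are dated at 1966 and held constant from 1966 on.

```python
# Sketch for one place-of-delivery/race group. Linear interpolation is
# an assumption (the source says only "interpolation"); the inputs
# used below are hypothetical.

def completeness_for_year(year, c1950, c1966):
    """Estimated registration completeness for one group in `year`.

    c1950 -- completeness from the 1950 test
    c1966 -- completeness from the 1964-68 test, dated at its
             midpoint year, 1966
    """
    if year < 1950:
        raise ValueError("this sketch covers 1950 and later only")
    if year >= 1966:
        # The 1964-68 results are assumed to hold for 1966 and later.
        return c1966
    fraction = (year - 1950) / (1966 - 1950)
    return c1950 + fraction * (c1966 - c1950)


for year in (1951, 1958, 1965, 1970):
    print(year, round(completeness_for_year(year, 0.95, 0.99), 4))
```

Each year's group estimates would then be fed into a weighting such as the overall-completeness calculation described earlier in this section.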
Data adjusted for underregistration for 1951-59 shown in tables 1-1, 1-3,
1-4, 1-6, and 1-8 have been revised to be consistent with the 1964-68 test
results and differ slightly from data shown in annual reports for years before
1969. For these years the published numbers of births and birth rates for both
racial groups have been revised slightly downward because the 1964-68 test
indicated that previous adjustments to registered births were slightly
inflated. Because registration completeness figures by age of mother and
by live-birth order are not available from the 1964-68 test, it must be
assumed that the relationships among these variables have not changed since
1950.
Discontinuation of adjustment for underregistration, 1960
Adjustment for underregistration was discontinued in 1960, when birth registration for the
quanat85.doc - Page 1
United States was estimated to be 99.1 percent complete. This removed a bias
introduced into age-specific rates when adjusted births classified by age were
used. Age-specific rates are calculated by dividing the number of births to
an age group of mothers by the population of women in that age group. Tests
have shown that population figures are likely to be understated because of census
undercounts, and these undercounts tend to offset the underregistration of births.
Adjusting births for underregistration while leaving the population uncorrected
therefore removes the compensating effect of underenumeration and biases the
age-specific rates more than when uncorrected birth and population data are used. (For further
details, see Vital statistics of the United States, 1963, volume I, page
4-11.)
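The direction of this bias can be shown with a small hypothetical calculation. It assumes that the census undercount of women in the age group is at least as large as the underregistration of births; all figures are invented and do not describe any actual age group or year.

```python
# Hypothetical figures chosen only to show the direction of the bias;
# they are not estimates for any actual age group or year.
TRUE_BIRTHS = 10_000
TRUE_WOMEN = 100_000
BIRTH_COMPLETENESS = 0.99   # assumed share of births registered
CENSUS_COVERAGE = 0.97      # assumed share of women counted in the census


def rate_per_1000(births, women):
    return 1000 * births / women


true_rate = rate_per_1000(TRUE_BIRTHS, TRUE_WOMEN)
registered = TRUE_BIRTHS * BIRTH_COMPLETENESS
counted = TRUE_WOMEN * CENSUS_COVERAGE

# Both numerator and denominator understated: the errors partly cancel.
unadjusted = rate_per_1000(registered, counted)
# Numerator corrected, denominator not: the cancellation is lost.
births_adjusted_only = rate_per_1000(registered / BIRTH_COMPLETENESS, counted)

for label, value in [("true rate", true_rate),
                     ("registered / counted", unadjusted),
                     ("adjusted births / counted", births_adjusted_only)]:
    print(f"{label:<27}{value:.2f}")
```

With these assumed figures the rate computed from registered births and counted women is about 102.1 per 1,000. Adjusting only the births raises it to about 103.1, against a true rate of 100.0.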
The age-specific rates used in the cohort fertility tables (tables 1-12
through 1-19) are an exception to the above statement. These rates are
computed from births corrected for underregistration and population estimates
adjusted for underenumeration and misstatement of age. Adjusted births and
population estimates are used for the cohort rates because these rates are an
integral part of a series estimated with a consistent methodology.
It was considered desirable to maintain consistency with respect to cohort
rates, even though this means they are not precisely comparable with other
rates shown for 5-year age groups.
Quality control procedures
States in the Vital Statistics Cooperative Program are required to have
an error rate of less than 2.0 percent for each item for 3 consecutive data
months during the initial qualifying period. Once a State is qualified, the
National Center for Health Statistics (NCHS) monitors the quality of data
received through independent verification of a sample of records to ensure
that the item error rate is not more than approximately 4 percent. In
addition, records are verified at the State level before the data are sent to
NCHS.
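The two thresholds can be expressed as simple checks. In this sketch the data layout (one dict of item name to error percent per data month), the item names, and the function names are all hypothetical. The value 4.0 stands in for the "approximately 4 percent" monitoring limit.

```python
# Hypothetical data layout: one dict per data month mapping an item
# name to its error rate in percent. Item names are invented.
QUALIFYING_LIMIT = 2.0   # percent; every item must be strictly below
MONITORING_LIMIT = 4.0   # percent; stands in for "approximately 4 percent"


def qualifies(monthly_error_rates, run_length=3):
    """True if every item is below QUALIFYING_LIMIT for `run_length`
    consecutive data months."""
    run = 0
    for month in monthly_error_rates:
        if all(rate < QUALIFYING_LIMIT for rate in month.values()):
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0  # a failing month breaks the consecutive run
    return False


def items_over_limit(sample_error_rates):
    """Items whose error rate in a verified sample exceeds MONITORING_LIMIT."""
    return sorted(item for item, rate in sample_error_rates.items()
                  if rate > MONITORING_LIMIT)


months = [
    {"birthweight": 1.2, "mother_age": 2.4},  # fails: 2.4 is not below 2.0
    {"birthweight": 0.9, "mother_age": 1.5},
    {"birthweight": 1.1, "mother_age": 1.8},
    {"birthweight": 1.0, "mother_age": 1.9},
]
print(qualifies(months))                                          # True
print(items_over_limit({"birthweight": 3.1, "mother_age": 4.6}))  # ['mother_age']
```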
After completion of coding, counts of the taped records are balanced
against control totals for each shipment of records from a registration area.
Impossible codes are eliminated during computer editing and are either
corrected by reference to the source record or adjusted by arbitrary code
assignment. All subsequent operations involved in tabulation
and table preparation are verified during the computer processing or by
statistical clerks.
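The balancing and editing steps can be illustrated with a hypothetical sketch. The field names, the sets of valid codes, and the record layout are invented for illustration and are not actual NCHS coding specifications.

```python
# Hypothetical sketch: field names, valid-code sets, and the record
# layout are invented, not actual NCHS coding specifications.
VALID_CODES = {
    "plurality": {1, 2, 3, 4, 5},
    "sex": {"M", "F"},
}


def balance(shipment_records, control_total):
    """Raise if the record count does not match the shipment's control total."""
    if len(shipment_records) != control_total:
        raise ValueError(
            f"count {len(shipment_records)} != control total {control_total}")


def impossible_codes(record):
    """Return the fields whose codes fall outside the valid set."""
    return [field for field, valid in VALID_CODES.items()
            if record.get(field) not in valid]


shipment = [
    {"plurality": 1, "sex": "F"},
    {"plurality": 7, "sex": "M"},
]
balance(shipment, control_total=2)
for i, rec in enumerate(shipment):
    bad = impossible_codes(rec)
    if bad:
        # Such codes would then be corrected from the source record
        # or given an arbitrary code assignment.
        print(f"record {i}: impossible code(s) in {bad}")
```

Running it prints `record 1: impossible code(s) in ['plurality']`.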
Small frequencies
The numbers of births reported for an area represent complete counts. As
such, they are not subject to sampling error, although they are subject to
errors in the registration process. However, when the figures are used for
analytical purposes, such as the comparison of rates over a time period or for
different areas, the number of events that actually occurred may be considered
one of a large series of possible results that could have arisen under the
same circumstances. The probable range of values may be estimated from the
actual figures according to certain statistical assumptions.
In general, distributions of vital events may be assumed to follow the
binomial distribution. Estimates of standard errors and tests of significance
under this assumption are described in most standard statistics texts. When
the number of events is large, the standard error, expressed as a percent of
the number or rate, usually is small.
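Under the binomial assumption, the standard error of a rate $p = N/n$ is $\sqrt{p(1-p)/n}$. As a percent of the rate, this equals $100\sqrt{(1-p)/N}$, so it shrinks as the number of events $N$ grows. The sketch below shows this for hypothetical areas that share the same rate but differ in size; the function name is illustrative.

```python
import math


def binomial_relative_se(events, population):
    """Standard error of the rate events/population, expressed as a
    percent of the rate, assuming a binomial model: sqrt(p(1-p)/n) / p."""
    p = events / population
    se = math.sqrt(p * (1 - p) / population)
    return 100 * se / p


# Hypothetical areas that all have a rate of 15 per 1,000 population
# but differ in size, and hence in the number of events.
for events, population in [(15, 1_000), (1_500, 100_000), (150_000, 10_000_000)]:
    print(f"{events:>7} events: {binomial_relative_se(events, population):.2f}% of the rate")
```

At a fixed rate, each hundredfold increase in the number of events reduces the relative standard error by a factor of ten.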
When the number of events is small (perhaps fewer than 100) and the
probability of such an event is small, considerable caution must be observed
in interpreting the conditions described by the figures. Events of rare
nature may be assumed to follow a Poisson probability distribution. For this
distribution, a simple approximation may be used to estimate the error as
follows:
If N is the number of births and R is the corresponding rate, the chances
are 19 in 20 that
1. The "true" number of events lies between
$N - 2\sqrt{N}$ and $N + 2\sqrt{N}$
2. The "true" rate lies between
$R - 2R/\sqrt{N}$ and $R + 2R/\sqrt{N}$
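Both approximations can be computed directly. In this minimal sketch the function names are hypothetical. The formulas are applied exactly as stated, so for $N < 4$ the lower limit for the count falls below zero, a reminder that so simple an approximation is too coarse for very small counts.

```python
import math


def poisson_interval_count(n):
    """Approximate 19-in-20 range for the "true" number of events:
    N - 2*sqrt(N) to N + 2*sqrt(N)."""
    half_width = 2 * math.sqrt(n)
    return n - half_width, n + half_width


def poisson_interval_rate(rate, n):
    """Approximate 19-in-20 range for the "true" rate R based on N
    events: R - 2R/sqrt(N) to R + 2R/sqrt(N)."""
    half_width = 2 * rate / math.sqrt(n)
    return rate - half_width, rate + half_width


# Area A in the worked example: a rate of 15.0 per 1000 from 50 births.
low, high = poisson_interval_rate(15.0, 50)
print(f"{low:.1f} to {high:.1f}")  # 10.8 to 19.2
```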
If the rate R corresponding to N events is compared with the rate S
corresponding to M events, the difference between the two rates may be
regarded as statistically significant if it exceeds
$2\sqrt{R^2/N + S^2/M}$
For example, suppose that the observed birth rate for area A was 15.0 per
1000 population and that this rate was based on 50 recorded births. Given
prevailing conditions, the chances are 19 in 20 that the "true" or underlying
birth rate for that area lies between 10.8 and 19.2 per 1000 population.
Suppose further that this rate of 15.0 per 1000 population for area A is
compared with a rate of 20.0 per 1000 population for area B, which is based
on 40 recorded births. Although the difference between the rates for the two
areas is 5.0, this difference is less than twice the standard error of the
difference between the two rates, which is computed as
$2\sqrt{15.0^2/50 + 20.0^2/40} = 2\sqrt{4.5 + 10.0} = 2\sqrt{14.5} \approx 7.6$
It is therefore concluded that the difference between the rates for the two
areas is not statistically significant.
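The significance rule can be applied in code to reproduce the area A and area B comparison. This is a minimal sketch, and the function name is hypothetical.

```python
import math


def rate_difference_is_significant(r, n, s, m):
    """Compare rate R (based on N events) with rate S (based on M events).

    The difference is regarded as significant if it exceeds
    2 * sqrt(R**2/N + S**2/M). Returns (significant, threshold).
    """
    threshold = 2 * math.sqrt(r**2 / n + s**2 / m)
    return abs(r - s) > threshold, threshold


r_a, births_a = 15.0, 50   # area A
r_b, births_b = 20.0, 40   # area B
significant, threshold = rate_difference_is_significant(r_a, births_a, r_b, births_b)
print(f"difference {abs(r_a - r_b):.1f}, threshold {threshold:.1f}, "
      f"significant: {significant}")
# difference 5.0, threshold 7.6, significant: False
```

With 100 times as many births behind the same two rates, the threshold would drop to a tenth of its value, about 0.76, and the same 5.0 difference would be judged significant.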