SOURCE AND ACCURACY STATEMENT FOR THE (SIPP) 1990 LONGITUDINAL PANEL FILE DATA COLLECTION AND ESTIMATION Source of Data. The data were collected during the 1990 panel of the Survey of Income and Program Participation (SIPP). The SIPP universe is the noninstitutionalized resident population living in the United States. This population includes persons living in group quarters, such as dormitories, rooming houses, and religious group dwellings. Crew members of merchant vessels, Armed Forces personnel living in military barracks, and institutionalized persons, such as correctional facility inmates and nursing home residents, were not eligible to be in the survey. Also, United States citizens residing abroad were not eligible to be in the survey. Foreign visitors who work or attend school in this country and their families were eligible; all others were not eligible to be in the survey. With the exceptions noted above, persons who were at least 15 years of age at the time of the interview were eligible to be in the survey. The 1990 panel SIPP sample is located in 230 Primary Sampling Units (PSUs) each consisting of a county or a group of contiguous counties. Within these PSUs, expected clusters of 2 or 4 living quarters (LQs) were systematically selected from lists of addresses prepared for the 1980 decennial census to form the bulk of the sample. To account for LQs built within each of the sample areas after the 1980 census, a sample was drawn of permits issued for construction of residential LQs up until shortly before the beginning of the panel. In jurisdictions that do not issue building permits, small land areas were sampled and the LQs within were listed by field personnel and then subsampled. In addition, sample LQs were selected from supplemental frames that included LQs identified as missed in the 1980 census and group quarters. The 1990 panel differs from the other panels as a result of oversampling for low income. 
The oversample was constructed by taking a small subsample from the 1989 panel and combining it with the 1990 panel. Variables such as race, ethnicity, and sex were used for the oversampling since low income data for 1989 panel households were unavailable. The 1989 panel subsample contains all Black-headed households, all Hispanic-headed households, all households with heads having no spouse present, living with relatives, and a random sample of all the other household types. The latter random sample was drawn in an attempt to avoid bias in the sample. At the time of the initial visit, the occupants of about 21,900 living quarters were interviewed. This accounts for approximately 77 percent of the living quarters originally designated for sample. Approximately 17 percent of the designated living quarters were found to be vacant, demolished, converted to nonresidential use, or otherwise ineligible for the survey. The remainder, approximately 1,700 living quarters, were not interviewed because the occupants refused to be interviewed, could not be found at home, were temporarily absent, or were otherwise unavailable. Thus, occupants of about 93 percent of all eligible living quarters participated in the first interview of the survey. For later interviews, only original sample persons (those in Wave 1 sample households and interviewed in Wave 1) and persons living with them were eligible to be interviewed. With certain restrictions, original sample persons were to be followed even if they moved to a new address. When original sample persons moved without leaving a forwarding address or moved to extremely remote parts of the country and no telephone number was available, additional noninterviews resulted. Sample households within the panel are divided into four subsamples of nearly equal size. These subsamples are called rotation groups 1, 2, 3, and 4, and one rotation group is interviewed each month.
Each household in the sample was scheduled to be interviewed at 4 month intervals over a period of roughly 2 2/3 years beginning in February 1990. The reference period for the questions is the 4-month period preceding the interview month. In general, one cycle of four interviews covering the entire sample, using the same questionnaire, is called a wave. The period covered by the 1990 longitudinal panel file consists of 32 interview months (eight interviews) conducted from February 1990 to September 1992. Data for up to 32 reference months are available for persons on the file. Specific months available depend on the person's rotation group and his/her sample entry or exit date. However, data from all four rotation groups (i.e., the full sample) are available only for reference months January 1990 through May 1992, inclusive. Also note that the availability of data on household composition begins with the first interview month of a rotation group. Table 1 indicates the reference months and interview months for the collection of data from each rotation group of the 1990 longitudinal panel file. For example, rotation group 2 was first interviewed in February 1990 and data for the reference months October 1989 through January 1990 were collected. This rotation group was interviewed for the eighth and last time in June 1992 to collect data for February 1992 through May 1992. Table 1 also shows that 1990 calendar year (90CY) data were collected in interview months February 1990 to April 1991 and that 1991 calendar year (91CY) data were collected exactly one year later. Data from all four rotation groups are available for each reference month of the 1990 and 1991 calendar years. For panel, 90CY and 91CY weighting procedures, a person was classified as interviewed or noninterviewed based on the following definitions. (Note that a person may be classified differently for calculating different weights). 
Interviewed sample persons (including children) were defined to be 1) those for whom self or proxy responses were obtained for each month of the appropriate longitudinal period or 2) those for whom self or proxy responses were obtained for the first month of the appropriate longitudinal period and for each subsequent month until they were known to have died or moved to an ineligible address (foreign living quarters, institutions, or military barracks). The months for which persons were deceased or residing in an ineligible address were identified on the file. Noninterviewed persons were defined to be those for whom neither self nor proxy responses were obtained for one or more months of the appropriate longitudinal period (but not because they were deceased or moved to an ineligible address). It is estimated that roughly 61,700 persons were initially designated in the sample. Approximately 58,000 persons were interviewed in Wave 1, while the balance, residing in the 1,700 living quarters not interviewed at Wave 1, remained anonymous and became the initial source of person nonresponse in the weighting procedures. For the panel and 90CY weighting procedures, the eligible sample is considered to be all persons initially designated for sample. In the panel weighting procedure, approximately 43,700 persons were classified as interviewed with a person nonresponse rate of 29 percent. The 90CY weighting procedure classified about 49,600 persons as interviewed and had a person nonresponse rate of 20 percent. The longitudinal file contains approximately 69,400 persons in all. This includes the Wave 1 interviewed persons and about 11,400 persons who entered survey households during the panel through births, marriages, and other reasons. Approximately one-half of the newcomers were considered eligible for the 91CY weighting procedure, increasing the eligible sample size to roughly 67,400 persons.
The 91CY weighting procedure classified about 47,500 persons as interviewed with a person nonresponse rate of 30 percent. Some respondents did not respond to some of the questions; therefore, item nonresponse rates, especially for sensitive income and money related items, are higher than the person nonresponse rates given above. ESTIMATION In the estimation procedure described below, all persons classified as interviewed for a given longitudinal period, i.e., panel, 90CY or 91CY, are assigned positive weights for that period, while those classified as noninterviewed are assigned zero weights. Estimation of Person Characteristics. Essentially the same estimation procedure was used to derive each of the three sets of SIPP longitudinal person weights. Several stages of weight adjustments were involved. Each person received a base weight equal to the inverse of his/her probability of selection. A combining factor was applied to reduce the weight of both the 1989 subsample cases and the original 1990 sample cases in order to create weighted estimates using both samples combined. Two noninterview adjustment factors were applied. One adjusted the weights of interviewed persons in interviewed households to account for persons who were eligible for the sample but could not be interviewed at the first interview. The second was applied to compensate for person noninterviews occurring in subsequent interviews. Another factor was applied to each interviewed person's weight to account for the SIPP sample areas not having the same population distribution as the strata from which they were selected. An additional stage of adjustment to longitudinal person weights was performed to reduce the mean square error of the survey estimates. 
This was accomplished by bringing the sample estimates into agreement with monthly Current Population Survey (CPS) type estimates of the civilian (and some military) noninstitutional population of the United States by age, sex, race, Hispanic ethnicity, and householder/not householder status as of the specified control date. The control dates for the panel, 90CY, and 91CY weights were March 1, 1990, January 1, 1990, and January 1, 1991, respectively. The CPS estimates were themselves brought into agreement with estimates from the 1980 decennial census which have been adjusted to reflect births, deaths, immigration, emigration, and changes in the Armed Forces since 1980. Use of Person Weights. Users should be forewarned to apply the appropriate weights given on this file before attempting to calculate estimates. The weights vary between units due to the oversampling that took place, weighting adjustments, and following movers. If analysis is done for the general population without applying the appropriate weights, the results will be erroneous. Each person on the 1990 longitudinal panel file has three longitudinal person weights (some of which may be zero) for estimation of panel, 90CY and 91CY person characteristics and two longitudinal household factors to be used only for exploratory estimates of household and family characteristics. We strongly recommend that all nonexploratory analysis be confined to person analysis using the longitudinal person weights. For example, using 90CY person weights, one can estimate the number of persons receiving food stamps from January through March of 1990. Also, we recommend the use of longitudinal person weights for person characteristics based on household attributes. For example, using panel person weights, one can estimate the number of persons living in households which received food stamps during the period covered by the 1990 panel. 
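The food stamp example above amounts to summing 90CY person weights over the persons who have the characteristic. The following is a minimal sketch of that calculation, not the Census Bureau's processing code. The record layout and the field names `w90cy` and `food_stamps` are hypothetical, and the sketch assumes that "receiving food stamps from January through March" means receipt in each of the three months.

```python
# Hypothetical person records; field names are illustrative only and are
# not the variable names used on the longitudinal panel file.
persons = [
    {"w90cy": 1500.0, "food_stamps": {1: True, 2: True, 3: True}},
    {"w90cy": 2000.0, "food_stamps": {1: True, 2: False, 3: True}},
    # Classified as noninterviewed for 90CY, so the weight is zero.
    {"w90cy": 0.0, "food_stamps": {1: True, 2: True, 3: True}},
    {"w90cy": 1800.0, "food_stamps": {1: False, 2: False, 3: False}},
]

Q1_MONTHS = (1, 2, 3)  # January through March 1990


def weighted_count(records, weight_field, predicate):
    """Sum the chosen weight over positively weighted records that satisfy predicate."""
    return sum(
        r[weight_field]
        for r in records
        if r[weight_field] > 0 and predicate(r)
    )


def on_food_stamps_all_q1(person):
    """Assumed reading: the person received food stamps in every month of the quarter."""
    return all(person["food_stamps"][m] for m in Q1_MONTHS)


estimate_q1 = weighted_count(persons, "w90cy", on_food_stamps_all_q1)
print(estimate_q1)  # 1500.0
```

Replacing `all` with `any` in the predicate would instead count persons who received food stamps at some point during the quarter.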
This file was created for purposes of survey research and evaluation, and the Bureau of the Census will continue to examine the data, correcting and improving the computer processing and estimation procedures where appropriate. We welcome and appreciate any research on your part that will help us achieve this goal. All estimates may be divided into two broad categories: longitudinal and cross-sectional. Longitudinal estimates require that data records for each person be linked across interviews, whereas cross-sectional estimates do not. For example, annual income estimates obtained by summing the 12 monthly income amounts for each person would require linking records and so would be longitudinal estimates. Because there is no linkage between interviews, cross-sectional estimates can combine data from different interviews only at the aggregate level. Longitudinal person weights were developed for longitudinal estimation, but may be used for cross-sectional estimation as well. However, note that wave files with cross-sectional weights are also produced for the SIPP. Because of the larger sample size available on the wave files, it is recommended that these files be used for cross-sectional estimation, if possible. In this section it is assumed that all four rotation groups are used for estimation. If an estimate covers a time period for which data from some rotation groups are unavailable, refer to the section "Adjusting Estimates Which Use Less Than the Full Sample." Some basic types of longitudinal and cross-sectional estimates which can be constructed using longitudinal person weights are described below in terms of estimated numbers. Of course, more complex estimates, such as percents, averages, ratios, etc., can be constructed from the estimated numbers. Longitudinal person weights can be used to construct the following types of longitudinal estimates: 1. The number of persons who have ever experienced a characteristic during a given time period.
To construct such an estimate, use the longitudinal person weight (panel, 90CY or 91CY) for the shortest time period which covers the time period of interest, summing the weights over all persons who possessed the characteristic of interest at some point during the time period of interest. For example, to estimate the number of persons who ever received food stamps during the last six months of 1990, use the 90CY longitudinal person weight. 2. The amount of a characteristic accumulated by persons during a given time period. To construct such an estimate, use the longitudinal person weight for the shortest time period which covers the time period of interest. Then compute the product of the weight times the amount of the characteristic and sum this product over all appropriate persons. For example, to estimate the aggregate 1990 annual income of persons who were employed during all 12 months of the year, use the 90CY longitudinal person weight. 3. The average number of consecutive months of possession of a characteristic (i.e., the average spell length for a characteristic) during a given time period. For example, one could estimate the average length of each spell of receiving food stamps during 1990. Also, one could estimate the average spell of unemployment that elapsed before a person found a new job. To construct such an estimate, first identify the persons who possessed the characteristic at some point during the time period of interest. Then, create two sums of these persons' appropriate longitudinal weights: (1) sum the product of the weight times the number of months the spell lasted and (2) sum the weights only. Now, the estimated average spell length in months is given by (1) divided by (2). A person who experienced two spells during the time period of interest would be treated as two persons and appear twice in sums (1) and (2). An alternate method of calculating the average can be found in the section "Standard Error of a Mean or Aggregate." 4.
The number of month-to-month changes in the status of a characteristic (i.e., number of transitions) summed over every set of two consecutive months during the time period of interest. To construct such an estimate, sum the appropriate longitudinal person weight each time a change is reported between two consecutive months during the time period of interest. For example, to estimate the number of persons who changed from receiving food stamps in July 1990 to not receiving in August 1990, add together the 90CY longitudinal person weights of each person who had such a change. To estimate the number of changes in monthly salary income during the third quarter of 1990, sum the estimates of the number of persons who had a change between July 1 and August 1, between August 1 and September 1, and between September 1 and October 1. Note that spell and transition estimates should be used with caution because of the biases that are associated with them. Sample persons tend to report the same status of a characteristic for all four months of a reference period. This tendency results in a bias toward reported spell lengths that are multiples of four months. This tendency also affects transition estimates in that, for many characteristics, the number of month-to-month transitions reported between the last month of one reference period and the first month of the next reference period is much greater than the number of reported transitions between any two months within a reference period. Additionally, spells extending before or after the time period of interest are cut off (censored) at the boundaries of the time period. If they are used in estimating average spell length, a downward bias will result. Also, using longitudinal person weights, one can construct the following type of cross-sectional estimate: 5. Monthly estimates of a characteristic averaged over a number of consecutive months.
For example, one could estimate the monthly average number of food stamp recipients over the months July through December 1990. To construct such an estimate, first form an estimate for each month in the time period of interest. Use the 90CY longitudinal person weight, summing over all persons who possessed the characteristic of interest during the month of interest. Then, sum the monthly estimates and divide by the number of months. Estimation of Household Characteristics. The Census Bureau has not developed household and family weights for longitudinal analysis. However, to facilitate exploratory research based upon the Census Bureau's provisional longitudinal household definition, two different longitudinal household weights, termed adjustment factor 1 and adjustment factor 2, were created for each longitudinal household each month. These factors were then assigned to every member of the longitudinal household each month. The primary difference between the factors is that for married-couple households adjustment factor 1 was derived jointly from the panel longitudinal person weights of the householder and spouse, while adjustment factor 2 was derived solely from the panel longitudinal person weight of the householder. For each month, five data fields are included on the longitudinal panel file to facilitate creation of household level estimates: (1) current household type, (2) key person, (3) other household member, (4) adjustment factor 1, (5) adjustment factor 2. Definitions of fields (1) through (3) as well as the provisional definitions of longitudinal household, original household, and successor household are provided below. In this section "month" refers to reference month unless stated otherwise. LONGITUDINAL HOUSEHOLD: A longitudinal household is a household which exists during at least one month, but which may continue to exist for more than one month.
A longitudinal household continues from one month to the next, if it has the same householder (and spouse, if present in the household), and if it is the same household type, where household type is defined below. CURRENT HOUSEHOLD TYPE: Households are classified by type in the current month where household types are: (1) married-couple household, (2) other family household, male householder, (3) other family household, female householder, (4) non-family household, male householder, (5) non-family household, female householder. ORIGINAL HOUSEHOLD: A household existing at the beginning of the survey, i.e., a household which exists during the first interview month of the rotation group. SUCCESSOR HOUSEHOLD: A household which is not an original household but which does exist during at least one month as an off-shoot of an original household. A successor household must exist during at least one month succeeding the first interview month of the rotation group, and must have a key person (see definition below) who was a member of an original household. KEY PERSON: In married-couple longitudinal households both the householder and the householder's spouse are key persons. In all other types of longitudinal households, there is only one key person - the householder. In married-couple households at least one key person must have entered the sample at Wave 1. In all other household types, the key person must have entered the sample at Wave 1. OTHER HOUSEHOLD MEMBER: A person who, during a specific month, is a member of a longitudinal household but is not a key person. Adjustment factors 1 and 2 are presented in figure 1. In examining figure 1, keep the following principles in mind: Adjustment factors 1 and 2 are always derived from the panel longitudinal person weight(s) of an original householder (and/or key person). 
For every successor household, where the current month householder (and/or spouse) was a member of an original household, it is the householder (and/or spouse) of the original household who supplies the panel longitudinal person weight from which the adjustment factors are derived.

Figure 1. Adjustment Factors for Longitudinal Household Estimates - 1990 Longitudinal Panel File

SUCCESSOR HOUSEHOLDS

Married Couple

        HHer entered sample in Wave 1           HHer entered sample in Wave 2+
        Other KP entered    Other KP entered    Other KP entered    Other KP entered
        sample in Wave 1    sample in Wave 2+   sample in Wave 1    sample in Wave 2+

AF1     first monthly       1/2 first monthly   1/2 first monthly   Zero*
        value of AF1        value of AF1        value of AF1

AF2     first monthly       first monthly       Zero*               Zero*
        value of AF2        value of AF2

Other

        HHer entered        HHer entered
        sample in Wave 1    sample in Wave 2+

AF1     first monthly       Zero*
        value of AF1

AF2     first monthly       Zero*
        value of AF2

ORIGINAL HOUSEHOLDS

        Married Couple      Other

AF1     mean LPW of two     LPW of HHer
        key persons

AF2     LPW of HHer         LPW of HHer

* These cells are added for completeness. By definition, these are not successor households.
AF1 = Adjustment factor 1; AF2 = Adjustment factor 2; LPW = Panel longitudinal person weight; Wave 2+ = Wave 2 or later wave; HHer = Current month householder; KP = Current month key person
Note: The situation where a successor household is formed by the merging of two Wave 1 households is not covered in figure 1. Original sample persons who move into another sample household cannot be linked to their original household and so are treated as if they entered the sample in Wave 2+.

Use of Household Weights. Adjustment factor 1, adjustment factor 2, and the related data fields are intended to provide the basis for exploratory household and family estimates.
For example, by using adjustment factor fields for key persons (in married couple households, one key person must be selected) with additional variables, estimates pertaining to longitudinal households can be derived for statements equivalent to the following: "During the period from month 'A' to month 'B', there were 'C' households with characteristics 'D'." An example of such a statement would be: "During the period from January to December 1990, there were 'C' households which received food stamps for 10 or more months." All such estimates should be considered exploratory, because the adjustment factors do not explicitly take into account several possible sources of bias, including differential attrition from the sample, with the result that the estimates may, even as national estimates, be subject to substantial bias. The purpose of including these data fields on the longitudinal panel file is to facilitate analyses that may be useful in developing improved longitudinal household weights. Although the exploratory adjustment factors may be useful for other purposes, the Census Bureau intends that these factors be used for only this one purpose. Exploratory household (family) estimates can be formed using either adjustment factor 1 or adjustment factor 2. At present, there is insufficient evidence to recommend one factor over the other in any given situation. To form exploratory household (family) estimates, use the adjustment factor deemed appropriate, summing over all households (families) possessing the characteristic of interest. Note that both adjustment factors for a household will remain the same for each month the household exists. Therefore, the appropriate adjustment factor for a household can be taken from any month of a household's existence. Also, note that the adjustment factors assigned to each member of a household actually apply to the entire household. 
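The summation described above can be sketched as follows. This is an illustrative outline under stated assumptions, not a Census Bureau procedure: the record layout and the field names `hh_id`, `key_person`, `af1`, `af2`, and `fs_months` are hypothetical, and the household characteristic is assumed to have been computed already and copied onto each member's record. Because both the householder and the spouse are key persons in a married-couple household, the sketch keeps only the first key person it encounters for each household.

```python
# Hypothetical person-level records carrying household-level fields.
records = [
    {"hh_id": "A", "key_person": True, "af1": 1200.0, "af2": 1100.0, "fs_months": 11},   # householder
    {"hh_id": "A", "key_person": True, "af1": 1200.0, "af2": 1100.0, "fs_months": 11},   # spouse
    {"hh_id": "A", "key_person": False, "af1": 1200.0, "af2": 1100.0, "fs_months": 11},  # other member
    {"hh_id": "B", "key_person": True, "af1": 900.0, "af2": 900.0, "fs_months": 4},
    {"hh_id": "C", "key_person": True, "af1": 1500.0, "af2": 1500.0, "fs_months": 12},
]


def household_estimate(rows, factor_field, has_characteristic):
    """Sum one adjustment factor per household over households with the characteristic.

    Each household is represented by a single key person so that its
    factor is counted once, even when two key persons are present.
    """
    seen = set()
    total = 0.0
    for row in rows:
        if not row["key_person"] or row["hh_id"] in seen:
            continue
        seen.add(row["hh_id"])
        if has_characteristic(row):
            total += row[factor_field]
    return total


def ten_or_more_months(row):
    """Household received food stamps for 10 or more months (assumed precomputed)."""
    return row["fs_months"] >= 10


estimate_af1 = household_estimate(records, "af1", ten_or_more_months)
estimate_af2 = household_estimate(records, "af2", ten_or_more_months)
print(estimate_af1, estimate_af2)  # 2700.0 2600.0
```

Running the estimate once with each factor, as here, is one way to see how sensitive an exploratory result is to the choice between adjustment factor 1 and adjustment factor 2.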
As an example of the use of these adjustment factors, suppose one had an independent estimate of the number of households which received food stamps for 10 months or more during 1990 and wanted to compare it to the SIPP estimate. To construct the SIPP estimate, first, using appropriate data fields (e.g., current household type, key person), identify all households which existed for exactly 10, 11, and 12 months during 1990; then sum adjustment factor 1 or adjustment factor 2 over all of the identified households which received food stamps for the appropriate time period. Adjusting Estimates Which Use Less Than the Full Sample. All four rotation groups of data are not available for reference months October through December 1989 and June through August 1992 (see table 1). If the time period of interest for a given estimate (of person or household characteristics) includes these months, the estimate may need to be adjusted in some way to account for the missing rotation groups. For longitudinal estimates (types 1-4) this adjustment factor equals four divided by the number of rotation groups contributing data. For example, if the time period of interest for a given estimate is December 1989, then data will be available only from rotation groups 2, 3, and 4. Therefore, a factor of 4/3 = 1.3333 will be applied. To estimate the number of persons ever unemployed in the fourth quarter of 1989, only data from rotation group 2 are available. Thus, a factor of 4/1 = 4 will be applied. Note that, if the given estimate is an average of monthly estimates (estimate type 5), then the number of rotation groups and the factor used will be determined independently for each month in the average and the adjusted monthly estimates will be averaged together in the usual way. 
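The rotation group adjustment described above can be sketched as a small helper. The function names and the monthly figures below are hypothetical and serve only to show the arithmetic: each monthly estimate is scaled by four divided by the number of rotation groups contributing data for that month, and the scaled values are then averaged.

```python
N_ROTATION_GROUPS = 4  # the panel is divided into four rotation groups


def rotation_factor(n_contributing):
    """Return 4 divided by the number of rotation groups contributing data."""
    if not 1 <= n_contributing <= N_ROTATION_GROUPS:
        raise ValueError("number of contributing rotation groups must be 1 to 4")
    return N_ROTATION_GROUPS / n_contributing


def adjusted_monthly_average(monthly_estimates, groups_available):
    """Adjust each monthly estimate separately, then average the adjusted values."""
    if len(monthly_estimates) != len(groups_available):
        raise ValueError("need one rotation group count per monthly estimate")
    adjusted = [
        estimate * N_ROTATION_GROUPS / n  # same value as estimate * rotation_factor(n)
        for estimate, n in zip(monthly_estimates, groups_available)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical unadjusted monthly estimates for three consecutive months in
# which 1, 2, and 3 rotation groups respectively contribute data.
unadjusted = [1_000_000, 2_000_000, 3_000_000]
groups = [1, 2, 3]

average = adjusted_monthly_average(unadjusted, groups)
print(average)
```

For a longitudinal estimate (types 1-4), a single call to `rotation_factor` with the number of groups covering the whole period gives the multiplier to apply.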
For example, to estimate the average number of persons unemployed per month in the fourth quarter of 1989, the October, November, and December data will be multiplied by 4/1, 4/2, and 4/3 respectively before being summed together and divided by three. ACCURACY OF ESTIMATES SIPP estimates are based on a sample; they may differ somewhat from the figures that would have been obtained if a complete census had been taken using the same questionnaire, instructions, and enumerators. There are two types of errors possible in an estimate based on a sample survey: nonsampling and sampling. We are able to provide estimates of the magnitude of SIPP sampling error, but this is not true of nonsampling error. The next sections describe sources of SIPP nonsampling error, followed by a discussion of sampling error, its estimation, and its use in data analysis. Note that estimates from this sample for individual states are subject to very high sampling errors and are not recommended. The state codes on the file are primarily of use for linking respondent characteristics with appropriate contextual variables (e.g., state-specific welfare criteria) and for tabulating data by user-defined groupings of states. Nonsampling Errors. Nonsampling errors can be attributed to many sources, e.g., inability to obtain information about all cases in the sample; definitional difficulties; differences in the interpretation of questions; inability or unwillingness on the part of the respondents to provide correct information; inability to recall information; errors made in collection (such as in recording or coding the data), in processing the data, and in estimating values for missing data; biases resulting from the differing recall periods caused by the rotation pattern used; and undercoverage. Quality control and edit procedures were used to reduce errors made by respondents, coders and interviewers.
More detailed discussions of the existence and control of nonsampling errors in the SIPP can be found in the SIPP Quality Profile. Undercoverage in SIPP results from missed living quarters and missed persons within sample households. It is known that undercoverage varies with age, race, and sex. Generally, undercoverage is larger for males than for females and larger for Blacks than for Nonblacks. Ratio estimation to independent age-race-sex population controls partially corrects for the bias due to survey undercoverage. However, biases exist in the estimates to the extent that persons in missed households or missed persons in interviewed households have characteristics different from those of interviewed persons in the same age-race-sex group. Further, the independent population controls used have not been adjusted for undercoverage in the decennial census. The Bureau has used complex techniques to adjust the weights for nonresponse. For an explanation of the techniques used, see "Nonresponse Adjustment Methods for Demographic Surveys at the U.S. Bureau of the Census," November 1988, Working Paper 8823, by R. Singh and R. Petroni. An example of successfully avoiding bias can be found in "Current Nonresponse Research for the Survey of Income and Program Participation" (paper by Petroni, presented at the Second International Workshop on Household Survey Nonresponse, October 1991). Comparability with Other Estimates. Caution should be exercised when comparing data from this file with data from other SIPP publications or with data from other surveys. The comparability problems are caused by such sources as the seasonal patterns for many characteristics, different nonsampling errors, and different concepts and procedures. Refer to the SIPP Quality Profile for known differences with data from other sources and further discussion. Sampling Variability. Standard errors indicate the magnitude of the sampling error.
They also partially measure the effect of some nonsampling errors in response and enumeration, but do not measure any systematic biases in the data. The standard errors for the most part measure the variations that occurred by chance because a sample rather than the entire population was surveyed. USES AND COMPUTATION OF STANDARD ERRORS Confidence Intervals. The sample estimate and its standard error enable one to construct confidence intervals, ranges that would include the average result of all possible samples with a known probability. For example, if all possible samples were selected, each of these being surveyed under essentially the same conditions and using the same sample design, and if an estimate and its standard error were calculated from each sample, then: 1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average result of all possible samples. 2. Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average result of all possible samples. 3. Approximately 95 percent of the intervals from two standard errors below the estimate to two standard errors above the estimate would include the average result of all possible samples. The average estimate derived from all possible samples is or is not contained in any particular computed interval. However, for a particular sample, one can say with a specified confidence that the average estimate derived from all possible samples is included in the confidence interval. Hypothesis Testing. Standard errors may also be used for hypothesis testing, a procedure for distinguishing between population characteristics using sample estimates. The most common types of hypotheses tested are 1) the population characteristics are identical versus 2) they are different. 
Tests may be performed at various levels of significance, where a level of significance is the probability of concluding that the characteristics are different when, in fact, they are identical.

To perform the most common test, compute the difference XA - XB, where XA and XB are sample estimates of the characteristics of interest. A later section explains how to derive an estimate of the standard error of the difference XA - XB. Let that standard error be sDIFF. If XA - XB is between -1.6 times sDIFF and +1.6 times sDIFF, no conclusion about the characteristics is justified at the 10 percent significance level. If, on the other hand, XA - XB is smaller than -1.6 times sDIFF or larger than +1.6 times sDIFF, the observed difference is significant at the 10 percent level. In this event, it is commonly accepted practice to say that the characteristics are different. We recommend that users report only those differences that are significant at the 10 percent level or better. Of course, sometimes this conclusion will be wrong. When the characteristics are, in fact, the same, there is a 10 percent chance of concluding that they are different.

Note that as more tests are performed, more erroneous significant differences will occur. For example, at the 10 percent significance level, if 100 independent hypothesis tests are performed in which there are no real differences, it is likely that about 10 erroneous differences will occur. Therefore, the significance of any single test should be interpreted cautiously.

Note Concerning Small Estimates and Small Differences. Because of the large standard errors involved, there is little chance that estimates will reveal useful information when computed on a base smaller than 200,000. Also, nonsampling error in one or more of the small number of cases providing the estimate can cause large relative error in that particular estimate.
Therefore, care must be taken in the interpretation of small differences since even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.

Standard Error Parameters. Most SIPP estimates have greater standard errors than those obtained through a simple random sample because clusters of living quarters are sampled for the SIPP. To derive standard errors that would be applicable to a wide variety of estimates and could be prepared at a moderate cost, a number of approximations were required. Estimates with similar standard error behavior were grouped together and two parameters (denoted "a" and "b") were developed to approximate the standard error behavior of each group of estimates. Because the actual standard error behavior was not identical for all estimates within a group, the standard errors computed from these parameters provide an indication of the order of magnitude of the standard error for any specific estimate. These "a" and "b" parameters vary by characteristic and by demographic subgroup to which the estimate applies.

Computation of Standard Error Parameters. In this section we discuss the adjustment of base "a" and "b" parameters to provide "a" and "b" parameters appropriate for each type of longitudinal and cross-sectional estimate described in the section "Use of Person Weights." Later sections will discuss the use of the adjusted parameters in various formulas to compute standard errors of estimated numbers, percents, averages, etc. Tables 4, 5 and 6 provide the base "a" and "b" parameters needed to compute the approximate standard errors for estimates using panel, 90CY and 91CY weights, respectively. (Users should be aware that these parameters are preliminary and may be revised in the future.) Table 7 provides additional factors to be used for averages of monthly cross-sectional estimates.
These factors are needed for two reasons: the monthly estimates are correlated and averaging over a greater number of monthly estimates will produce an average with a smaller standard error. Table 8 gives correlations between quarterly and yearly averages of cross-sectional estimates. These correlations are used in the formula for the standard error of a difference (formula (11)). If household estimates have been produced using the adjustment factor 1 or adjustment factor 2, then follow the procedures described below, but use the household "a" and "b" parameters in table 4.

The creation of appropriate "a" and "b" parameters for the previously discussed types of estimates is described below. Again, it is assumed that all four rotation groups are used in estimation. If not, refer to the section "Adjusting Standard Errors of Estimates Which Use Less Than the Full Sample."

1. The number of persons who have ever experienced a characteristic during a given time period. The appropriate "a" and "b" parameters are taken directly from table 4, 5 or 6. The choice of parameter depends on whether panel, 90CY or 91CY weights were used, on the characteristic of interest, and on the demographic subgroup of interest.

2. Amount of a characteristic accumulated by persons during a given time period. The appropriate "b" parameters are also taken directly from table 4, 5 or 6.

3. The average number of consecutive months of possession of a characteristic per spell (i.e., the average spell length for a characteristic) during a given time period. Start with the appropriate base "a" and "b" parameters from table 4, 5 or 6. The parameters are then inflated by an additional factor, g, to account for persons who experience multiple spells during the time period of interest.
This factor is computed by:

$$ g = \frac{\sum_{i=1}^{n} m_i^2}{\sum_{i=1}^{n} m_i}, \tag{1} $$

where there are n persons with at least one spell and $m_i$ is the number of spells experienced by person i during the time period of interest.

4. The number of month-to-month changes in the status of a characteristic (i.e., number of transitions) summed over every set of two consecutive months during the time period of interest. Obtain a set of adjusted "a" and "b" parameters exactly as just described in 3, then multiply these parameters by an additional factor. Use 1.0000 if the time period of interest is two months and 2.0000 for a longer time period. (The factor of 2.0000 is based on the conservative assumption that each spell produces two transitions within the time period of interest.)

5. Monthly estimates of a characteristic averaged over a number of consecutive months. Appropriate base "a" and "b" parameters are taken from table 4, 5 or 6. If more than one longitudinal weight has been used in the monthly average, then there is a choice of parameters from tables 4, 5 and 6. Choose the table which gives the largest parameter. Next multiply the base "a" and "b" parameters by the factor from table 7 corresponding to the number of months in the average.

Adjusting Standard Error Parameters for Estimates which Use Less Than the Full Sample. If some rotation groups are unavailable to contribute data to a given estimate, then the estimate and its standard error need to be adjusted. The adjustment of the estimate is described in a previous section. The standard error of a longitudinal estimate (types 1-4) is adjusted by multiplying the appropriate "a" and "b" parameters by a factor equal to four divided by the number of rotation groups contributing data to the estimate. Note that the parameters for the standard error of an average must still be adjusted according to this rule, even though the average itself is unaffected by the adjustment for missing rotation groups.
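The two parameter adjustments described above, the spell factor g of formula (1) and the rotation-group factor, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the spell counts, and the base parameter values are hypothetical and are not taken from tables 4, 5 or 6.

```python
def spell_factor(spells_per_person):
    """Formula (1): g = sum(m_i^2) / sum(m_i), over persons with at least one spell."""
    if not spells_per_person or any(m < 1 for m in spells_per_person):
        raise ValueError("every person must have at least one spell")
    return sum(m * m for m in spells_per_person) / sum(spells_per_person)


def rotation_group_factor(groups_available, total_groups=4):
    """Four divided by the number of rotation groups contributing data."""
    if not 1 <= groups_available <= total_groups:
        raise ValueError("groups_available must be between 1 and total_groups")
    return total_groups / groups_available


def adjust_parameters(a, b, factor):
    """Multiply both the "a" and "b" parameters by the same factor."""
    return a * factor, b * factor


# Hypothetical number of spells for each of ten persons.
spells = [2, 1, 1, 3, 1, 1, 2, 1, 1, 1]
g = spell_factor(spells)

# Hypothetical base parameters (placeholders, not table values).
a_base, b_base = -0.0001, 10_000

# Average spell length estimated from two of the four rotation groups:
# apply g first, then the rotation-group factor.
a_adj, b_adj = adjust_parameters(a_base, b_base, g)
a_adj, b_adj = adjust_parameters(a_adj, b_adj, rotation_group_factor(2))
print(f"g = {g:.4f}, adjusted a = {a_adj:.7f}, adjusted b = {b_adj:.0f}")
```

Because both adjustments are simple multipliers applied to "a" and "b" alike, the order in which they are applied does not matter.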
For the standard error of cross-sectional estimates which cover only one month, the factor can be computed as just described or it can be taken from table 3 where the factor is given for each single reference month, October 1989 to August 1992. For the standard error of quarterly averages of monthly estimates which use less than the full sample, special factors are used, also given in table 3 for the fourth quarter of 1989 to the second quarter of 1992.

As an example, suppose we want a standard error for the estimated number of females who have ever received food stamps during the fourth quarter of 1989. The appropriate "a" and "b" parameters are -0.0002050 and 18,329, respectively (from table 4). Because only one rotation group is available for this estimate (see table 1), a factor of 4/1 = 4.000 would be applied to obtain final "a" and "b" parameters of -0.0008200 and 73,316, respectively. Suppose that instead, we were interested in the cross-sectional estimate of the average monthly number of female food stamp recipients for the fourth quarter of 1989. In that case a factor of 1.8519 (from table 3) would be applied to obtain final "a" and "b" parameters of -0.0003796 and 33,943, respectively.

Note that only panel "a" and "b" parameters will be affected by this adjustment; no such adjustment is ever needed for CY90 and CY91 parameters since the full sample is available for all months in calendar years 1990 and 1991.

Standard Errors of Estimated Numbers. The approximate standard error of an estimated number can be obtained by using formula (2):

$$ s_x = \sqrt{a x^2 + b x} \tag{2} $$

Here x is the estimated number and "a" and "b" are the parameters associated with the particular type of characteristic for the appropriate longitudinal time period, i.e., panel, 90CY or 91CY.

Illustration. Suppose the SIPP estimate of the number of persons ever receiving Social Security during the first three months of 1990 is 34,122,000.
(This estimate is obtained using the CY90 weights.) The appropriate "a" and "b" parameters to use in calculating a standard error for the estimate are obtained from table 5. They are a = -0.0001077, b = 18,329, respectively. Using formula (2), the approximate standard error is

$$ s_x = \sqrt{(-0.0001077)(34{,}122{,}000)^2 + (18{,}329)(34{,}122{,}000)} = 707{,}000 \text{ persons} $$

The 90-percent confidence interval as shown by the data is from 32,991,000 to 35,253,000. Therefore, a conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all samples. Similarly, using twice the standard error, we could conclude that the average estimate derived from all possible samples lies within the interval 32,708,000 to 35,536,000 with 95 percent confidence.

Standard Error of a Mean or Aggregate. A mean is defined here to be the average quantity of some characteristic (other than the number of persons, families, or households) per person, family, or household. An aggregate is defined to be the total quantity of some characteristic summed over all units in a subpopulation. For example, a mean could be the average annual income of females age 25 to 34; an aggregate, the total annual income for that subpopulation. The standard error of a mean can be approximated by formula (3) below and the standard error of an aggregate can be approximated by formula (4). Because of the approximations used in developing formulas (3) and (4), an estimate of the standard error of the mean or aggregate obtained from these formulas will generally underestimate the true standard error.

The formula used to estimate the standard error of a mean, $\bar{x}$, is

$$ s_{\bar{x}} = \sqrt{(b/y)\, s^2}, \tag{3} $$

where y is the base, $s^2$ is the estimated population variance of the characteristic and b is the "b" parameter associated with the particular type of characteristic.
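The Social Security illustration of formula (2) above can be checked numerically. The following Python sketch uses the estimate and the "a" and "b" values quoted in that illustration; the function names are hypothetical, and the 1.6 and 2.0 multipliers are the ones given in the confidence interval discussion.

```python
import math


def se_of_number(x, a, b):
    """Formula (2): approximate standard error of an estimated number x."""
    radicand = a * x * x + b * x
    if radicand < 0:
        raise ValueError("a*x^2 + b*x is negative; check the parameters")
    return math.sqrt(radicand)


def confidence_interval(x, se, multiplier=1.6):
    """Interval from `multiplier` standard errors below x to the same above."""
    return x - multiplier * se, x + multiplier * se


x = 34_122_000             # estimated number of persons (from the illustration)
a, b = -0.0001077, 18_329  # parameters quoted in the illustration

se = se_of_number(x, a, b)
low90, high90 = confidence_interval(x, se, 1.6)
low95, high95 = confidence_interval(x, se, 2.0)

print(f"standard error: {se:,.0f}")
print(f"90 percent interval: {low90:,.0f} to {high90:,.0f}")
print(f"95 percent interval: {low95:,.0f} to {high95:,.0f}")
```

The sketch carries the unrounded standard error through to the interval bounds, so its results agree with the figures in the text only after rounding to the nearest thousand.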
The standard error of an aggregate k is estimated by:

$$ s_k = \sqrt{b\, y\, s^2}. \tag{4} $$

The population variance, $s^2$, may be estimated by one of two methods: the first method uses data that has been grouped into intervals, the second method uses ungrouped data. The second method is recommended because it is more precise. However, the first method will be easier to implement if grouped data is already being used as part of the analysis. In both methods it is assumed $x_i$ is the value of the characteristic for person i.

To use the first method, the range of values for the characteristic is divided into c intervals, where the lower and upper boundaries of interval j are $Z_{j-1}$ and $Z_j$, respectively. Each person is placed into one of the c groups such that the value of the characteristic is between $Z_{j-1}$ and $Z_j$. The estimated population variance, $s^2$, is then given by:

$$ s^2 = \sum_{j=1}^{c} p_j m_j^2 - \bar{x}^2, \tag{5} $$

where $p_j$ is the estimated proportion of persons in group j (based on weighted data), and $m_j = (Z_{j-1} + Z_j)/2$. The most representative value of the characteristic in group j is assumed to be $m_j$. If group c is open-ended, i.e., no upper interval boundary exists, then an approximate value for $m_c$ is

$$ m_c = \tfrac{3}{2} Z_{c-1}. $$

The mean, $\bar{x}$, can be obtained using the following formula:

$$ \bar{x} = \sum_{j=1}^{c} p_j m_j. \tag{6} $$

In the second method, the estimated population variance is given by

$$ s^2 = \frac{\sum_{i=1}^{n} w_i x_i^2}{\sum_{i=1}^{n} w_i} - \bar{x}^2, \tag{7} $$

where there are n sample persons with the characteristic of interest and $w_i$ is the final weight for person i (note that $\sum w_i = y$). The mean, $\bar{x}$, can be obtained from the formula

$$ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}. \tag{8} $$

Illustration of Method 1. Suppose that the 1990 distribution of annual incomes is given in table 2 for persons aged 25 to 34 who were employed for all 12 months of 1990.
The mean annual cash income from formula (6) is

$$ \bar{x} = \frac{1{,}371}{39{,}851}(2{,}500) + \frac{1{,}651}{39{,}851}(6{,}250) + \cdots + \frac{1{,}493}{39{,}851}(105{,}000) = \$26{,}717. $$

Using formula (5) and the mean annual cash income of $26,717 the estimated population variance, $s^2$, is

$$ s^2 = \frac{1{,}371}{39{,}851}(2{,}500)^2 + \frac{1{,}651}{39{,}851}(6{,}250)^2 + \cdots + \frac{1{,}493}{39{,}851}(105{,}000)^2 - (26{,}717)^2 = 468{,}331{,}633. $$

The appropriate "b" parameter from table 5 is 5,597. Now, using formula (3), the estimated standard error of the mean is

$$ s_{\bar{x}} = \sqrt{(5{,}597/39{,}851{,}000)(468{,}331{,}633)} = \$256. $$

Illustration of Method 2. Suppose that we are interested in estimating the average length of spells of food stamp recipiency during the calendar year 1990 for a given subpopulation. Also, suppose there are only 10 sample persons in the subpopulation who were food stamp recipients. (This example is for illustrative purposes only; actually, 10 sample cases would be too few for a reliable estimate.) The number of consecutive months of food stamp recipiency during 1990 and the 90CY weights are given below for each sample person:

Sample     Spell Length     Final
Person     (in months)      Weight
   1       4, 3             5,300
   2       5                7,100
   3       9                4,900
   4       3, 3, 2          6,500
   5       12               9,200
   6       12               5,900
   7       4, 1             7,600
   8       7                4,200
   9       6                5,500
  10       4                5,700

Using formula (8), the average spell of food stamp recipiency is estimated to be

$$ \bar{x} = \frac{(5300)(4) + (5300)(3) + \cdots + (5700)(4)}{5300 + 5300 + \cdots + 5700} = \frac{473{,}100}{87{,}800} = 5.4 \text{ months} $$

The standard error will be computed by formula (3). First, the estimated population variance can be obtained by formula (7):

$$ s^2 = \frac{(5300)(4)^2 + (5300)(3)^2 + \cdots + (5700)(4)^2}{5300 + 5300 + \cdots + 5700} - (5.4)^2 = 12.4 \text{ (months)}^2 $$

Next, the base "b" parameter of 16,418 is taken from table 5 and multiplied by the factor computed from formula (1):

$$ g = \frac{2^2 + 1 + 1 + 3^2 + 1 + 1 + 2^2 + 1 + 1 + 1}{2 + 1 + 1 + 3 + 1 + 1 + 2 + 1 + 1 + 1} = 1.71 $$

Therefore, the final "b" parameter is 28,075 and the standard error of the mean is

$$ s_{\bar{x}} = \sqrt{(28{,}075/87{,}800)(12.4)} = 2.0 \text{ months} $$

Standard Errors of Estimated Percentages. This section refers to the percentages of a group of persons, families, or households possessing a particular attribute and to percentages of money or related concepts. The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends upon both the size of the percentage and the size of the total upon which the percentage is based. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are over 50 percent. For example, the percent of employed persons is more reliable than the estimated number of employed persons. When the numerator and denominator of the percentage have different parameters, use the parameter of the numerator. If proportions are presented instead of percentages, note that the standard error of a proportion is equal to the standard error of the corresponding percentage divided by 100.

There are two types of percentages commonly estimated. The first type is the percentage of persons sharing a particular characteristic such as the percentage of persons owning their own home or the percentage of January food stamp recipients who were also receiving food stamps in July. The second type is the percentage of money or some similar concept held by a particular group of persons or held in a particular form. Examples are the percentage of wealth held by persons with high income and the percentage of annual income received by females.
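Returning briefly to the Illustration of Method 2 above, the chain of formulas (8), (7), (1) and (3) can be traced in a short Python sketch. The spell lengths, weights, and base "b" parameter are those quoted in that illustration; the variable and function names are hypothetical. The sketch keeps full precision throughout, so its intermediate values differ slightly from the rounded ones in the text (for example, a variance of about 12.5 rather than 12.4), although the final standard error still rounds to 2.0 months.

```python
import math

# (spell lengths in months, final weight) for each of the 10 sample persons.
persons = [
    ([4, 3], 5300), ([5], 7100), ([9], 4900), ([3, 3, 2], 6500),
    ([12], 9200), ([12], 5900), ([4, 1], 7600), ([7], 4200),
    ([6], 5500), ([4], 5700),
]


def weighted_mean_and_variance(persons):
    """Formulas (8) and (7), treating every spell as one weighted observation."""
    sum_w = sum(w * len(spells) for spells, w in persons)
    sum_wx = sum(w * x for spells, w in persons for x in spells)
    sum_wx2 = sum(w * x * x for spells, w in persons for x in spells)
    mean = sum_wx / sum_w
    variance = sum_wx2 / sum_w - mean ** 2
    return mean, variance, sum_w


def spell_factor(persons):
    """Formula (1): g = sum(m_i^2) / sum(m_i), m_i = number of spells of person i."""
    counts = [len(spells) for spells, _ in persons]
    return sum(m * m for m in counts) / sum(counts)


mean, variance, y = weighted_mean_and_variance(persons)
g = spell_factor(persons)
b_final = 16_418 * g                        # base "b" from the illustration, times g
se_mean = math.sqrt(b_final / y * variance)  # formula (3)

print(f"mean = {mean:.2f} months, variance = {variance:.2f}, g = {g:.2f}")
print(f"final b = {b_final:,.0f}, standard error = {se_mean:.1f} months")
```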
For the percentage of persons, the approximate standard error, $s_{(x,p)}$, of the estimated percentage, p, can be obtained by the formula:

$$ s_{(x,p)} = \sqrt{(b/x)(p)(100 - p)} \tag{9} $$

Here x is the base of the percentage, p is the percentage (0