SIPP USERS' GUIDE
USING THE 1990-1993 FULL PANEL FILES

12. Using the 1990-1993 Full Panel Longitudinal Research Files

This chapter discusses procedures for working with the full panel longitudinal research files for the 1990 through 1993 Panels of the Survey of Income and Program Participation (SIPP). Starting with the 1996 Panel, SIPP no longer created a research file or a longitudinally edited full panel file.

The chapter begins by describing the documentation that accompanies the full panel public use files for the 1990 through 1993 Panels obtained from the Census Bureau. The discussion then turns to the data files themselves. The data file structure is described, and detailed explanations are provided about how to use the longitudinal research files when performing common tasks, including:

• Realigning the data by calendar month;
• Using the monthly interview status variables;
• Identifying persons, households, families, and program units;
• Working with the unearned income data;
• Understanding the effects of topcoding;
• Using imputation flags; and
• Identifying states and metropolitan areas.

Before reading this chapter, users should read Chapter 9 for an introduction to Section II. Analysts using only one longitudinal research file should also read about the use of sample weights (Chapter 8) and the computation of standard errors (Chapter 7). Those planning to merge data from a longitudinal research file with data from the core wave or topical module files should read Chapter 10 for information about the core wave files, Chapter 11 for information about the topical module files, and Chapter 13 for information about linking SIPP public use files.

This chapter focuses on the longitudinal research files for the pre-1996 panels. It is written so that it can be used independently of the chapters describing the core wave files and topical module files. Although there are many similarities across the three types of files, important differences do exist.
Because those differences are sometimes subtle, users familiar with the core wave and topical module files should read this chapter carefully, paying close attention to information about variable names and file structures. Table 9-2 in Chapter 9 summarizes the differences between the core wave, topical module, and longitudinal research files.

Using the Technical Documentation of the 1990-1993 Longitudinal Research Files

Each data file received from the Census Bureau comes with a set of technical documentation and a data dictionary. The technical documentation includes:

• The paper survey instrument;
• A glossary of selected terms;
• A cross-walk mapping reference months into calendar months for each rotation group;
• A source and accuracy statement describing the sample weights and the computation of standard errors; and
• User notes.

The survey instrument is vital to understanding what questions were asked, how they were asked, the order in which they were asked, to whom they were asked, and the way in which the answers were recorded. Some questions employ skip patterns (Chapter 3), so users should pay particular attention to which questions were skipped for which respondents. These skip patterns are best understood by consulting the survey instruments.1

The source and accuracy statements provide information about the weights on the files, when and how to make adjustments to the weights, and one approach to computing standard errors for some common types of estimates. Chapters 7 and 8 of this Guide discuss those topics in more detail.

The data dictionary provides a detailed description of each variable on the file. It describes four aspects of each variable:

1. The definition;
2. The sample universe of the corresponding survey question;
3. The ranges for all legal values; and
4. The location (and size) in the file.
A machine-readable version of the data dictionary accompanies each data file. It can also be downloaded from the Internet (http://www.sipp.census.gov/sipp/).

1 With the introduction of CAI (computer-assisted interviewing) in the 1996 Panel, questionnaire documentation is now available at the SIPP Web site at http://www.sipp.census.gov/sipp/.

The data dictionary is formatted to facilitate processing by user-written computer programs.2 As shown in Figure 12-1, a "D" in the first column begins the definition of a variable; that line gives (1) the variable name, (2) the total number of columns occupied by the variable, (3) the starting position, (4) the number of occurrences of that variable, and (5) the size of each occurrence.3 A "U" in the first column indicates that the next words describe the universe.4 A "V" in the first column indicates that the next number and phrase describe one of the values of the variable. An asterisk in the first column denotes a comment. A period (.) before a word denotes the start of the value label.5

The format of the data dictionary for the longitudinal research files is different from that used for the core wave and topical module files. The full panel data dictionary includes two extra fields on the line with a "D" in the first column: the number of occurrences of the variable and the number of digits in each occurrence. These fields are needed because some variables in the longitudinal research file occur x times (once per wave) and others occur y times (once per month of the panel). HH-ADDID in Figure 12-1 is a monthly variable (it occurs 36 times) with two digits per occurrence. PP-MIS is also a monthly variable, but each occurrence is one digit long.
PP-INTVW appears once per wave (it occurs nine times), and PP-ENTRY, PP-PNUM, SU-TOTPP, and PP-RCSEQ occur once for the entire panel.

2 The data dictionaries for the longitudinal research files use a different format from that used for the core wave and topical module files. Users who have worked with the core wave and topical module files should take care to note those differences. In addition, both the data dictionary format and the variable names changed with the 1996 Panel core wave and topical module files. This chapter uses variable names from the 1990-1993 SIPP Panels.

3 The data dictionary for the 1992 longitudinal research file used a different format from that used in the other longitudinal research files. In the 1992 data dictionary, the first line for each new variable, labeled with a "D" in column 1, has the following fields: variable name, total size (number of characters), start location, the length of a single occurrence of the variable, the number of occurrences of the variable, and the number of implied decimals.

4 The universe definitions included in the data dictionaries were often inaccurate. Users of these files should check the skip patterns in the actual survey questionnaire to determine which subset of respondents was asked each question.

5 The data dictionary for the 1992 longitudinal research file also has a line labeled with an "R" in column 1. This line provides the range of values for the variable.

Figure 12-1.
Excerpt from the 1993 Longitudinal Meta Data Dictionary

D PP-ENTRY 2 17 1 2 Range=(11:99)
  Edited entry address ID
  Address ID of the household that this person belonged to at the time this person first became part of the sample
D PP-PNUM 3 19 1 3 Range=(101:999)
  Edited person number
D SU-TOTPP 2 22 1 2 Range=(1:60)
  Total number of person records for this sample unit
D PP-RCSEQ 2 24 1 2 Range=(1:60)
  Sequence number of person record within sample unit
D HH-ADDID 72 26 36 2 Range=(0:99)
  Address ID.
  --This field identifies the household this person lived in this month
D PP-INTVW 9 98 9 1 Range=(0:4)
  Person's interview status for the relevant interview
V 0. Not applicable (children under 15), not in sample, nonmatch
V 1. Interview (self)
V 2. Interview (proxy)
V 3. Noninterview-Type Z refusal
V 4. Noninterview-Type Z other
D PP-MIS 36 107 36 1 Range=(0:2)
  Person's interview status for this month
V 0. Not matched or not in sample
V 1. Interview
V 2. Non-interview

Relationship of the Longitudinal Research Data Files to the SIPP Survey Instrument

The data dictionaries for the longitudinal research files do not replicate the survey instruments. Analysts should keep a few things in mind when using the data:

• The variables on the longitudinal research files do not correspond one-to-one with the questionnaire items. The variables are listed in a different order, some are not included in the longitudinal research file at all, and some are created from a combination of other variables;
• The range of possible values of the variables does not always correspond one-to-one with the response categories shown on the survey instrument or in the data dictionary;
• The variable name may not readily indicate its meaning; and
• The complexity of the skip patterns may not be apparent just by looking at the data dictionary.6

To avoid potential problems and confusion, users should become familiar with the survey instrument before using the data. When working with the data, analysts should refer to both the survey instrument and the data dictionary.

Structure of the Longitudinal Research Files

The longitudinal research files contain one record for each person who was ever in the SIPP sample for that panel. Even if the person was in the sample for just 1 month, there will be a record for that person. There are records for children as well as for adults, and there are records for people who entered the sample after the first wave.

Within each record, the variables correspond to the information that was collected in the core interviews. While most of the core items are included in the longitudinal research files, some items are not, and not all of the constructed variables found on the core wave files are included on the longitudinal research files. In addition, no items from any of the topical modules are included on the longitudinal research files. When items from the core wave or topical module files are needed, those variables must be merged with data from the longitudinal research files. Chapter 13 provides a detailed discussion of merging SIPP files.

The longitudinal research file structure differs from that of the core wave files. The longitudinal research files contain just one record per person, while the core wave files contain one record per person per month. Because some attributes do not change over the course of the panel, those variables appear once on each record (e.g., rotation group, sample unit ID, person number, sex, race, and ethnic origin). Some questions were asked once during each wave, so they appear x times on each record, where x equals the number of waves for that panel (e.g., highest grade attended, and participation in school breakfast and lunch programs).
Most of the core questions were asked for each month of the panel. They appear y times on each record, where y equals the number of months for that panel (e.g., current address ID, monthly interview status, relationship to the reference person, income, and program participation). Table 12-1 shows that the 1992 Panel has 10 waves (or 40 months) of data. The 1993 Panel has nine waves (or 36 months) of data. Thus, the interview status variable (PP-MIS) appears 40 times in the 1992 longitudinal research file, and it appears 36 times in the 1993 longitudinal research file.

6 See footnote 4.

Table 12-2 illustrates the longitudinal research file structure. In this example, there are 12 people. Sample unit ID (PP-ID), person number (PP-PNUM), and entry address ID (PP-ENTRY) appear once on each record because they are permanent characteristics of those people. Monthly interview status (PP-MIS), a monthly variable, appears 40 times because the 1992 Panel had 10 waves and each wave collected information about the 4 months prior to the interview month.

Table 12-1. Summary of Panels, Waves, Reference Months, and Sample Sizes

Panel  Reference            Number    Eligible  Wave 1
Year   Months               of Waves  Months    Households
1984   Jun. 83 - Jun. 86    9         36        20,897
1985   Oct. 84 - Jul. 87    8         32        14,306
1986   Oct. 85 - Mar. 88    7         28        12,425
1987   Oct. 86 - Apr. 89    7         28        12,527
1988   Oct. 87 - Dec. 89    6         24        12,725
1989   Oct. 88 - Dec. 89    3         (There is no longitudinal research file for the 1989 SIPP.)
1990   Oct. 89 - Aug. 92    8         32        23,627
1991   Oct. 90 - Aug. 93    8         32        15,626
1992   Oct. 91 - Mar. 95    10        40        21,577
1993   Oct. 92 - Dec. 95    9         36        21,823
1996   Dec. 95 - Feb. 00    12        48        40,188
2001   Oct. 00 - Feb. 04    12        48        50,745
2004   Oct. 03 - Feb. 08    12        48        62,692
2008   May 08 - Feb. 13     13        60        65,461

Source: SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a).
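Because each "D" line gives a variable's starting column, the number of occurrences, and the size of each occurrence, the once-per-panel, once-per-wave, and once-per-month variables described above can all be read with the same arithmetic: occurrence k (counting from 0) begins at column START + k × SIZE. The Python sketch below is a minimal illustration, assuming the field order shown in Figure 12-1 (name, total size, start, occurrences, size); as footnote 3 notes, the 1992 dictionary orders these fields differently. The names DictEntry, parse_d_line, and read_occurrences are hypothetical helpers, not part of any Census Bureau software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    """One "D" line of a 1990, 1991, or 1993 longitudinal research file dictionary."""
    name: str
    total_size: int   # columns occupied by all occurrences together
    start: int        # 1-based starting column in the person record
    occurrences: int  # 1 (per panel), number of waves, or number of months
    size: int         # columns per occurrence


def parse_d_line(line: str) -> DictEntry:
    """Parse 'D NAME TOTAL START OCCURRENCES SIZE [Range=...]' (Figure 12-1 order).

    The 1992 dictionary lists the occurrence size before the number of
    occurrences (footnote 3), so it would need a different field order.
    """
    fields = line.split()
    if not fields or fields[0] != "D":
        raise ValueError(f"not a variable definition line: {line!r}")
    if len(fields) < 6:
        raise ValueError(f"too few fields in: {line!r}")
    name = fields[1]
    total, start, occurrences, size = (int(f) for f in fields[2:6])
    if total != occurrences * size:
        raise ValueError(f"{name}: total size {total} != {occurrences} x {size}")
    return DictEntry(name, total, start, occurrences, size)


def read_occurrences(record: str, entry: DictEntry) -> list[str]:
    """Return every occurrence of one variable from a fixed-width person record.

    Occurrence k (0-based) occupies 1-based columns
    start + k*size through start + (k+1)*size - 1.
    """
    first = entry.start - 1  # convert the 1-based column to a 0-based index
    return [
        record[first + k * entry.size: first + (k + 1) * entry.size]
        for k in range(entry.occurrences)
    ]
```

For example, parse_d_line("D PP-MIS 36 107 36 1 Range=(0:2)") describes a variable starting in column 107 with 36 one-digit occurrences, so read_occurrences returns 36 one-character strings, one per reference month of the 1993 Panel. The consistency check (total size equals occurrences times size) is a cheap guard against applying the wrong field order to a dictionary.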
People who were not interviewed (in person or by proxy) for 1 or more months over the course of the panel either have their data imputed7 or are identified as not in the sample (PP-MIS equal to either 0 or 2) for the months when they were not in the sample. The discussion of the PP-MIS variable later in this chapter provides additional information.

7 Imputation is done through Type Z and missing-wave imputation. Chapter 4 discusses imputation methods.

Table 12-2. Example of the Longitudinal Research File Structure

                                     PP-MIS, by reference month
                                       Wave 1       Wave 2       Wave 3       Wave 4       Wave 5
PP-ID      PP-ENTRY  PP-PNUM  PP-ROT  1  2  3  4   5  6  7  8   9 10 11 12  13 14 15 16  17 18 19 20
112612345  11        101      2       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
112987122  11        101      2       1  1  1  1   1  1  1  1   1  1  1  0   0  0  0  0   0  0  0  0
987913389  11        101      3       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
123912879  11        101      3       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  2
123912879  11        201      3       0  0  0  0   0  1  1  1   1  1  1  1   2  2  1  1   1  1  1  0
874943283  11        101      4       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
788723892  11        101      4       1  1  1  0   0  1  1  1   1  1  1  1   0  0  1  1   1  1  1  1
788723892  11        102      4       1  1  1  1   1  1  1  1   1  1  1  1   2  2  2  2   0  0  0  0
788723892  11        301      4       0  0  0  0   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
788723892  11        1001     4       0  0  0  0   0  0  0  0   0  0  0  0   0  0  0  0   0  0  0  0
763483873  11        101      1       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
890987123  11        101      1       1  1  1  1   1  1  1  1   1  2  2  2   1  1  1  1   1  1  1  2

                                     PP-MIS, by reference month
                                       Wave 6       Wave 7       Wave 8       Wave 9      Wave 10
PP-ID      PP-ENTRY  PP-PNUM  PP-ROT 21 22 23 24  25 26 27 28  29 30 31 32  33 34 35 36  37 38 39 40
112612345  11        101      2       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
112987122  11        101      2       0  0  0  0   0  0  0  0   0  0  0  0   0  0  0  0   0  0  0  0
987913389  11        101      3       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1
123912879  11        101      3       2  1  1  1   0  0  2  2   2  0  0  0   0  0  0  0   0  0  0  0
123912879  11        201      3       0  0  0  0   0  0  0  0   0  0  0  0   0  0  0  0   0  0  0  0
874943283  11        101      4       1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1   1  1  1  1

How to Align Data by Calendar Month

It is frequently useful to realign the SIPP data by calendar month instead of reference month. For example, researchers often want to analyze data for a specific calendar year (January through December) or federal fiscal year (October through September).8 To do this, the analyst must know the reference period for each rotation group of the panel. That information is included with the technical documentation that accompanies the longitudinal research files.

Table 12-3 shows the reference period for each rotation group of the 1992 Panel. The reference period for rotation group 2 is October 1991-January 1995; for rotation group 3, November 1991-February 1995; for rotation group 4, December 1991-March 1995; and for rotation group 1, January 1992-December 1994 (interviews were not conducted in Wave 10 for this rotation group).

Table 12-3. Reference Periods for Each Rotation Group of the 1992 Panel

Rotation Group (ROT)   Reference Period
2                      October 1991-January 1995
3                      November 1991-February 1995
4                      December 1991-March 1995
1                      January 1992-December 1994

The following algorithm (Figure 12-3), written for the 1992 Panel, illustrates one approach to realigning the SIPP reference months to common calendar months. The mapping depends on the panel and rotation group and must be applied to each person. The first step establishes the displacement, or realignment, of the months. The second step initializes each monthly variable to -9 to distinguish the calendar months in which the variable is not relevant.9 The loop goes from 1 to 42 because in the 1992 Panel the first reference month was October 1991 and the last reference month was March 1995, so the panel covered 42 calendar months. The third step realigns the input data to be based on the calendar month.
Table 12-4 displays the data after the realignment.

8 The longitudinal research files do not contain calendar month weights. Those weights would be needed for some types of longitudinal analyses, such as analyses of the dynamics of program participation, where the unit of analysis is a spell of program participation (Chapter 8 provides a discussion of this example). Data from the longitudinal research files can also be used for cross-sectional estimation, and they are often preferable to the data from the core wave files because the edit and imputation procedures used for the longitudinal research files are believed to result in less imputation error than the procedures used for the core wave files. The format of the file is sometimes easier to work with, even for cross-sectional applications. In those instances, the calendar month weights must be merged from the core wave files. Chapter 8 provides a detailed discussion of weighting procedures in the SIPP. Chapter 13 provides a detailed discussion of linking SIPP files.

9 If -9 is a possible value for the variables being realigned (e.g., self-employment income can be negative), a different starting value must be used.

Using the Monthly Interview Status (PP-MIS) Variables

The monthly interview status variable helps to determine whether the data for a person in a given month should be used. In the longitudinal research files, this variable is labeled PP-MIS, and it has one occurrence for each reference month of the SIPP panel. Some people refer to it as the "in sample" variable to distinguish it from the interview status variable (PP-INTVW). The PP-MIS variables have three possible values: 0, 1, and 2.

Figure 12-3.
Algorithm for Realigning SIPP Panel Months to Calendar Months in the 1992 Panel

/* Create a variable that identifies the number of months by which each
   rotation group differs from the baseline (October 1991) */
If ROT = 2
    DISPLACEMENT = 0
Else if ROT = 3
    DISPLACEMENT = 1
Else if ROT = 4
    DISPLACEMENT = 2
Else if ROT = 1
    DISPLACEMENT = 3
End if

/* Initialize the new, realigned variable. This is not needed in SAS.
   When this step is used, an initial value should be chosen that is not
   a legal value for the variable in the actual data. */
For each calendar month (for CALMM = 1 to 42):
    NEW-PP-MIS(CALMM) = -9
End loop

/* Create the newly realigned variable. For rotation group 1
   (DISPLACEMENT = 3), reference month 40 would map to April 1995
   (CALMM = 43), beyond the 42 calendar months; that rotation group was
   not interviewed in Wave 10, so the month is skipped. */
For each reference month (for MONTH = 1 to 40):
    CALMM = MONTH + DISPLACEMENT
    If CALMM <= 42
        NEW-PP-MIS(CALMM) = PP-MIS(MONTH)
    End if
End loop

The monthly interview status is the only reliable guide to whether the data for a given person should be used in a given month. Analysts should use only data for those months in which a person's interview status (PP-MIS) is equal to 1.10

10 As a safeguard against inadvertently using data for months when PP-MIS is not equal to 1, all monthly variables in the user's data extract should be set to a missing value for months when PP-MIS is not equal to 1. Most statistical packages allow certain values to be flagged as "missing". Once flagged, those values are excluded from computations.
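The realignment in Figure 12-3 and the rule of using only months with PP-MIS equal to 1 can be combined in a short Python sketch. It assumes each monthly variable has already been read into a list with one entry per 1992 Panel reference month (40 entries). The displacements come from Table 12-3; the function names realign and mask_unusable, and the use of None as the missing-value marker, are hypothetical choices for illustration, not part of the SIPP files or Census Bureau software.

```python
# Months by which each 1992 Panel rotation group's first reference month
# follows October 1991 (Table 12-3).
DISPLACEMENT_1992 = {2: 0, 3: 1, 4: 2, 1: 3}
N_REF_MONTHS = 40  # 10 waves x 4 reference months
N_CAL_MONTHS = 42  # October 1991 through March 1995
NOT_RELEVANT = -9  # pick another marker if -9 is a legal value (footnote 9)


def realign(monthly, rot, fill=NOT_RELEVANT):
    """Map 40 reference-month values onto the 42 calendar months of the 1992 Panel."""
    if len(monthly) != N_REF_MONTHS:
        raise ValueError(f"expected {N_REF_MONTHS} values, got {len(monthly)}")
    shift = DISPLACEMENT_1992[rot]
    calendar = [fill] * N_CAL_MONTHS
    for month, value in enumerate(monthly):  # month is 0-based here
        cal = month + shift
        # Rotation group 1 (shift 3) has no Wave 10 interview; its 40th
        # reference month would fall in April 1995, outside the window.
        if cal < N_CAL_MONTHS:
            calendar[cal] = value
    return calendar


def mask_unusable(values, pp_mis, missing=None):
    """Keep a monthly value only where PP-MIS == 1; otherwise use `missing`."""
    if len(values) != len(pp_mis):
        raise ValueError("values and PP-MIS must cover the same months")
    return [v if status == 1 else missing for v, status in zip(values, pp_mis)]
```

A typical use is to mask a monthly amount with the person's PP-MIS values while both are still indexed by reference month, then call realign on the result with fill=None. In the resulting calendar-month series, months outside the rotation group's reference period and months with PP-MIS not equal to 1 are both missing, which follows the safeguard suggested in footnote 10.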
Table 12-4. Monthly Data from the 1992 Panel, Realigned by Calendar Month

                                     NEW-PP-MIS
                                     1991        1992
PP-ID      PP-ENTRY  PP-PNUM  PP-ROT Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
112612345  11        101      2        1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
112987122  11        101      2        1   1   1   1   1   1   1   1   1   1   1   0   0   0   0
987913389  11        101      3       -9   1   1   1   1   1   1   1   1   1   1   1   1   1   1
123912879  11        101      3       -9   1   1   1   1   1   1   1   1   1   1   1   1   1   1
123912879  11        201      3       -9   0   0   0   0   0   1   1   1   1   1   1   1   2   2
874943283  11        101      4       -9  -9   1   1   1   1   1   1   1   1   1   1   1   1   1
788723892  11        101      4       -9  -9   1   1   1   0   0   1   1   1   1   1   1   1   0
788723892  11        102      4       -9  -9   1   1   1   1   1   1   1   1   1   1   1   1   2
788723892  11        301      4       -9  -9   0   0   0   0   1   1   1   1   1   1   1   1   1
788723892  11        1001     4       -9  -9   0   0   0   0   0   0   0   0   0   0   0   0   0
763483873  11        101      1       -9  -9  -9   1   1   1   1   1   1   1   1   1   1   1   1
890987123  11        101      1       -9  -9  -9   1   1   1   1   1   1   1   1   1   2   2   2

                                     NEW-PP-MIS
                                     1993
PP-ID      PP-ENTRY  PP-PNUM  PP-ROT Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
112612345  11        101      2        1   1   1   1   1   1   1   1   1   1   1   1
112987122  11        101      2        0   0   0   0   0   0   0   0   0   0   0   0
987913389  11        101      3        1   1   1   1   1   1   1   1   1   1   1   1
123912879  11        101      3        1   1   1   1   1   2   2   1   1   1   0   1
123912879  11        201      3        1   1   1   1   1   0   0   0   0   0   0   2
874943283  11        101      4        1   1   1   1   1   1   1   1   1   1   1   1
788723892  11        101      4        0   1   1   1   1   1   1   1   1   1   1   9
788723892  11        102      4        2   2   2   0   0   0   0   0   0   0   0   2
788723892  11        301      4        1   1   1   1   1   1   1   1   1   1   1   1
788723892  11        1001     4        0   0   0   0   0   0   0   0   0   0   0   0
763483873  11        101      1        1   1   1   1   1   1   1   1   1   1   1   1
890987123  11        101      1        1   1   1   1   1   1   1   1   1   2   2   2

(table continues)

Table 12-4.
Monthly Data from the 1992 Panel, Realigned by Calendar Month (continued)

                                     NEW-PP-MIS
                                     1994                                            1995
PP-ID      PP-ENTRY  PP-PNUM  PP-ROT Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar
112612345  11        101      2        1   1   1   1   1   1   1   1   1   1   1   1   1  -9  -9
112987122  11        101      2        0   0   0   0   0   0   0   0   0   0   0   0   0  -9  -9
987913389  11        101      3        1   1   1   1   1   1   1   1   1   1   1   1   1   1  -9
123912879  11        101      3        2   2   2   0   0   0   0   0   0   0   0   0   0   0  -9
123912879  11        201      3        0   0   0   0   0   0   0   0   0   0   0   0   0   0  -9
874943283  11        101      4        1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
788723892  11        101      4        2   2   2   0   0   0   0   0   0   0   0   0   0   0   0
788723892  11        102      4        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
788723892  11        301      4        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
788723892  11        1001     4        0   0   0   0   0   0   0   0   0   0   0   0   1   1   1
763483873  11        101      1        1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
890987123  11        101      1        1   1   1   1   1   1   2   2   2   1   1   1   0   0   0

Any data present for months in which a person's interview status is coded either 0 or 2 should be ignored. A code of 0 indicates that the person was not in the sample that month, and a code of 2 indicates a noninterview for that month.11 The presence of data in analysis fields for any given month is not a reliable guide to whether the person should be included in the planned analyses. Data are collected for all months of the reference period for a given wave, even if the interviewed person was in the sample for only part of the reference period. Data are also present even if the person was not interviewed.

Information from the questionnaire is imputed when the person was in sample for at least 1 month of the reference period but not actually interviewed. That includes people who moved out of scope (as defined in Chapter 2), people who died, and people who refused to be interviewed. The entire questionnaire was imputed for Type Z noninterviews (people who refused to be interviewed, living in households where other members were successfully interviewed).
Chapter 4 examines imputation procedures; Chapter 8 provides information on weighting.

The presence of a positive weight is also not a reliable guide to whether a person should be included in the planned analysis. Although people with zero weights will not enter into any weighted tabulations, they may provide important contextual information about people who do enter into those (weighted) tabulations. For example, a zero-weight person who is a member of the same household as a positive-weight person for only 3 months provides information about the positive-weight person's household (including, for example, household size, composition, income, and program participation) for that 3-month period. That is why records for these zero-weight people are retained in the SIPP full panel data files.12

11 Beginning with the 1991 Panel, new missing wave imputation procedures were instituted for the longitudinal research files. Whenever data for a wave are imputed (the WAVFLG variable), PP-MIS is recoded to 1 on the longitudinal research files, indicating that the data for those months should be used. In some cases, these people will have records in the core wave files that were created during the Type Z imputation processing (see Chapter 4 for details). In other cases, however, the longitudinal research file will have data for people who are not present on the associated core wave data files.

Identifying Persons

There are many occasions when a user may need to identify which records belong to each individual in the SIPP data files. That need arises, for example, during the following procedures:

• Merging data from topical module or full panel files to core wave files;
• Combining data from two or more core wave files;
• Linking husbands and wives;
• Linking parents and children; and
• Identifying which person received government transfer income on behalf of the family.

To uniquely identify a person in the longitudinal research files, analysts should use the three variables shown in Table 12-5.13

Table 12-5. Variables Used to Uniquely Identify a Person in the Longitudinal Research Files

Variable Name   Description
PP-ID           Sample unit ID
PP-ENTRY        Entry address ID
PP-PNUM         Person number

• PP-ID uniquely identifies each initially sampled dwelling unit.14 Every person in the longitudinal research file was either a member of one of those units (an original sample member) or lived, at some point during the panel, with someone who was a member of an initially sampled dwelling unit. A person's connection to that unit is an attribute of that person and does not change over time.15 This means that as people move from address to address, their PP-ID stays the same. As new people join the homes of original sample members, they receive the PP-ID of the original sample members.

• PP-ENTRY identifies the address where the person lived at the time he or she was first interviewed. It does not change even if the person moves.16 It is used in conjunction with the person number and the sample unit ID to uniquely identify persons within the sample unit. Values for this variable are unique only within sample units. The entry address ID has two components. The first part (two digits in the 1992 Panel, and one digit in all others) identifies the wave in which SIPP interviews were first conducted at the address. The second part (one digit in all panels) sequentially numbers addresses within a sample unit (PP-ID) that enter the sample in the same wave.

• PP-PNUM uniquely identifies a person within the sample unit ID and entry address ID. PP-PNUM does not change even if the person moves.17 The first part of PP-PNUM (two digits in the 1992 Panel, and one digit in all others) indicates the wave in which the person was first interviewed.18 The remaining two digits are sequentially assigned within the household. Thus, original sample members are assigned person numbers ranging from 100 to 199. Individuals who enter the SIPP sample in Wave 2 are assigned a person number ranging from 200 to 299. Those who enter in Wave 10 are assigned person numbers ranging from 1001 to 1099.

12 Using the PP-MIS variable shown in Table 12-2, one can see that the first person within each rotation group was in sample every month of the panel. The second person shown in the table left the sample before the third interview (information was probably collected by proxy interview for that wave) and did not return to the sample. The eighth person left the sample in month 13. The tenth person entered the sample in month 38 (the last wave).

13 Beginning with the 1996 Panel, the entry address ID was no longer needed: person numbers are unique within sample units. Continued use of the entry address ID does not create any problems; it is simply redundant information.

14 The PP-ID is a random recode of three other variables in the Census Bureau's internal (not public use) files: the respondent's sampling area (PSU), the cluster of housing units within that area (called the "segment"), and a sequentially assigned serial number. Those three variables are omitted from the public use files to protect the confidentiality of the respondents.

15 There is one rare exception to this rule, which is described in the section entitled "Identifying Movers" later in this chapter.

16 See footnote 15.

17 See footnote 15.

Table 12-6 illustrates how the combination of PP-ID, PP-ENTRY, and PP-PNUM uniquely identifies people and provides information about when they first entered the SIPP sample.
In this example, there are eight individuals: five are original sample members; one person joined the sample in Wave 4, one person joined in Wave 7, and one person joined in Wave 10 (of the 1992 Panel).

Table 12-6. How to Uniquely Identify a Person in the Longitudinal Research Files

Sample Unit   Entry Address   Person Number
ID (PP-ID)    ID (PP-ENTRY)   (PP-PNUM)       Notes
123456789     11              101             Original sample member
123456789     11              102             Original sample member
123456789     11              401             Enters SIPP sample in Wave 4
123456789     71              701             Enters SIPP sample in Wave 7
321456789     11              101             Original sample member
321456789     11              102             Original sample member
321456789     11              103             Original sample member
456789123     101             1001            Enters SIPP sample in Wave 10 of the 1992 Panel

Identifying Households

The term household, as used in Census Bureau publications, refers to a group of people who occupy a housing unit. A house, an apartment or other group of rooms, or a single room is regarded as a housing unit if it is occupied or intended for occupancy as separate living quarters; that is, the occupants do not live and eat with any other people in the structure, and there is direct access from the outside or through a common hall. A group of friends sharing an apartment constitutes a household. Rooming and boarding houses, college dormitories, convents, and monasteries are classified as group quarters rather than households.

To uniquely identify a household or group quarters in the longitudinal research files in a given month, analysts should use the variables shown in Table 12-7.19

18 For Wave 10 of the 1992 Panel and for the 1996 Panel, the first two digits of PNUM instead of the first digit identify the wave in which the person entered the sample.

19 Since household composition changes from one month to the next, it is generally not possible to construct longitudinal households. Users should not infer commonality across months based solely on place of residence in one month.
The characteristics of the household to which a given person belongs (such as household size and household income) should be evaluated separately for each month, based on just those people who reside together in each specific month. Similar caution should be exercised when dealing with the characteristics of the family and, when applicable, the subfamily to which a person belongs.

Table 12-7. Variables Used to Uniquely Identify a Household in the Longitudinal Research Files

Variable Name   Description
PP-ID           Sample unit ID
HH-ADDID i      Current address ID in the ith month
PP-MIS i        Person's interview status in the ith month

People with the same PP-ID and HH-ADDID values and with a PP-MIS value of 1 live in the same household (or group quarters) in the ith month of the reference period. The eight individuals shown in Table 12-8 make up four households. The first household contains the first four individuals. The second household contains one person. The third household contains one person. The fourth household contains two people. This example depicts the households in the ith month; these people could belong to different households in other months.

Users may find it helpful when reading the following pages to refer to Figure 2-1, which illustrates changes in household composition.

Identifying Families

The term family, as used in Census Bureau publications, refers to a group of two or more people related by birth, marriage, or adoption who reside together; all such individuals are considered members of one family.20

• A primary family is a family containing the household reference person and all of his or her relatives. This means that a household composed of a husband and wife, their son, and their son's wife (i.e., the daughter-in-law) is classified as a primary family containing four people.
• A related subfamily is a nuclear family that is related to but does not include the household reference person.
For example, the son and his wife (i.e., the daughter-in-law) in the preceding example are a related subfamily. 20 As with households (see footnote 19), because family composition changes from one month to the next, it generally is not possible to construct longitudinal families. Users should not infer commonality across months based solely on family membership in one month. The characteristics of the family to which a person belongs (such as family size and family income) should be evaluated separately for each month, and should be based on just those people who reside together and are members of the same family in each specific month. Similar caution should be exercised when dealing with the characteristics of the household and, when applicable, the subfamily (related or unrelated) to which a person belongs. 12 - 15 SIPP USERS= GUIDE USING THE 1990-1993 FULL PANEL FILES Table 12-8. How to Uniquely Identify a Household or Group Quarters in a Given Month of the Longitudinal Research Files Sample Entry Person Person’s Unit ID Address ID Number Interview Address ID (PP-ID) (PP-ENTRY) (PNUM) Status (PP-MIS) (HH-ADDID) Notes 123456789 11 101 1 71 123456789 11 102 1 71 Four people in this household 123456789 11 401 1 71 123456789 71 701 1 71 321456789 11 101 1 31 One person in this household 321456789 11 102 1 32 One person in this household 321456789 11 103 1 101 321456789 101 1001 1 101 Two people in this householda a Because this example includes a person with an entry address of 101, we know that the example refers to a month from Wave 10 of the 1992 Panel (the only panel prior to 1996 with 10 or more waves). ! An unrelated subfamily (sometimes called a secondary family) is a nuclear family that is not related to the household reference person. Thus, a husband and wife who live in a friend’s house are classified as an unrelated subfamily. A mother and daughter who live in the mother’s boyfriend’s apartment are classified as an unrelated subfamily. ! 
A primary individual is a household reference person who lives alone or lives with only nonrelatives. Primary individuals are sometimes treated by the Census Bureau as families with only one person and are referred to as pseudo-families. ! A secondary individual is not a household reference person and is not related to any other people in the household. Secondary individuals are sometimes treated by the Census Bureau as families with only one person and are referred to as pseudo-families. Unlike the core wave files, the longitudinal research files do not contain family identification variables (e.g., FID, FID2, and SID). Analysts needing family identification variables must either merge them from the core wave files (Chapters 10 and 13) or create them.21 Because family composition can change over time, these are monthly variables. The algorithm in Figure 12-4 shows one approach to creating functional equivalents of the variables contained on the core wave files.22 The variables created by this algorithm are functionally equivalent to the variables with the same 21 In most cases, it is also possible to merge these variables from the core wave files. However, beginning with the 1991 Panel, a missing wave imputation procedure was applied to the longitudinal research files: data were imputed for people with missing data for a wave but with valid data for the two adjacent waves. Although these people have data in the longitudinal research file for imputed waves, some have no data in the core wave files (some of these people are subject to Type Z imputation procedures that create records in the core wave files). For these people, merging the family ID variables from the core wave files is not an option. 22 This algorithm uses the following (monthly) variables found on the longitudinal research files: FAMTYP and FAMNUM. These variables are discussed in greater detail in the next section. 
12 - 16 SIPP USERS= GUIDE USING THE 1990-1993 FULL PANEL FILES names on the core wave files: they will group people into the same family and subfamily groups. However, the actual values assigned by this algorithm to these variables generally will not equal the values found in the variables from the core wave files. With these monthly variables (FID i, FID 2i, and SIDi), users can identify common family membership in each month.23 The Census Bureau has two principal methods for distinguishing families that are based on the variables and numbering schemes shown in Table 12-9. Analysts must remember to choose which type of family classification they want and then use the appropriate method. ! The first method defines a family as all persons who are related and living together. The family ID variable FIDi is used with this definition. FIDi groups the household reference person with all related household members by assigning them the same ID number. ! This family group corresponds to the Census Bureau’s definition of a primary family. FID groups members of each unrelated subfamily (and primary and secondary individuals) separately. ! The second method is similar to the first in defining a family, but the family excludes related subfamilies. The family ID variable FID2i is used with this definition. FID2 i equals zero for related subfamilies. Analysts who want to analyze multi generational families would use FID2i and the variable SIDi . SIDi treats related subfamilies as distinct family units by assigning them nonzero values. Analysts can easily distinguish unrelated subfamilies form other family units when they use these variables and numbering schemes. Table 12-10 illustrates the difference between FID, FID2, and SID for a single month. In the month shown, the first household contains a primary family of five people. The primary family contains two related subfamilies. 
FID and FID2 mask the fact that there are two related subfamilies; only SID provides that information, because SID has nonzero values only for members of related subfamilies. The second household contains a primary family and two unrelated subfamilies. The third household contains a primary individual and an unrelated subfamily. The fourth household contains only a primary individual. The fifth household is group quarters containing two people (both secondary individuals). This example depicts those families in the ith month; these people could belong to different families in other months.24

23 See footnotes 19 and 20.

24 See footnote 17.

Figure 12-4. Constructing Family and Subfamily ID Variables in the Longitudinal Research Files

For each person (index = ip):
  For each month (index = mo):
    If PP-MIS(mo,ip) = 1 then do:
      If FAMTYP(mo,ip) = 0 then               /* primary family */
        FID(mo,ip)  = 1
        FID2(mo,ip) = 1
        SID(mo,ip)  = 0
      Else if FAMTYP(mo,ip) = 1 then          /* secondary individual */
        FID(mo,ip)  = 10000 + ip
        FID2(mo,ip) = 10000 + ip
        SID(mo,ip)  = 0
      Else if FAMTYP(mo,ip) = 2 then          /* unrelated subfamily */
        FID(mo,ip)  = 100 + FAMNUM(mo,ip)
        FID2(mo,ip) = 100 + FAMNUM(mo,ip)
        SID(mo,ip)  = 0
      Else if FAMTYP(mo,ip) = 3 then          /* related subfamily */
        FID(mo,ip)  = 1
        FID2(mo,ip) = 0
        SID(mo,ip)  = FAMNUM(mo,ip)
      Else if FAMTYP(mo,ip) = 4 then          /* primary individual */
        FID(mo,ip)  = 10000 + ip
        FID2(mo,ip) = 10000 + ip
        SID(mo,ip)  = 0
      End if
    End "PP-MIS = 1" block
  End month loop
End person loop

Table 12-9. Variables Used to Identify Families in the Longitudinal Research Files

Variable Name     Description
PP-ID             Sample unit ID
HH-ADDIDi         Address ID in the ith month
PP-MISi           Person's interview status in the ith month
And one of the following created variables:
FIDi              Family ID in the ith month
FID2i             Family ID in the ith month, excluding related subfamily members (FID2i equals zero for related subfamily members)
SIDi              Family ID in the ith month for related subfamily members (SIDi assigns nonzero values only to members of related subfamilies)
FID2i and SIDi    Family ID in the ith month, separating related subfamilies from the primary family

Note: Variables FIDi, FID2i, and SIDi are not included on the longitudinal research files. They can be created by using the algorithm shown in Figure 12-4 or merged from the core wave files.

The specific analysis being planned will determine which family classification to use. To group people into families in the same way that the Census Bureau does, analysts should use PP-ID, PP-MISi, HH-ADDIDi, and FIDi. To analyze primary families excluding related subfamily members, analysts should include only those records with FID2i greater than zero. To analyze related subfamilies as distinct family units, analysts should use only those records with SIDi greater than zero. To uniquely identify (1) primary families excluding related subfamilies and (2) related subfamilies treated as distinct family groups, analysts should use PP-ID, PP-MISi, HH-ADDIDi, FID2i, and SIDi. With that combination, it is also easy to distinguish unrelated subfamilies from other family units.

Variables Describing Household and Family Composition

Table 12-11 shows the variables on the longitudinal research files that summarize household and family composition.25

Table 12-11. Variables Used to Describe Household Composition in the Longitudinal Research Files

Variable Name   Description
FAMTYPi         Type of family in the ith month (e.g., primary family, related subfamily)
FAMRELi         Family relationship in the ith month (e.g., reference person, spouse of family reference person, child of family reference person)
RRPi            Recoded relationship to the household reference person in the ith month (e.g., household reference person living with relatives, child of household reference person)
ENTID-SPi       Entry address ID of spouse in the ith month
PNSPi           Person number of spouse in the ith month
ENTID-PTi       Entry address ID of parent in the ith month
PNPTi           Person number of parent in the ith month
U-PNGj          Person number of guardian in the jth wave
ENTID-GDj       Entry address ID of guardian in the jth wave

25 More detailed information about the relationships between household members is collected in the Household Relationships topical module. Those data provide extensive information about household composition at the time of the topical module interview.

As Table 12-12 shows, RRPi summarizes the relationship of each person to the household reference person in month i.

Table 12-12. Relationship to the Household Reference Person in a Given Month

Edited Relationship to the Household
Reference Person (RRPi)                 Description
1                                       Household reference person, living with relatives
2                                       Household reference person, living alone or with nonrelatives
3                                       Spouse of household reference person
4                                       Child of household reference person
5                                       Other relative of household reference person
6                                       Nonrelative of household reference person, but related to other members of household
7                                       Nonrelative of all members of the household

The household description depends on the identity of the reference person. For example, in Table 12-13, the household contains a mother, her daughter, and her daughter's son.
If the mother is the household reference person (RRPi = 1), her daughter is listed as a child of the household reference person (RRPi = 4) and the daughter's son is listed as other relative of the household reference person (RRPi = 5). If the daughter is the reference person, her son is listed as a child of the household reference person (RRPi = 4) and her mother is listed as other relative of the household reference person (RRPi = 5). Users should note that the household reference person can change from one month to the next; thus, the household description could also change.

Table 12-10. How to Uniquely Identify a Family in a Given Month of the Longitudinal Research Files

Columns: sample unit ID (PP-ID), current address ID (HH-ADDID), person's interview status (PP-MIS), family ID including subfamily (FID), family ID excluding subfamily (FID2), subfamily ID (SID), family type (FAMTYP), and person number (PP-PNUM).

PP-ID      HH-ADDID  PP-MIS  FID   FID2  SID  FAMTYP  PP-PNUM  Notes
110011111  11        1       1     1     0    0       101      This household contains a primary family of five people.
110011111  11        1       1     0     2    3       102      The primary family contains two related subfamilies.
110011111  11        1       1     0     2    3       103
110011111  11        1       1     0     3    3       104
110011111  11        1       1     0     3    3       105
122210000  33        1       1     1     0    0       101      This household contains a primary family and two unrelated subfamilies.
122210000  33        1       1     1     0    0       104
122210000  33        1       101   101   0    2       305
122210000  33        1       101   101   0    2       306
122210000  33        1       102   102   0    2       307
122210000  33        1       102   102   0    2       308
555555555  21        1       1001  1001  0    4       101      This household contains a primary individual and an unrelated subfamily.
555555555  21        1       101   101   0    2       201
555555555  21        1       101   101   0    2       202
555555555  21        1       101   101   0    2       203
610000000  11        1       1001  1001  0    4       101      Primary individual.
897454644  11        1       1001  1001  0    1       101      Group quarters with two secondary individuals.
897454644  11        1       1002  1002  0    1       102

Notes: Variables FIDi, FID2i, and SIDi are not part of the longitudinal research files. They can be merged from the core wave files or created using the algorithm shown in Figure 12-4. FAMTYP = 0 means the person belongs to a primary family. FAMTYP = 1 means the person is a secondary individual. FAMTYP = 2 means the person belongs to an unrelated subfamily. FAMTYP = 3 means the person belongs to a related subfamily. FAMTYP = 4 means the person is a primary individual.

Table 12-13. Using RRP to Identify Households Containing Three Generations in the Longitudinal Research Files

                    Relationship to the Household
Person              Reference Person (RRPi)          Notes
Mother as household reference person
  Mother            1                                Reference person
  Daughter          4                                Child of reference person
  Daughter's son    5                                Other relative of reference person
Daughter as household reference person
  Daughter          1                                Reference person
  Daughter's son    4                                Child of reference person
  Mother            5                                Other relative of reference person

Six other variables in the longitudinal research file can be used to describe household and family composition: PNSPi, ENTID-SPi, PNPTi, ENTID-PTi, U-PNGj, and ENTID-GDj. These six variables identify the person number and entry address ID of the spouse, parent, or guardian living at the same address as the person in the ith month (for the two guardian variables, in the jth wave).26 By building from these variables, the analyst can identify a variety of family configurations. For example, these variables can be used to identify households containing three generations. Table 12-14 displays one household containing a mother and her two children. One child (PP-PNUM = 102) has a son, and the other child (PP-PNUM = 104) has a spouse.

Using Family-Level Income Variables

The longitudinal research files contain a number of family-level income variables, and these family income variables include the income of all related subfamily members. In other words, the Census Bureau treats primary family members and related subfamily members as one family when calculating family-level income amounts. The longitudinal research files do not contain any subfamily income variables. If analysts need family income variables that do not pool related subfamilies with primary families, they must create them by summing person-level income, separately for each month, over persons with a PP-MISi of 1 and common values of PP-ID, HH-ADDIDi, FID2i, and SIDi.27

Table 12-15 illustrates how the family income variables on the longitudinal research files include the income of related subfamily members. In the earlier example of a primary family of five people, the primary family contains two related subfamilies. Total family income (FF-INCi) is $3,100, and the incomes of all subfamily members are included in that amount.

26 Parents and spouses always share the same sample unit ID (PP-ID) as the respondent. The variables are assigned values only in the months in which the people are living together. For example, a couple living together in Wave 1 would have values in the PNSP and ENTID-SP variables that point to each other. However, if they separate (and remain married) in Wave 2, the PNSP and ENTID-SP variables will be assigned values of 999 (indicating that the variables are not applicable).

27 FID2i and SIDi are not included on the longitudinal research files. They can be merged from the core wave files or created by using the algorithm shown in Figure 12-4.

Table 12-14. Using PNSP and PNPT to Identify Households Containing Three Generations in the Longitudinal Research Files

Columns: entry address ID (PP-ENTRY), person number (PP-PNUM), relationship to household reference person (RRPi), entry address ID of spouse (ENTID-SPi), person number of spouse (PNSPi), entry address ID of parent (ENTID-PTi), and person number of parent (PNPTi).

Household Member        PP-ENTRY  PP-PNUM  RRPi  ENTID-SPi  PNSPi  ENTID-PTi  PNPTi  Notes
Mother                  11        101      1     11         999    11         999    Mother
Daughter #1             11        102      4     11         999    11         101    Child
Daughter #1's son       11        103      5     11         999    11         102    Grandchild
Daughter #2             11        104      4     11         105    11         101    Child
Spouse of Daughter #2   11        105      5     11         104    11         999    Spouse of child

Note: A value of 999 means not applicable.

Table 12-15. Family Income in the Longitudinal Research Files

Columns: sample unit ID (PP-ID), entry address ID (PP-ENTRY), person number (PP-PNUM), person's interview status (PP-MISi), current address ID (HH-ADDIDi), family ID including subfamily (FIDi), subfamily ID (SIDi), total family income (FF-INCi), and person-level income (PP-INCi).

PP-ID      PP-ENTRY  PP-PNUM  PP-MISi  HH-ADDIDi  FIDi  SIDi  FF-INCi  PP-INCi
110011111  11        101      1        11         1     0     $3,100   $100
110011111  11        102      1        11         1     2     $3,100   $500
110011111  11        103      1        11         1     2     $3,100   $500
110011111  11        104      1        11         1     3     $3,100   $1,000
110011111  11        105      1        11         1     3     $3,100   $1,000

More About Using the SIPP ID Variables: Identifying Movers

When a person moves, the current address field (HH-ADDIDi) changes, while the PP-ID, PP-ENTRY, and PP-PNUM values remain the same. The first digit of HH-ADDIDi (or the first two digits, for Wave 10 of the 1992 Panel) indicates the wave in which a household is first interviewed at that new address. The remaining digits number, in sequence, the new addresses that arise when original sample members move to a different location, including moves that split one household into two or more. Thus, new addresses in Wave 2 are numbered 21, 22, and so on; new addresses in Wave 3 are numbered 31, 32, and so on; and new addresses in Wave 10 are numbered 101, 102, and so on. Refer to Figure 2-1 for illustrations of movement into and out of households.
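The ID conventions above (and in Tables 12-6 and 12-16) can be expressed as a few small helper functions. The following is a minimal Python sketch under stated assumptions, not Census Bureau code: PersonMonth and its field names are hypothetical stand-ins for PP-ID, PP-ENTRY, PP-PNUM, and HH-ADDIDi; the IDs are assumed to be loaded as integers (with PP-ID kept as text); the last digit of an address ID and the last two digits of a person number are treated as sequence numbers; and an HH-ADDIDi of 0 is treated as "not in the sample that month," as it appears in Table 12-18. The test data are the final-wave rows of Table 12-16.

```python
from typing import NamedTuple


class PersonMonth(NamedTuple):
    """One person's ID fields in one month (hypothetical field names)."""
    pp_id: str      # PP-ID, sample unit ID (text, to keep leading zeros)
    pp_entry: int   # PP-ENTRY, entry address ID
    pp_pnum: int    # PP-PNUM, person number
    hh_addid: int   # HH-ADDIDi, current address ID (assumed 0 = not in sample)

    @property
    def person_key(self) -> tuple:
        """PP-ID, PP-ENTRY, and PP-PNUM together identify a person."""
        return (self.pp_id, self.pp_entry, self.pp_pnum)


def entry_wave(pp_pnum: int) -> int:
    """Wave in which the person entered: drop the two sequence digits
    (101 -> 1, 401 -> 4, 1001 -> 10)."""
    return pp_pnum // 100


def address_wave(addid: int) -> int:
    """Wave in which an address was first interviewed: drop the final
    sequence digit (11 -> 1, 42 -> 4, 101 -> 10)."""
    return addid // 10


def away_from_entry_address(p: PersonMonth) -> bool:
    """True if the person is in the sample this month but lives at an
    address other than the one at which he or she entered."""
    return p.hh_addid != 0 and p.hh_addid != p.pp_entry


# Final-wave rows for both sample units in Table 12-16.
rows = [
    PersonMonth("123456789", 11, 101, 71),
    PersonMonth("123456789", 11, 102, 71),
    PersonMonth("123456789", 11, 401, 71),
    PersonMonth("123456789", 71, 701, 71),
    PersonMonth("321456789", 11, 101, 31),
    PersonMonth("321456789", 11, 102, 32),
    PersonMonth("321456789", 11, 103, 101),
    PersonMonth("321456789", 101, 1001, 101),
]

for p in rows:
    print(p.person_key, "entered in wave", entry_wave(p.pp_pnum),
          "| address first used in wave", address_wave(p.hh_addid),
          "| away from entry address:", away_from_entry_address(p))
```

In this data, only persons 701 and 1001 are still at their entry address, which agrees with the observation accompanying Table 12-16 that only two people have HH-ADDIDi equal to PP-ENTRY. Integer division is used instead of string slicing so that two-digit and three-digit (Wave 10) codes are handled by the same rule.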
Table 12-16 shows that persons 101 and 102 in the first household are original sample members. Person 401 moved into the home of persons 101 and 102 in Wave 4. In Wave 7, all three moved to a new location and were joined by person 701. In the second household, person 101 is an original sample member who moved to a new location in Wave 3. In the third household, person 102 is an original sample member who used to live with persons 101 and 103 of the same sample unit ID (PP-ID) but who moved in Wave 3 to a location different from person 101's. In the fourth household, person 103 is an original sample member who used to live with persons 101 and 102 of the same sample unit ID. Person 103 moved to a new location in Wave 10 and was joined by person 1001, who had just entered the SIPP sample. All but two people moved from their original location (i.e., only two people have HH-ADDIDi equal to PP-ENTRY).

Table 12-16. How to Identify Movers in the Longitudinal Research Files

Columns: wave, sample unit ID (PP-ID), entry address ID (PP-ENTRY), person number (PP-PNUM), person's interview status (PP-MISi), and current address ID (HH-ADDIDi).

Wave  PP-ID      PP-ENTRY  PP-PNUM  PP-MISi  HH-ADDIDi  Notes
1     123456789  11        101      1        11         Persons 101 and 102 are the original sample members.
1     123456789  11        102      1        11
4     123456789  11        101      1        11         Person 401 begins to live with them in Wave 4.
4     123456789  11        102      1        11
4     123456789  11        401      1        11
7     123456789  11        101      1        71         All three people move in Wave 7, and person 701 joins them.
7     123456789  11        102      1        71
7     123456789  11        401      1        71
7     123456789  71        701      1        71
1     321456789  11        101      1        11         Persons 101, 102, and 103 are original sample members.
1     321456789  11        102      1        11
1     321456789  11        103      1        11
3     321456789  11        101      1        31         Person 101 moved in Wave 3.
3     321456789  11        102      1        32         Person 102 moved in Wave 3 to a different location from person 101.
3     321456789  11        103      1        31         Person 103 remained with person 101.
10    321456789  11        101      1        31         Person 103 is an original sample member who used to live with persons 101 and 102 of the same ID.
10    321456789  11        102      1        32
10    321456789  11        103      1        101        In Wave 10, person 103 lives in a new location with person 1001, who just entered the SIPP sample.
10    321456789  101       1001     1        101

The next example (Table 12-17) further illustrates how the ID system works as people move to new addresses, additional people move in with them, and households split. A review of Figure 2-1 may help in understanding the various household changes.

• In Wave 1, there is a five-person household consisting of a husband, a wife, a daughter, a son, and a cousin. Because this is the first wave, the current address number is 11, indicating address 1 of Wave 1, and the entry address number for each member of the household is the same as the current address number. Because they are assigned in Wave 1, the person numbers are in the 100 series and are numbered sequentially, beginning with 101.
• During Wave 2, the son joins the Army, moves into military barracks, and therefore leaves the SIPP sample.28 The son's record, person number 104, will contain information (either imputed or provided by proxy) on his characteristics for the time in Wave 2 that he was still in the sample. If he does not return to the sample during the remainder of the panel, there will be no records for him beyond Wave 2.
• During Wave 3, the daughter marries and her husband moves into the household. The current address number where the mother, father, cousin, daughter, and son-in-law live remains the same because it is the same address. The son-in-law's entry address number is 11 because he first enters the SIPP sample at an address coded 11. His person number is in the 300 series (301) because he joins the SIPP sample in Wave 3.
• During Wave 4, the daughter and son-in-law move into a new house. Their current address number changes to 41 to indicate that a new address has been established in Wave 4. Meanwhile, the cousin, who is over age 15, moves in with an uncle.29 The cousin's current address number changes to 42 (i.e., the second new address added to the SIPP sample in Wave 4). The assignment of address number 41 to the daughter and 42 to the cousin is random; it could have been the other way around. The uncle enters the SIPP sample with a current address number of 42 and an entry address number of 42. The uncle's person number is in the 400 series (401) because he joins the survey in Wave 4.
• No changes in household composition are observed during Waves 5 through 9.
• During Wave 10, the daughter and son-in-law have a baby. This new sample member is assigned the sample unit ID of the daughter and son-in-law. The newborn's entry address is 41, since that is the current address ID of the daughter and son-in-law at the time of birth. The newborn's person number is 1001, reflecting the fact that the newborn entered the SIPP sample in Wave 10. Meanwhile, the cousin moves to Europe and therefore leaves the SIPP sample. The uncle, even though he does not move to Europe with the cousin, also leaves the SIPP sample because he no longer resides with an original SIPP sample member. Their records are no longer listed.

28 Members of the armed forces are included in the SIPP sample only if they are living stateside in private housing. Those living overseas or in military barracks are not included in the SIPP sample universe.

29 In the 1993 Panel, all original sample members were followed, no matter what their ages. In all other panels, only people 15 years of age or older were followed when they moved to new addresses.

Table 12-17.
Another Example of Household Changes and Their Effects on the ID Variables in the Longitudinal Research Files

Household          Sample Unit ID   Current Address ID   Entry Address ID   Person Number
Member             (PP-ID)          (HH-ADDID)           (PP-ENTRY)         (PP-PNUM)
Wave 1
  Father           101111103        11                   11                 101
  Mother           101111103        11                   11                 102
  Daughter         101111103        11                   11                 103
  Son              101111103        11                   11                 104
  Cousin           101111103        11                   11                 105
Wave 2
  Father           101111103        11                   11                 101
  Mother           101111103        11                   11                 102
  Daughter         101111103        11                   11                 103
  Son              101111103        11                   11                 104
  Cousin           101111103        11                   11                 105
Wave 3
  Father           101111103        11                   11                 101
  Mother           101111103        11                   11                 102
  Daughter         101111103        11                   11                 103
  Son-in-law       101111103        11                   11                 301
  Cousin           101111103        11                   11                 105
Wave 4
  Parents' household
  Father           101111103        11                   11                 101
  Mother           101111103        11                   11                 102
  Daughter's household
  Daughter         101111103        41                   11                 103
  Son-in-law       101111103        41                   11                 301
  Cousin's household
  Cousin           101111103        42                   11                 105
  Uncle            101111103        42                   42                 401
Wave 10
  Parents' household
  Father           101111103        11                   11                 101
  Mother           101111103        11                   11                 102
  Daughter's household
  Daughter         101111103        41                   11                 103
  Son-in-law       101111103        41                   11                 301
  Newborn          101111103        41                   41                 1001

Table 12-18 displays this example again, but it depicts how the HH-ADDID variable changes over time to reflect the household composition changes. The table also illustrates the structure of the full panel data files.

There are two extremely rare occasions on which the original PP-ID, PP-ENTRY, and PP-PNUM values are modified:

1. The first occasion is when two separate sampling units, each containing original sample members, are merged, perhaps because of a marriage. In this situation, one of the original sets of PP-ID and PP-ENTRY values is retained and the other set is changed to agree with the retained set. The person number values (PP-PNUM) of the changed set are further modified to fall between 180 and 199, inclusive.
2. The second occasion is when a household splits into two new households (each of which gains a new sample person) and the households later recombine. For example, assume that a married couple separates in Wave 3, with each spouse moving in with a sibling. Both siblings are assigned a person number of 301, because they entered the sample in Wave 3 at different addresses (thus, HH-ADDIDi = 31 and 32). If the husband and wife reunite in Wave 6 and bring the siblings with them, one sibling's person number would be changed. In this case, one of the siblings would keep the person number 301 and the other would receive a person number between 680 and 699, inclusive.

Because a record in the longitudinal research file describes the person throughout the entire panel, and because the sample unit ID (PP-ID) cannot change on this record, each person in a merged household whose ID values were changed is assigned two full panel records. The first record contains the original ID information of the person before the merge and identifies the person as having exited the sample at the time of the merge. The second record contains the new ID information and identifies the person as having entered the sample at the time of the merge. There is no way to link the two records in the longitudinal research files.30

30 If needed, this information can be merged from the core wave files. Chapters 10 and 13 provide details.

Table 12-18. Household Changes and Their Effects on the Household ID (HH-ADDIDi) Variable in the Longitudinal Research File

HH-ADDIDi, Waves 1-5 (months 1-20)
                                              Wave 1        Wave 2        Wave 3        Wave 4        Wave 5
PP-ID      PP-ENTRY  PP-PNUM  Member       1  2  3  4    5  6  7  8    9 10 11 12   13 14 15 16   17 18 19 20
101111103  11        101      Father      11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11
101111103  11        102      Mother      11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11
101111103  11        103      Daughter    11 11 11 11   11 11 11 11   11 11 11 11   41 41 41 41   41 41 41 41
101111103  11        104      Son         11 11 11 11   11  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0
101111103  11        105      Cousin      11 11 11 11   11 11 11 11   11 11 11 11   11 42 42 42   42 42 42 42
101111103  11        301      Son-in-law   0  0  0  0    0  0  0  0    0 11 11 11   41 41 41 41   41 41 41 41
101111103  42        401      Uncle        0  0  0  0    0  0  0  0    0  0  0  0   42 42 42 42   42 42 42 42
101111103  41        1001     Newborn      0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0

HH-ADDIDi, Waves 6-10 (months 21-40)
                                              Wave 6        Wave 7        Wave 8        Wave 9        Wave 10
PP-ID      PP-ENTRY  PP-PNUM  Member      21 22 23 24   25 26 27 28   29 30 31 32   33 34 35 36   37 38 39 40
101111103  11        101      Father      11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11
101111103  11        102      Mother      11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11   11 11 11 11
101111103  11        103      Daughter    41 41 41 41   41 41 41 41   41 41 41 41   41 41 41 41   41 41  0  0
101111103  11        104      Son          0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0
101111103  11        105      Cousin      42 42 42 42   42 42 42 42   42 42 42 42   42 42 42 42    0  0  0  0
101111103  11        301      Son-in-law  41 41 41 41   41 41 41 41   41 41 41 41   41 41 41 41   41 41 41 41
101111103  42        401      Uncle       42 42 42 42   42 42 42 42   42 42 42 42   42 42 42  0    0  0  0  0
101111103  41        1001     Newborn      0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0   41 41 41 41

Identifying Program Units

Besides household and family composition data, the longitudinal research files contain detailed information about participation in health insurance and various government transfer programs. For most programs, three characteristics are recorded (Table 12-19):

1. Whether the person is covered,
2. Who received the income or benefit, and
3. The amount of the income or benefit.

Table 12-19. Variables Describing Participation in Government Transfer Programs and Health Insurance Programs in the 1990-1993 Longitudinal Research Files

Program                                    Coverage   Authorized Recipient   G1 Source Code
Social Security                            SOC-SEC    SS-PIDX                1
Railroad Retirement                        RAILROAD   RR-PIDX                2
Federal Supplemental Security Income       -          -                      3
Veteran's Benefits                         VETS       VA-PIDX                8
Aid to Families with Dependent Children    AFDC       AFDCPIDX               20
General Assistance                         GEN-ASST   GA-PIDX                21
Foster Child Care                          FOST-KID   FOSTPIDX               23
Other Welfare                              OTH-WELF   OTH-PIDX               24
WIC Benefits                               WICCOV     WIC-PIDX               25
Food Stamps                                FOODSTMP   FS-PIDX                27
Medicare                                   CARECOV    -                      -
Medicaid                                   CAIDCOV    -                      -
CHAMPUS                                    CHAMP      -                      -

Amount: for programs with a G1 source code, locate the amount among the variables G1AMT1-G1AMT10 by using the corresponding source variables G1SRC1-G1SRC10.

The coverage variables identify whether the income or benefit covers that person in month i. In other words, when a person is flagged as covered by food stamps (FOODSTMPi = 1), the person received the benefits either directly (as the authorized food stamp recipient) or indirectly (as a member of the same program unit as the authorized recipient). The coverage variables also allow users to determine each person's membership in each program unit. That is useful because program units often exclude some members of the family or household.31 Also, as with households and families, membership in program units can change from one month to the next. For that reason, program unit membership and the characteristics of the unit should be evaluated for each month.

31 In the 1984 and 1985 Panels, coverage for the Women, Infants, and Children (WIC) nutrition program was imputed to children under 6 years old if their mother reported participation in the WIC program. Beginning with the 1986 Panel, WIC coverage has been assessed directly for all sample members.
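Because each coverage variable is a simple monthly flag, checking a person's coverage amounts to a lookup against Table 12-19. The Python sketch below is illustrative only: the dictionary layout and the function names are hypothetical, the variable names drop the month subscript, and the sketch relies on the coding described with Table 12-20 (1 = covered, 2 = not covered). The sample values are one month of coverage flags from Table 12-20.

```python
# Coverage variables from Table 12-19. Each is coded 1 when the person is
# covered in the month and 2 when not covered.
COVERAGE_VARS = {
    "Social Security": "SOC-SEC",
    "Railroad Retirement": "RAILROAD",
    "Veteran's Benefits": "VETS",
    "AFDC": "AFDC",
    "General Assistance": "GEN-ASST",
    "Foster Child Care": "FOST-KID",
    "Other Welfare": "OTH-WELF",
    "WIC": "WICCOV",
    "Food Stamps": "FOODSTMP",
    "Medicare": "CARECOV",
    "Medicaid": "CAIDCOV",
    "CHAMPUS": "CHAMP",
}


def programs_covering(person_month: dict) -> list:
    """Programs whose coverage variable equals 1 for one person-month."""
    return [program for program, var in COVERAGE_VARS.items()
            if person_month.get(var) == 1]


def covered_persons(household: dict, var: str) -> list:
    """Person numbers covered by one program (coverage variable == 1)."""
    return [pnum for pnum, values in household.items() if values.get(var) == 1]


# One month of coverage values from Table 12-20, keyed by PP-PNUM.
household = {
    101: {"AFDC": 2, "FOODSTMP": 2, "WICCOV": 2, "CAIDCOV": 1, "SOC-SEC": 1},
    102: {"AFDC": 1, "FOODSTMP": 1, "WICCOV": 2, "CAIDCOV": 1, "SOC-SEC": 2},
    103: {"AFDC": 1, "FOODSTMP": 1, "WICCOV": 1, "CAIDCOV": 1, "SOC-SEC": 2},
    104: {"AFDC": 2, "FOODSTMP": 1, "WICCOV": 2, "CAIDCOV": 1, "SOC-SEC": 2},
    105: {"AFDC": 2, "FOODSTMP": 1, "WICCOV": 2, "CAIDCOV": 1, "SOC-SEC": 2},
}

print(programs_covering(household[103]))  # ['AFDC', 'WIC', 'Food Stamps', 'Medicaid']
print(covered_persons(household, "FOODSTMP"))  # [102, 103, 104, 105]
```

Coverage alone tells who benefits, not who forms a unit with whom; grouping covered persons into units needs the authorized recipient variables discussed next.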
The authorized recipient variables identify the people who actually received the income or benefit on behalf of the people in their program units. In the longitudinal research files, those variables do not use the entry address (PP-ENTRY) and person number (PP-PNUM) values. Instead, they identify authorized recipients by the sequence number of the person within the sample unit (PP-RCSEQ). In other words, the authorized food stamp recipient in month i is the person whose FS-PIDXi value equals his or her own PP-RCSEQ.

Individuals who are members of a common program unit in a given month i can be identified by using the sample unit ID (PP-ID), the person's interview status in month i (PP-MISi), and the authorized recipient variable for month i. For example, members of a common food stamp unit in month i are those with PP-MISi equal to 1 and common values of PP-ID (a value that does not change from month to month) and FS-PIDXi (a value that can change from one month to the next). The SIPP longitudinal research files do not include authorized recipient variables for the Medicare and SSI programs.32

There are some exceptions to these rules:

• Social Security, Railroad Retirement, WIC, and AFDC can provide benefits solely to children. When that happens, an adult receives the income on behalf of the children. The adult is therefore flagged as the authorized recipient, and the income amounts appear on the adult's record. The adult authorized recipient, however, is not flagged as covered by the program; only the children are.
• Most SSI recipients are elderly or disabled adults, but children with disabilities can also receive SSI.33 Even so, the SSI amount is recorded on an adult's record, not on the child's. Unlike the core wave files, the longitudinal research files have no coverage variable indicating whether the child, the adult, or both were covered. If needed, this information can be merged from the core wave files; Chapter 13 provides a detailed discussion of merging SIPP files.
• The medical insurance variables simply reflect who is enrolled in which program. They have no associated amount variables.

32 In effect, each person covered by these two programs is an authorized recipient, and the program units are the people themselves.
33 In the 1990s, the definition of qualifying disabling conditions was expanded. That change in definition resulted in a rapid expansion of the child SSI caseload.

These rules and exceptions are illustrated in Table 12-20. The household contains one AFDC unit and two food stamp units. The mother (person 101) is covered by Social Security and SSI. Daughter #1, the mother of the (disabled) child, receives SSI on her son's behalf. The grandchild (Daughter #1's son) receives WIC. Everyone in the household is enrolled in Medicaid. The coverage variables are set to 2 whenever the person is not covered by the particular program; in this example, the authorized recipient variables are 0 for persons who are neither covered by the program nor its authorized recipient. The authorized recipient indicators do not use the PP-ENTRY and PP-PNUM values. Instead, they are based on the "line number" of the authorized recipient on the household roster (PP-RCSEQ), which is very different from the indicators used on the core wave files.

Table 12-20. Example of Program Units, Coverage, and Benefit Amounts in the Longitudinal Research Files

                      Mother        Daughter #1   Daughter #1's Daughter #2   Spouse of
Variable                                          son                         Daughter #2
PP-PNUM               101           102           103           104           105
PP-RCSEQ              1             2             3             4             5
AGEi                  70            21            4             25            26
AFDC
  AFDCi               2             1             1             2             2
  AFDCPIDXi           0             2             2             0             0
Food Stamps
  FOODSTMPi           2             1             1             1             1
  FS-PIDXi            0             2             2             4             4
SSI
  (SSI appears only in the General (G1) sources and amounts below.)
WIC
  WICCOVi             2             2             1             2             2
  WIC-PIDXi           0             2             2             0             0
Medicaid
  CAIDCOVi            1             1             1             1             1
Social Security
  SOC-SECi            1             2             2             2             2
General (G1) Sources and Amounts (a)
  G1SRC1              3             20            0             27            0
  G1AMT1i ($)         188           123           0             130           0
  G1SRC2              1             27            0             0             0
  G1AMT2i ($)         470           160           0             0             0
  G1SRC3              0             3             0             0             0
  G1AMT3i ($)         0             122           0             0             0
  G1SRC4              0             25            0             0             0
  G1AMT4i ($)         0             30.12         0             0             0

(a) These codes are explained in the next section of text.
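The grouping rule (common PP-ID and FS-PIDXi among persons with PP-MISi equal to 1) and the authorized-recipient rule can be combined into a short routine. The Python sketch below is illustrative only. The `MonthRecord` layout and the `program_units` function are hypothetical, and the records are assumed to be already extracted for a single month i and a single program. As an added assumption, only persons whose coverage flag is 1 are counted as unit members, so an adult who receives benefits solely on behalf of children (the first exception above) is reported as the recipient but not as a member. The unit's benefit is read from the recipient's G1 amount for the program's source code in Table 12-19.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MonthRecord:
    """Hypothetical per-person values for one month i and one program."""
    pp_id: int     # sample unit ID (PP-ID)
    pp_pnum: int   # person number (PP-PNUM)
    pp_rcseq: int  # sequence number within the sample unit (PP-RCSEQ)
    pp_mis: int    # interview status in month i (PP-MISi)
    covered: int   # coverage flag in month i, e.g. FOODSTMPi (1 = covered, 2 = not)
    pidx: int      # authorized recipient variable in month i, e.g. FS-PIDXi
    g1: dict[int, float] = field(default_factory=dict)  # G1 source code -> month-i amount


def program_units(records: list[MonthRecord], source_code: int) -> dict:
    """Group covered persons into program units and find each unit's recipient."""
    # Look up any person (covered or not) by sample unit and sequence number.
    by_rcseq = {(r.pp_id, r.pp_rcseq): r for r in records}
    members = defaultdict(list)
    for r in records:
        # Same PP-ID and same nonzero recipient index => same program unit.
        if r.pp_mis == 1 and r.covered == 1 and r.pidx != 0:
            members[(r.pp_id, r.pidx)].append(r.pp_pnum)
    units = {}
    for (pp_id, pidx), pnums in members.items():
        # The recipient is the person whose PP-RCSEQ equals the unit's index;
        # that person may be an uncovered adult (children-only benefits).
        recipient = by_rcseq.get((pp_id, pidx))
        units[(pp_id, pidx)] = {
            "members": pnums,
            "recipient": recipient.pp_pnum if recipient else None,
            "benefit": recipient.g1.get(source_code, 0) if recipient else None,
        }
    return units
```

Applied to Table 12-20 with a made-up PP-ID of 7001, a food stamp call (source code 27) should yield two units: persons 102 and 103 with recipient 102 and a benefit of $160, and persons 104 and 105 with recipient 104 and $130. A WIC call (source code 25) should yield one unit containing only person 103, with person 102 as the uncovered recipient of $30.12.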
Using the Unearned Income Variables

To save space, the Census Bureau organizes the unearned income variables differently in the longitudinal research files than in the core wave files. As shown in Table 12-21, 10 variables on each person's record identify up to 10 different sources of unearned income (G1SRC1-G1SRC10). For each source identified, there is a corresponding set of monthly amount variables (G1AMT1i-G1AMT10i), so the amount from each source is recorded separately for every month i. The person in Table 12-21 receives $500 a month in federal SSI in some months and $125 a month in food stamps in others, and has no other source of unearned income. When using these fields, analysts often find it helpful to realign the unearned income into new income-specific variables.34

Income Topcoding

The Census Bureau topcodes each income variable to protect against the possibility that a user might identify a SIPP respondent with very high income.35 Although the data dictionary indicates a topcode of $33,332 for monthly income, $33,332 is actually the income topcode for the entire wave (equal to 4 × $8,333), so that value is rarely reached in a single month. In most cases, monthly income is topcoded at $8,333, a value that actually represents "$8,333 or more." Individual monthly amounts above $8,333 may occasionally appear if the respondent's income varied considerably from month to month within a wave. For example, if a respondent's income from a single job was concentrated in only one of the four reference months, a figure as high as $33,332 could be shown. Summary income variables on the person, family, and household records are simply the sums of the component variables after they have been topcoded. The summary variables are not independently topcoded.
Thus, a person with high income from several sources (multiple jobs, businesses, property) could have aggregate monthly income well over the topcode for each source, and yet the data could still greatly understate the person's true income.

34 For example, Table 12-22 includes monthly variables for SSI and food stamps that were created by using the algorithm in Figure 12-5.
35 New topcoding procedures were implemented with the 1996 Panel.

Table 12-21. Unearned Income in the Longitudinal Research Files

PP-ID = 7887; PP-PNUM = 102
G1SRC1 = 3 (SSI); G1SRC2 = 27 (food stamps); G1SRC3-G1SRC10 = 0

Wave             1                   2                   3                   4                   5
Month            1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
PP-MIS           1    1    1    1    1    1    1    1    1    2    2    2    2    2    2    2    0    0    0    0
G1AMT1 ($)     500  500  500  500    0    0    0    0  500  500  500  500    0    0    0    0    0    0    0    0
G1AMT2 ($)       0    0    0    0    0    0  125  125    0    0    0    0    0    0    0    0    0    0    0    0
G1AMT3-G1AMT10 ($): 0 in every month

Waves 6-10 (months 21-40): PP-MIS = 0 and G1AMT1-G1AMT10 = 0 in every month.

Table 12-22. User-Created SSI and FSP Variables Using the Unearned Income Variables in the Longitudinal Research Files

PP-ID = 7887; PP-PNUM = 102
G1SRC1 = 3 (SSI); G1SRC2 = 27 (food stamps); G1SRC3-G1SRC10 = 0

Wave             1                   2                   3                   4                   5
Month            1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
PP-MIS           1    1    1    1    1    1    1    1    1    2    2    2    2    2    2    2    0    0    0    0
G1AMT1 ($)     500  500  500  500    0    0    0    0  500  500  500  500    0    0    0    0    0    0    0    0
G1AMT2 ($)       0    0    0    0    0    0  125  125    0    0    0    0    0    0    0    0    0    0    0    0
G1AMT3-G1AMT10 ($): 0 in every month
SSI ($) (a)    500  500  500  500    0    0    0    0  500  -99  -99  -99  -99  -99  -99  -99  -99  -99  -99  -99
FSP ($) (a)      0    0    0    0    0    0  125  125    0  -99  -99  -99  -99  -99  -99  -99  -99  -99  -99  -99

Waves 6-10 (months 21-40): PP-MIS = 0, G1AMT1-G1AMT10 = 0, and SSI = FSP = -99 in every month.

(a) In SAS, the unassigned values would have a "system missing" value, displayed as a period (".").

Figure 12-5.
Creating Monthly Food Stamp and SSI Income Variables from the Unearned Income Variables in the Longitudinal Research Files

For each person:
  /* This step is not needed in SAS */
  For each month (index = mo):
    If PP-MIS(mo) = 1 then do
      SSI(mo) = 0
      FSP(mo) = 0
    End if PP-MIS(mo) = 1
    Else do
      SSI(mo) = -99
      FSP(mo) = -99
    End else
  End month loop

  /* Begin here for SAS */
  For each G1SRC (index = i):
    If G1SRC(i) = 3 then do
      For each month (index = mo):
        If PP-MIS(mo) = 1 then do
          SSI(mo) = G1AMT(i, mo)
        End if PP-MIS(mo) = 1
      End month loop
    End if G1SRC(i) = 3
    Else if G1SRC(i) = 27 then do
      For each month (index = mo):
        If PP-MIS(mo) = 1 then do
          FSP(mo) = G1AMT(i, mo)
        End if PP-MIS(mo) = 1
      End month loop
    End if G1SRC(i) = 27
  End G1SRC loop

As shown in Table 12-23, person 101's wages are topcoded. The person received considerably more money in December than in the other months, so the December wage amount exceeds the nominal monthly topcode. Total family income (FF-INC) and total household income (HH-INC) are the sums of the income amounts (in this case, WS-ERN-AMT1i + G1AMT1i) after they have been topcoded.

Table 12-23. Example of Topcoding in the Longitudinal Research Files

Person      Calendar   Household      Family       Wages             Child Support
Number      Month      Total Income   Income       (WS-ERN-AMT1i)    Payments
(PP-PNUM)              (HH-INC)       (FF-INC)                       (G1AMT1i)
101         10         $9,333         $9,333       $8,333            $1,000
101         11         $9,333         $9,333       $8,333            $1,000
101         12         $13,123        $13,123      $12,123 (a)       $1,000
101         01         $5,793         $5,793       $4,543            $1,250

(a) This figure can exceed the nominal monthly topcode of $8,333 because the person's total earnings for the wave did not exceed the wave topcode of $33,332.

Using Allocation (Imputation) Flags

As described in Chapter 4, the Census Bureau often imputes information when a person does not respond to the survey or to a particular question. Two sources of information identify whether data have been imputed:

1. Beginning with the 1991 Panel, all data for a wave are imputed if a person was not successfully interviewed in that wave but had complete information (from either a successful interview or a proxy interview) in the two adjacent waves. In those cases, WAVFLG is greater than zero and INTVW is 3 or 4.
2. Individual variables of interest may be imputed. In the longitudinal research files, allocation (imputation) flags are included for the earned income, asset income, and unearned (transfer) income variables. Other variables are also subject to editing and imputation.

The edit and imputation procedures used for the longitudinal research files differ from those used for the core wave files. The procedures for the longitudinal research files make use of the full set of longitudinal data for a person. Because the core wave files are processed individually, the edit and imputation procedures applied to them have, at most, 4 months of observations for a person. As a result, the procedures applied to the core wave files make greater use of cross-observation imputation methods than do those applied to the longitudinal research files.36

Using Weights

The full panel longitudinal research files include the calendar year weights (FNLWGTs) and the full panel weight (PNLWGT). The number of calendar year weights depends on the duration of the panel, ranging from one calendar year weight for the 1989 Panel to three for the 1993 Panel. When the 1996 full panel file is available, it will have four calendar year weights. The source and accuracy statements that accompany all SIPP full panel files ordered from the Census Bureau provide suggestions on how to use the weight variables in those files. Chapter 8 of this Guide also contains a full discussion of how to use weights in full panel files.
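The whole-wave imputation rule described under Using Allocation (Imputation) Flags can be applied as a simple filter when analysts want to flag or exclude imputed waves. In this Python sketch the `WaveStatus` layout and the function names are hypothetical. It assumes WAVFLG and INTVW have already been read as integers for each wave of a person, and it treats panels before 1991 as having no whole-wave imputation, because the rule is stated only from the 1991 Panel onward.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WaveStatus:
    """Hypothetical per-wave values for one person."""
    wave: int
    wavflg: int  # WAVFLG
    intvw: int   # INTVW


def wave_fully_imputed(status: WaveStatus, panel: int) -> bool:
    """Apply the rule: WAVFLG > 0 and INTVW equal to 3 or 4 (1991 Panel onward)."""
    if panel < 1991:
        return False  # assumption: no whole-wave imputation before the 1991 Panel
    return status.wavflg > 0 and status.intvw in (3, 4)


def imputed_waves(statuses: list[WaveStatus], panel: int) -> list[int]:
    """Return the wave numbers whose data were entirely imputed."""
    return [s.wave for s in statuses if wave_fully_imputed(s, panel)]
```

Any status values passed to these functions in testing are placeholders; INTVW codes other than 3 and 4 are not interpreted here.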
36 The edit and imputation procedures applied to the core wave files from the 1996 Panel make greater use of retrospective information than procedures used in earlier panels. See Chapters 4 and 10 for details.

Identifying States

The longitudinal research file contains a variable (GEO-STE) that identifies 41 individual states and the District of Columbia; the remaining nine states are suppressed into three groups:

1. Maine and Vermont;
2. Iowa, North Dakota, and South Dakota; and
3. Alaska, Idaho, Montana, and Wyoming.

Even though most states can be identified, the SIPP sample before the 2004 Panel was not designed to be representative at the state level and should not be used to produce direct state-level estimates. The state variable is included on the public use files to allow examination of how state-level characteristics affect national estimates. For example, a user could apply the state-specific eligibility criteria for a means-tested program to arrive at a national estimate of the number of people eligible for the program. Because some states are not uniquely identified, the user would need to devise some method of allocating the state-specific eligibility rules to sample persons in those states.

The 2004 SIPP Panel can be used to produce state estimates; it was designed to produce reliable low-income estimates for the 33 largest states.

Identifying Metropolitan Areas

The longitudinal research files do not contain any variables identifying metropolitan areas. Analysts who need this information should merge it from the core wave files. Chapter 10 provides details about how to use the variables identifying metropolitan areas, and Chapter 13 provides instructions for merging data from multiple SIPP public use files.
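The text leaves the allocation method open. One simple possibility, shown here only as a sketch and not as a Census Bureau procedure, is random assignment: each person in a grouped GEO-STE category is assigned to one member state with probability proportional to analyst-supplied shares (for example, each state's population share), and that state's rules are then applied. Group membership comes from the list above. The group labels, the shares, the income limits, and the function names are hypothetical, and the actual GEO-STE code values are not used.

```python
from __future__ import annotations

import random

# Member states of the three suppressed GEO-STE groups listed above.
# The group labels are hypothetical; actual GEO-STE code values are not used.
GROUPED_STATES = {
    "group 1": ["Maine", "Vermont"],
    "group 2": ["Iowa", "North Dakota", "South Dakota"],
    "group 3": ["Alaska", "Idaho", "Montana", "Wyoming"],
}


def assign_state(group: str, shares: list[float], rng: random.Random) -> str:
    """Draw one member state with probability proportional to `shares`.

    `shares` are analyst-supplied relative weights (e.g., population shares);
    they are not SIPP variables.
    """
    states = GROUPED_STATES[group]
    if len(shares) != len(states):
        raise ValueError("need exactly one share per state in the group")
    return rng.choices(states, weights=shares, k=1)[0]


def is_eligible(monthly_income: float, state: str, income_limits: dict[str, float]) -> bool:
    """Apply a hypothetical state-specific monthly income limit."""
    return monthly_income <= income_limits[state]
```

Because a single random draw changes from run to run, passing a seeded `random.Random` instance makes the assignment reproducible.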