SURVEY OF INCOME
AND PROGRAM PARTICIPATION
       USERS’ GUIDE
  (Supplement to the Technical Documentation)


                 Third Edition
                Washington, D.C.
                     2001


                     Prepared by:

                       Westat
              1650 Research Boulevard
              Rockville, Maryland 20850

                  In association with:

         Mathematica Policy Research, Inc.
         600 Maryland Avenue, S.W., Suite 550
             Washington, D.C. 20024-2512

            Contract No. 50-YABC-7-66016


      U.S. DEPARTMENT OF COMMERCE
  ECONOMICS AND STATISTICS ADMINISTRATION
            U.S. CENSUS BUREAU
                                    Acknowledgments

The third edition of the Survey of Income and Program Participation (SIPP) Users' Guide was
prepared for the U.S. Census Bureau by Westat. Charles T. Nelson was the Government Project
Officer for the project within the Census Bureau, and Pat Doyle also provided invaluable support
and guidance to the effort. Many other staff from a number of divisions within the Census Bureau
shared their expertise and provided useful comments. In particular, we would like to thank Patrick
Benton, John Boies, Judith Hubbard Eargle, Donald Keathly, Karen Ellen King, Gordon Lester,
Stephen Mack, Mike McMahon, Thomas Palumbo, Donna Riccini, and Mahdi Sundukchi.

Chapters of the third edition were prepared by Louis Rizzo, Marianne Winglee, Alan Martinson,
and Ilene France of Westat; Larry Radbill of Mathematica Policy Research, Inc.; Julie Sykes
(then of Mathematica Policy Research, Inc.); and Elizabeth Sheley (Independent Consultant).
Alan Martinson, Marty Franklin, Laurie Tomasino, and Carol Dominique of Westat provided
editorial and production support; Julie Phillips (Independent Consultant) prepared the Index; and
Ana Horton of Westat designed the cover. Garrett Moran served as the Westat Project Director.

                                        **************

Because this edition of the Users' Guide builds on the previous editions, we also include the
following acknowledgments, which appeared in the second edition.

The first edition of the Survey of Income and Program Participation (SIPP) Users' Guide was
prepared by Daniel Kasprzyk (then Office of the Director), Pat Doyle (Mathematica Policy
Research, Inc.), Arnold Goldstein (Population Division), Patricia Kelly (Office of the Director),
and David B. McMillen (then Office of the Director).

The second edition was prepared by the Data Access and Use Staff of the Data User Services
Division. Geneva Burns coordinated the effort, assisted by Jackson Morton and J. Paul Wyatt.
Andrea Meier of the Survey of Income and Program Participation Branch in the Statistical
Methods Division prepared Chapter 8, "SIPP Cross-Sectional Weighting Procedures," under the
direction of Rajendra P. Singh. We would like to thank our colleagues within the Census Bureau
and our SIPP file users for their helpful comments.
                                              Contents
Chapter                                                                                                                     Page

   1      Introduction............................................................................................................1-1

               Evolution and History of SIPP...........................................................................1-1
               Uses of SIPP ......................................................................................................1-3
               The Survey.........................................................................................................1-4
               Nonsampling Errors, Sampling Errors, and Weighting .....................................1-6
               SIPP Public Use Files ........................................................................................1-7
               Comparison of SIPP with Other Surveys...........................................................1-9
               Guide to This Document..................................................................................1-11
               Where to Go for More Information .................................................................1-13

   2      SIPP Sample Design and Interview Procedures .................................................2-1

               Organizing Principles.........................................................................................2-1
               Sample Design ...................................................................................................2-5
               Following Rules .................................................................................................2-9
               Interview Procedures .......................................................................................2-16
               Nonresponse.....................................................................................................2-17

   3      Survey Content.......................................................................................................3-1

               The SIPP Interview ............................................................................................3-1
               Core Content ......................................................................................................3-2
               Topical Content..................................................................................................3-6

   4      Data Editing and Imputation................................................................................4-1

               Types of Missing Data .......................................................................................4-1
               Goals of Imputation ...........................................................................................4-2
               Assessing the Influence of Imputed Data on Analysis ......................................4-3
               An Overview of the Process ..............................................................................4-3
               Phase 1: Data Editing and Imputation Procedures for the Core Wave Files .....4-6
               Phase 2: Data Editing Procedures for the Full Panel Files ..............................4-15
               Confidentiality Procedures for the Public Use Files........................................4-17

   5      Finding SIPP Information.....................................................................................5-1

               Published Estimates from SIPP .........................................................................5-1
               SIPP Public Use Microdata Files.......................................................................5-1
               Sources for Obtaining SIPP Microdata............................................................5-12
               Other Sources of Information About SIPP ......................................................5-13


   6      Nonsampling Errors ..............................................................................................6-1

               Undercoverage ...................................................................................................6-1
               Nonresponse.......................................................................................................6-1
               Measurement Errors...........................................................................................6-2
               Effects of Nonsampling Error on Survey Estimates ..........................................6-3

   7      Sampling Error ......................................................................................................7-1

               Direct Variance Estimation................................................................................7-1
               Using GVFs to Approximate Variance Estimates .............................................7-4
               Variance Estimation with Imputed Data............................................................7-6

   8      Using Sampling Weights on SIPP Files................................................................8-1

               What Weights Are and Why They Should Be Used..........................................8-1
               Weights Available in SIPP Files........................................................................8-3
               Choosing a Weight.............................................................................................8-3
               How Weights Are Constructed ..........................................................................8-4
               Using Weights in the Core Wave Files..............................................................8-8
               Using Weights in the Topical Module Files ....................................................8-16
               Using Weights in the Full Panel File ...............................................................8-16
               Pooling Data from Two or Three Panels .........................................................8-19

   9      The SIPP Public Use Files .....................................................................................9-1

               Types of SIPP Data Files ...................................................................................9-1
               Understanding the ID Variables in SIPP ...........................................................9-2
               Identifying Persons and Their Relationships .....................................................9-4
               Working with Multiple Files..............................................................................9-9
               The Balance of Section II...................................................................................9-9

  10      Using the Core Wave Files ..................................................................................10-1

               Using the Technical Documentation of the Core Wave Files..........................10-2
               Relationship of the Core Wave Data Files to the SIPP Survey Instrument .....10-4
               Structure of the Core Wave Files.....................................................................10-6
               Identifying Persons ..........................................................................................10-6
               Identifying Households....................................................................................10-9
               Identifying Families .......................................................................................10-11
               Other Variables Describing Household and Family Composition ................10-15
               More About Using the SIPP ID Variables: Identifying Movers....................10-20
               Identifying Program Units .............................................................................10-26
               Income Topcoding in the 1996 Panel ............................................................10-29
              Topcoding Prior to the 1996 Panel ................................................................10-35
              Using Allocation (Imputation) Flags .............................................................10-36
              Using Weights................................................................................................10-37
              Identifying States ...........................................................................................10-38
              Identifying Metropolitan Areas......................................................................10-39

  11      Using Topical Module Files.................................................................................11-1

              Using the Technical Documentation of the Topical Module Files ..................11-2
              Relationship of the Topical Module Data Files to the Survey Instrument ......11-6
              Structure of the Topical Module Files .............................................................11-7
              Reference Periods and Samples .......................................................................11-8
              Using a Person’s Monthly Interview Status Variables ....................................11-9
              Comparison of Variables in the Topical Module and Core Wave Files ........11-11
              Identifying People..........................................................................................11-13
              Identifying Families .......................................................................................11-16
              Other Variables Describing Household and Family Composition ................11-19
              More About Using the SIPP ID Variables: Identifying Movers....................11-21
              Topcoding ......................................................................................................11-27
              Using Allocation (Imputation) Flags .............................................................11-28
              Using Weights................................................................................................11-28
              Identifying States ...........................................................................................11-29
              Identifying Metropolitan Areas......................................................................11-29

  12      Using the 1990–1993 Full Panel Longitudinal Research Files .........................12-1

              Using the Technical Documentation of the 1990–1993
              Longitudinal Research Files ............................................................................12-2
              Relationship of the Longitudinal Research Data Files to the
              SIPP Survey Instrument...................................................................................12-5
              Structure of the Longitudinal Research Files...................................................12-6
              How to Align Data by Calendar Month...........................................................12-7
              Using the Monthly Interview Status (PP-MIS) Variables ...............................12-9
              Identifying Persons ........................................................................................12-13
              Identifying Households..................................................................................12-15
              Identifying Families .......................................................................................12-16
              Variables Describing Household and Family Composition...........................12-19
              Using Family-Level Income Variables..........................................................12-23
              More About Using the SIPP ID Variables: Identifying Movers....................12-23
              Identifying Program Units .............................................................................12-28
              Using the Unearned Income Variables ..........................................................12-30
              Income Topcoding .........................................................................................12-31
              Using Allocation (Imputation) Flags .............................................................12-37
              Using Weights................................................................................................12-37
              Identifying States ...........................................................................................12-38
              Identifying Metropolitan Areas......................................................................12-38

  13      Linking Core Wave, Topical Module, and Longitudinal Research Files .......13-1

              Procedures for Linking Files............................................................................13-2
              Nonmatches When Merging Files .................................................................13-15


Appendix

      A         SIPP Users’ Guide Variable Crosswalk: 1993 to 1996 ...................................... A-1

                     By 1993 Variable Name.................................................................................... A-2
                     By 1996 Variable Name.................................................................................. A-10
                     By 1993 File Position...................................................................................... A-17
                     By 1996 File Position...................................................................................... A-25

      B         SIPP Topcoding Specifications ............................................................................ B-1

                     Earnings ............................................................................................................ B-1
                     Year of Birth (TBYEAR).................................................................................. B-4
                     Age (TAGE)...................................................................................................... B-4
                     Age at Receipt of Social Security Disability Benefits (TAGESS) ................... B-5
                     Age Respondent Started Job or Business (TSJDATE, TEJDATE,
                     TSBDATE, TEBDATE) ................................................................................... B-5

      C         Computing the SIPP Sample Weights................................................................. C-1

                     Wave 1 Weights................................................................................................ C-1
                     Wave 2+ Weights............................................................................................ C-12
                     Calendar Year and Panel Weights .................................................................. C-17

      D         Acronyms ............................................................................................................... D-1

      E         Glossary ................................................................................................................. E-1

References ............................................................................................................................. R-1

Index           ...........................................................................................................................Index-1




                                                   Tables


Table                                                                                                             Page

  1-1   Comparison of SIPP, CPS, and PSID ....................................................................1-10
  2-1   Summary of the 1984–1996 SIPP Panels ................................................................2-2
  2-2   1996 Panel: Rotation Groups, Waves (W), and Reference Months ........................2-4
  2-3   Household Membership ...........................................................................................2-7
  2-4   Composition of the 1990 Panel................................................................................2-8
  2-5   Household Noninterview and Sample Loss Rates: 1990–1996 Panels .................2-19
  3-1   Types of Income Recorded in SIPP .........................................................................3-5
  3-2   Topical Modules Grouped Thematically .................................................................3-7
  5-1   Publications in the P-70 Series ................................................................................5-2
  5-2   Structure of the Person-Month Format Core Wave Files ........................................5-5
  5-3   Topical Modules, by Panel and Wave .....................................................................5-6
  5-4   Topical Modules, by Subject .................................................................................5-10
  5-5   Structure of Topical Module Microdata File .........................................................5-11
  5-6   Telephone Numbers for Information About Specific Aspects of SIPP .................5-16
  7-1   Variance Stratum Code and Variance Unit Code in SIPP Files, 1990–1993 ..........7-3
  8-1   Weighted and Unweighted Point-in-Time Estimates of Percentages
        Based on Core Wave 1 of the 1990 SIPP Panel for January 1990 ..........................8-2
  8-2   Weight Variables in SIPP Files for the 1996 and 1990–1993 Panels......................8-3
  8-3   Final Person Weights for Four Reference Months and One Interview Month
        in Wave 1 of the 1991 Panel ..................................................................................8-10
  8-4   Household, Reference Month, and Interview Month Weights for Members
        of a Household for a Given Month in Wave 1 of the 1990 Panel..........................8-11
  8-5   Family and Subfamily Reference Months Weights, by RHTYPE (HTYPE),
        EFTYPE (FTYPE), and ESFTYPE (STYPE) in Wave 1 of the 1990 Panel.........8-13
  8-6   Calendar Month Estimation: Using a Single Core Wave File in Wave 1
        of the 1991 and 1996 Panels ..................................................................................8-14
  8-7   Calendar Month Estimation: Using Two Core Wave Files from Waves 1
        and 2 of the 1991 and 1996 Panels ........................................................................8-15
  8-8   Calendar Year and Panel Weights, 1990–1993 .....................................................8-17
  8-9   Weighting Parameter Adjustment Factors for Both the Two-Panel and
        Three-Panel Combinations.....................................................................................8-21
  9-1   SIPP Variable Names, by File Type ........................................................................9-3
  9-2   Differences Among Core Wave, Topical Module, and Longitudinal Files
        (1990–1996 Panels) ...............................................................................................9-11
 10-1   Person-Month File Structure for the Core Wave Files ..........................................10-7
 10-2   Variables Used to Uniquely Identify a Person in the Core Wave Files.................10-8
 10-3   How to Uniquely Identify a Person in the Core Wave Files..................................10-9
 10-4   Variables Used to Uniquely Identify a Household or Group Quarters in the
        Core Wave Files...................................................................................................10-10
 10-5   How to Uniquely Identify a Household in the Core Wave Files .........................10-10
 10-6   Variables Used to Uniquely Identify a Family in the Core Wave Files ..............10-11
 10-7   Uniquely Identifying Families in the Core Wave Files .......................................10-13
 10-8   Variables Describing Household and Family Composition in the
        Core Wave Files...................................................................................................10-15
 10-9   The ERRP Variable in the 1996 Core Wave Files...............................................10-17
10-10   Comparison of RRP and RRPU Variables of the Core Wave Files
        Prior to the 1996 Panel.........................................................................................10-17
10-11   Identifying Households Containing Three Generations in the
        Core Wave Files...................................................................................................10-18
10-12   Identifying Households Containing Three Generations in the
        Core Wave Files...................................................................................................10-19
10-13   How the Family-Level Variables Include the Subfamily’s Information
        in the Core Wave Files.........................................................................................10-21
10-14   Identifying Movers in the Core Wave Files.........................................................10-22
10-15   Example of Household Changes and Their Effects on the ID Variables
        of the Core Wave Files ........................................................................................10-23
10-16   Variables Describing Participation in Government Transfer Programs
        and Health Insurance Programs in the Core Wave Files .....................................10-27
10-17   Example of Program Units, Coverage, and Recipiency in the
        Core Wave Files...................................................................................................10-30
10-18   Topcoding Criteria for the 1996 Panel.................................................................10-32
10-19   Topcode Amounts Used for Monthly Employment Income in Wave 1
        of the 1996 Panel .................................................................................................10-33
10-20   Example of Employment Income Topcoding in the 1996 Panel .........................10-35
10-21   Example of Topcoding in the Core Wave Files Prior to the 1996 Panel:
        Single Person Household .....................................................................................10-36
10-22   Weight Variables in SIPP Core Wave Files for the 1996 and
        1990–1993 Panels ................................................................................................10-38
 11-1   Example of the Topical Module File Structure......................................................11-7
 11-2   Monthly Interview Status Variables in the 1984–1993 SIPP Panels...................11-10
 11-3   Interview Month and Reference Months for Each Rotation Group in
        Wave 4 of the 1993 Panel ....................................................................................11-10
 11-4   Variables Common to the Core Wave and Topical Module Files from
        Wave 1 of the 1996 Panel ....................................................................................11-12
 11-5   Examples of Same Variables with Different Names in the Core Wave
        and Topical Module Files Prior to the 1996 Panel ..............................................11-12
 11-6   Variables Used to Uniquely Identify a Person in the Topical Module Files .......11-13
 11-7   How to Uniquely Identify a Person in the Topical Module Files ........................11-15
 11-8   Variables Used to Uniquely Identify a Household or Group Quarters
        in the Topical Module Files .................................................................................11-15
 11-9   How to Uniquely Identify a Household in the Topical Module Files..................11-16
11-10   Variables Used to Uniquely Identify a Family in the Topical Module Files
        for the 1996 Panel ................................................................................................11-17
11-11   Uniquely Identifying Families in the Topical Module Files in the 1996 Panel...11-18
11-12   Household and Family Composition Variables in the Topical Module Files......11-19
11-13   Relationship to the Household Reference Person in the Topical Module Files ..11-20
11-14   ERRP (RRP) Coding for the Same Three-Generation Household When
        Two Different People Are Designated as the Reference Person in the
        Topical Module Files ...........................................................................................11-21
11-15   Identifying Households Containing Three Generations in the
        Topical Module Files ...........................................................................................11-22
11-16   Identifying Movers in the Core Wave Files.........................................................11-23
11-17   Example of Household Changes and Their Effects on the ID Variables
        in the Core Wave Files.........................................................................................11-25
 12-1   Summary of Panels, Waves, Reference Months, and Sample Sizes......................12-7
 12-2   Example of the Longitudinal Research File Structure...........................................12-8
 12-3   Reference Periods for Each Rotation Group of the 1992 Panel.............................12-9
 12-4   Monthly Data from the 1992 Panel, Realigned by Calendar Month ...................12-11
 12-5   Variables Used to Uniquely Identify a Person in the
        Longitudinal Research Files ................................................................................12-14
 12-6   How to Uniquely Identify a Person in the Longitudinal Research Files .............12-15
 12-7   Variables Used to Uniquely Identify a Household in the
        Longitudinal Research Files ................................................................................12-15
 12-8   How to Uniquely Identify a Household or Group Quarters in a Given
        Month of the Longitudinal Research Files...........................................................12-16
 12-9   Variables Used to Identify Families in the Longitudinal Research Files ............12-18
12-10   How to Uniquely Identify a Family in a Given Month of the
        Longitudinal Research Files ................................................................................12-20
12-11   Variables Used to Describe Household Composition in the
        Longitudinal Research Files ................................................................................12-21
12-12   Relationship to the Household Reference Person in a Given Month...................12-21
12-13   Using RRP to Identify Households Containing Three Generations
        in the Longitudinal Research Files ......................................................................12-22
12-14   Using PNSP and PNPT to Identify Households Containing
        Three Generations in the Longitudinal Research Files........................................12-22
12-15   Family Income in the Longitudinal Research Files .............................................12-23
12-16   How to Identify Movers in the Longitudinal Research Files...............................12-24
12-17   Another Example of Household Changes and Their Effects on the
        ID Variables in the Longitudinal Research Files.................................................12-25
12-18   Household Changes and Their Effects on the Household ID (HH-ADDIDi)
        Variable in the Longitudinal Research File .........................................................12-27
12-19   Variables Describing Participation in Government Transfer Programs and
        Health Insurance Programs in the 1990–1993 Longitudinal Research Files.......12-29
12-20   Example of Program Units, Coverage, and Benefit Amounts in the
        Longitudinal Research Files ................................................................................12-31
12-21   Unearned Income in the Longitudinal Research Files.........................................12-32
12-22   User-Created SSI and FSP Variables Using the Unearned Income Variables
        in the Longitudinal Research Files ......................................................................12-34
12-23   Example of Topcoding in the Longitudinal Research Files.................................12-37
 13-1   Example of the Core Wave Person-Month File Structure .....................................13-7
 13-2   Example of the Core-Wave Wide-Record/Person File Structure
        (After Applying the Program in Figure 13-1 to the Data in 13-1).........................13-7
 13-3   Variables Identifying People in the Core Wave and Longitudinal
        Research Files for Panels Prior to 1996.................................................................13-9



 13-4    Variables Identifying People in the Topical Module and Core Wave Files
         for Panels Prior to 1996 .......................................................................................13-14
 13-5    Variables Identifying People in the Topical Module and
         Longitudinal Research Files Prior to the 1996 Panel...........................................13-15
 13-6    Reasons for Nonmatches......................................................................................13-17
 B-1     Examples of Income Amounts That Need to Be Topcoded ................................... B-2
 B-2     Earnings Topcodes.................................................................................................. B-4
 B-3     1996 Panel Topcoding Specifications..................................................................... B-6
 C-1     Major Groupings of Later Wave Noninterview Cells........................................... C-19
 C-2     Major Groupings of Calendar Year (Panel) Noninterview Cells.......................... C-21


                                                      Figures


Figure                                                                                                                    Page

  2-1    Following Rules .....................................................................................................2-10
  3-1    Skip Pattern Example...............................................................................................3-2
  4-1    Sequence of Cross-Sectional Imputation and Longitudinal Editing Procedures .....4-4
 10-1    Excerpt from a Data Dictionary for the Core Wave Files .....................................10-3
 10-2    Corresponding SAS and FORTRAN Syntax to Read the Data from the
         Core Wave Files.....................................................................................................10-5
 11-1    Excerpt from the Data Dictionary for the Topical Module Files...........................11-3
 11-2    Corresponding SAS and FORTRAN Syntax to Read Data from
         Topical Module Files .............................................................................................11-5
 12-1    Excerpt from the 1993 Longitudinal Research File Data Dictionary ....................12-4
 12-2    Corresponding SAS and FORTRAN Syntax to Read in Data from the 1993
         Longitudinal Research File Data Dictionary .........................................................12-5
 12-3    Algorithm for Realigning SIPP Panel Month to Calendar Months
         in the 1992 Panel..................................................................................................12-10
 12-4    Constructing Family and Subfamily ID Variables in the Longitudinal
         Research Files ......................................................................................................12-18
 12-5    Creating Monthly Food Stamp and SSI Income Variables from the
         Unearned Income Variables in the Longitudinal Research Files.........................12-36



 13-1    Sample SAS Code to Change the Core Wave Files from Person-Month Format
         to Person-Record Format from Wave 2 of the 1996 Panel....................................13-5
 13-2    Sample SAS Code to Change the Longitudinal Research Files from
         Person-Record Format to Person-Month Format for Panels Prior to 1996 .........13-10
 13-3    Data Dictionary Entries for Variables Identifying the Reason a Person
         Left the SIPP Sample ...........................................................................................13-19
 C-1     Second-Stage Cells for Hispanics........................................................................... C-6
 C-2     Second-Stage Cells for Non-Hispanic Children ..................................................... C-7
 C-3     Second-Stage Cells for Non-Hispanic Adults......................................................... C-8
 C-4     Calendar Year and Panel Weight Second-Stage Cells for Hispanics ................... C-23
 C-5     Calendar Year and Panel Weight Second-Stage Cells for
         Non-Hispanic Children ......................................................................................... C-23
 C-6     Calendar Year and Panel Weight Second-Stage Cells for
         Non-Hispanic Adults ............................................................................................ C-24


Section I
1. Introduction
This guide is intended as a reference for analysts who need information about using the Survey
of Income and Program Participation (SIPP). The main objective of SIPP is to provide accurate
and comprehensive information about the income and program participation of individuals and
households in the United States, and about the principal determinants of income and program
participation. SIPP offers detailed information on cash and noncash income on a subannual basis.
The survey also collects data on taxes, assets, liabilities, and participation in government transfer
programs. SIPP data allow the government to evaluate the effectiveness of federal, state, and
local programs.

This chapter and the ones that follow come under two main sections. Section I encompasses
discussions of survey design and content, data editing and imputation procedures, sampling and
nonsampling error, and weighting. Section II provides information about working with each of
the three types of SIPP microdata files (the core wave files, topical module files, and full panel
files), as well as instructions for linking SIPP files. This introduction offers a brief overview of
each of those topics.


Evolution and History of SIPP
Until the advent of SIPP, the major source of data on income and program participation was the
Current Population Survey (CPS) March Income Supplement. The CPS continues to be the
source of all official income and poverty statistics published by the Census Bureau. The CPS,
however, is designed primarily to obtain information on employment. Because income
measurement was never the primary purpose of the CPS, it has certain gaps in this area. For
example, CPS respondents are asked in March to recall their income during the preceding
calendar year. Many respondents have difficulty in remembering sources such as property
income or irregular income over the yearlong reference period. Also, the CPS does not capture
the impact of changes in household composition during the year, nor does the survey explicitly
measure periods of program participation. Further, the CPS does not collect data on assets and
liabilities, which are needed to measure more completely a household’s economic status and
eligibility for program benefits. To add those items to the CPS questionnaire would dilute the
main purpose of that survey and unduly increase respondent burden. Finally, the CPS is designed
to be a cross-sectional survey. During the 1970s, the increasing size of government programs and
their interactions with the labor market led to a need for longitudinal data.

To address those data issues, the Department of Health, Education, and Welfare (HEW) initiated
the Income Survey Development Program (ISDP) in the late 1970s. In developing ISDP content
and procedures, HEW focused on questionnaire length, length of reference period, and linkage of
survey data to program records. The 1979 ISDP Panel was a longitudinal survey in which
respondents were asked about their income, labor force participation, and other characteristics;
respondents were recontacted every 3 months to supply information on themselves and others
with whom they resided; the 3-month span was the reference period for the interview.


The First SIPP Panels

The lessons learned from ISDP were incorporated into the initial design of SIPP, which was used
for the first 10 years of the survey. The original design of SIPP called for a nationally
representative sample of individuals 15 years of age and older to be selected in households in the
civilian noninstitutionalized population. Those individuals, along with others who subsequently
lived with them, were to be interviewed once every 4 months over a 32-month period. To ease
field procedures and spread the work evenly over the 4-month reference period for the
interviewers, the Census Bureau randomly divided each panel into four rotation groups. Each
rotation group was interviewed in a separate month. Four rotation groups thus constituted one
cycle, called a wave, of interviewing for the entire panel (Chapter 2). At each interview,
respondents were asked to provide information covering the 4 months since the previous
interview. The 4-month span was the reference period for the interview. The first sample, the
1984 Panel, began interviews in October 1983 with sample members in 19,878 households. The
second sample, the 1985 Panel, began in February 1985. Subsequent panels began in February of
each calendar year, resulting in concurrent administration of the survey in multiple panels.
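
The rotation-group timing described above can be sketched in a few lines of Python. This is an
illustrative sketch, not Census Bureau code. The function names are hypothetical, and the sketch
assumes that rotation group r is interviewed r - 1 months after rotation group 1 and that each
wave starts 4 months after the previous one. Actual field schedules for a given panel may differ
(Chapter 2).

```python
def add_months(year, month, offset):
    """Shift a (year, month) pair by a signed number of months."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def reference_months(first_year, first_month, wave, rotation_group):
    """Return the 4 reference months (year, month) for one interview.

    first_year/first_month: interview month of rotation group 1 in Wave 1.
    Assumption (hypothetical schedule): rotation groups are interviewed in
    consecutive months, and waves follow one another every 4 months.
    """
    lag = 4 * (wave - 1) + (rotation_group - 1)
    interview = add_months(first_year, first_month, lag)
    # The reference period is the 4 months preceding the interview month.
    return [add_months(interview[0], interview[1], -k) for k in range(4, 0, -1)]


# 1984 Panel: interviewing began in October 1983.
print(reference_months(1983, 10, wave=1, rotation_group=1))
# [(1983, 6), (1983, 7), (1983, 8), (1983, 9)]
print(reference_months(1983, 10, wave=2, rotation_group=3))
# [(1983, 12), (1984, 1), (1984, 2), (1984, 3)]
```

Under these assumptions, the same calendar month falls in a different reference month for each
rotation group, which is why monthly data must be realigned before calendar-month analysis.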

The original goal was to have each panel cover eight waves. However, a number of panels were
terminated early (Chapter 2) because of insufficient funding. For example, the 1988 Panel had
six waves; the 1989 Panel, part of which was folded into the 1990 Panel, was halted after three
waves. In addition, the intent was for each SIPP panel to have an initial sample size of 20,000
households. That target was rarely achieved; again, budget issues were usually the reason.

The 1996 redesign (discussed below) entailed a number of important changes. First, the 1996
Panel spans 4 years and encompasses 12 waves. The redesign has abandoned the overlapping
panel structure of the earlier SIPP, but sample size has been substantially increased: the 1996
Panel had an initial sample size of 40,188 households (Chapter 2).


The 1996 Redesign

In 1990, the Census Bureau asked the Committee on National Statistics (CNSTAT) at the
National Research Council to undertake a comprehensive review of SIPP. The resulting report,
The Future of the Survey of Income and Program Participation (Citro and Kalton, 1993),
summarizes the first 9 years of SIPP and provides recommendations for the future of the survey.
Some of those recommendations were implemented with the 1996 SIPP Panel in what is known
as the 1996 redesign.

One of the goals of the 1996 redesign was to improve the quality of longitudinal estimates in
order to provide better information for policy makers. Specific changes include the following:


•   A larger initial sample than in previous panels, with a target of 37,000 households;
•   A single 4-year panel instead of overlapping 32-month panels;
•   Twelve or 13 waves instead of 8;
•   The introduction of computer-assisted interviewing (CAI), which, among other
    improvements, permits automatic consistency checks of reported data during the interview;
    those checks can reduce the level of postcollection edits and imputation and thus help to
    maintain longitudinal consistency; and
•   Oversampling of households from areas with high poverty concentrations.

The first interviews of the redesigned SIPP began in April 1996 with the 1996 Panel. Later in
1996, Congress passed the Personal Responsibility and Work Opportunity Reconciliation Act
(PRWORA). That law significantly altered the nature of public transfer programs, shifting more
responsibility to state governments, establishing new eligibility rules for a number of programs,
and setting limits on recipiency. The existing welfare program, Aid to Families with Dependent
Children (AFDC), was replaced with a new program, Temporary Assistance for Needy Families
(TANF). Those changes came after interviewing for the 1996 Panel had already begun with a
questionnaire designed for the array of transfer programs that existed before PRWORA was
enacted. To accommodate program changes brought about by PRWORA, the Census Bureau
began adapting transfer-program questions to reflect the current situation.


Uses of SIPP
SIPP produces national-level estimates for the U.S. resident population and subgroups. Although
the SIPP design allows for both longitudinal and cross-sectional data analysis, SIPP is meant
primarily to support longitudinal studies. SIPP’s longitudinal features allow the analysis of
selected dynamic characteristics of the population, such as changes in income, eligibility for and
participation in transfer programs, household and family composition, labor force behavior, and
other associated events.

One of the most important reasons for conducting SIPP is to gather detailed information on
participation in transfer programs. Data from SIPP allow analysts to examine concurrent
participation in multiple programs. SIPP data can also be used to address the following types of
questions:

•   How have changes in eligibility rules or benefit levels affected recipients?
•   How have changes in the eligibility rules affected the program target population, that is,
    those eligible to receive benefits?
•   How does income from other household members affect labor force participation and reasons
    for not working?
•   How do wealth and income patterns differ for various age, gender, and racial groups?



Because SIPP is a longitudinal survey, capturing changes in household and family composition
over a multiyear period, it can also be used to address the following questions:

•   What factors affect change in household and family structure and living arrangements?
•   What are the interactions between changes in the structure of households and families and the
    distribution of income?
•   What effects do changes in household composition have on economic status and program
    eligibility?
•   What are the primary determinants of turnover in programs such as Food Stamps?


The Survey
SIPP data show sample members’ lives at discrete points in time, as well as a history of changes
in their economic circumstances and household relationships. Understanding survey design,
content, and procedures is key for analysts wishing to use SIPP data.


Design of SIPP

The adults followed in each SIPP panel come from a nationally representative sample of
households in the civilian noninstitutionalized U.S. population. People selected into the SIPP
sample are interviewed once every 4 months over the life of the panel. If original sample
members 15 years of age or older move from their original addresses to other addresses, they are
interviewed at the new addresses. The survey sample includes children residing with original
sample members. If, after the first interview, other people not previously in the survey become
part of a respondent’s household, the new people are interviewed as long as they continue living
with respondents from the first interview (Chapter 2).


SIPP Contents

Information collected in SIPP falls into two categories: core and topical. The core content
includes questions asked at every interview and covers demographic characteristics; labor force
participation; program participation; amounts and types of earned and unearned income received,
including transfer payments; noncash benefits from various programs; asset ownership; and
private health insurance. Most core data are measured on a monthly basis, although a few core
items are measured only as of the interview date, once every 4 months.

Other questions produce in-depth information on specific subjects and are asked less frequently.
Those topical questions are often found in topical modules that usually follow the core content.
Topical questions probe in greater detail about particular social and economic characteristics and
personal histories. Included are such topics as assets and liabilities, school enrollment, marital
history, fertility, migration, disability, and work history. Topical module questions typically
collect information on events in the past or characteristics that tend to change slowly, if at all.


Data Editing and Imputation

Computer-assisted interviewing (CAI) allows some data editing to occur while the interview is in
progress because the system detects inconsistencies and prompts the interviewer to ask the
respondent for additional information. CAI also allows use of prior wave data for editing missing
data from later waves, thus lessening the need for subsequent longitudinal editing. However,
editing and imputation still occur after SIPP interviews are completed (Chapter 4). The Census
Bureau edits data for consistency, imputes missing data, and creates internal data files and public
use files for each wave.

After each panel is concluded, the Census Bureau creates a full panel file by stripping all edited
and imputed values from the core data, linking those data, and then applying a different set of
longitudinally consistent edit and imputation procedures to the resulting file. As part of that
process, some data are recoded to maintain respondent confidentiality.

The Census Bureau uses several imputation procedures. Most common is some version of a
sequential hot deck, in which SIPP statisticians impute missing data by searching for a “donor”
respondent who is similar to the respondent with the missing data. The donor’s answers are used
in the assignment of missing data to the original respondent’s record. Specific imputation
procedures are discussed in Chapter 4. Data editing is still preferable to imputation and is used
whenever a missing item can be logically inferred from other information that has been provided.
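
The donor idea behind a sequential hot deck can be illustrated with a short Python sketch. It is
a generic illustration under stated assumptions, not the Census Bureau's procedure (Chapter 4
describes the actual methods). The function name, the field names, and the choice of age group
and sex as matching characteristics are all hypothetical.

```python
def sequential_hot_deck(records, cell_keys, item):
    """Fill missing values of `item` from the most recent donor in the same cell.

    records:   list of dicts, processed in file order
    cell_keys: fields that define "similar" respondents (hypothetical choice)
    item:      field to impute; None marks a missing value
    """
    last_donor = {}  # most recently reported value seen in each cell
    result = []
    for rec in records:
        rec = dict(rec)  # copy so the input list is left unchanged
        cell = tuple(rec[k] for k in cell_keys)
        rec["imputed"] = False
        if rec[item] is None:
            if cell in last_donor:
                rec[item] = last_donor[cell]
                rec["imputed"] = True
            # If no donor has been seen yet, the value stays missing.
        else:
            last_donor[cell] = rec[item]
        result.append(rec)
    return result


people = [
    {"id": 1, "age_group": "25-44", "sex": "F", "earnings": 2100},
    {"id": 2, "age_group": "25-44", "sex": "F", "earnings": None},
    {"id": 3, "age_group": "45-64", "sex": "M", "earnings": None},
    {"id": 4, "age_group": "45-64", "sex": "M", "earnings": 3000},
]
for row in sequential_hot_deck(people, ["age_group", "sex"], "earnings"):
    print(row["id"], row["earnings"], row["imputed"])
```

In this sketch, person 2 receives the earnings reported by person 1, while person 3 stays missing
because no donor in that cell precedes the record. Keeping an imputation flag, as shown, lets
analysts separate reported from imputed values.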


Accessing SIPP Information

Most analysts will find the published estimates from SIPP data useful. Census Bureau
publications may provide required estimates, saving users the need to generate those estimates
themselves. Published estimates can also provide a crosscheck for estimates prepared by analysts
from the microdata files.1

The Census Bureau makes published estimates from SIPP data available from several sources
(Chapter 5). All public use microdata files are available on magnetic media or CD-ROM, along
with a full set of documentation, directly from the Census Bureau. The Inter-university
Consortium for Political and Social Research (ICPSR) also provides access to SIPP microdata
for member institutions. In addition, the SIPP data and documentation that the Census Bureau
releases are not copyrighted and thus can be shared, although users are cautioned that this
provision applies only to materials written and distributed directly by federal agencies. Finally,
analysts conducting exploratory work might wish to investigate the Census Bureau’s online
resources. SIPP microdata are available through two access tools, Surveys-on-Call and
FERRET (Chapter 5). The home sites of both online tools can be accessed at the SIPP Web site
(http://www.sipp.census.gov/sipp).

1 Prior to the 1996 Panel, the Census Bureau estimates were usually impossible to replicate exactly because they
were based on internal data files that had not yet been topcoded and otherwise edited to protect the confidentiality of
respondents. Although new topcoding procedures are being implemented with the 1996 and subsequent panels, to
facilitate the production of comparable estimates, exact replication of some Census Bureau estimates will still be
impossible.


Nonsampling Errors, Sampling Errors, and Weighting
The SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a), offers an in-depth discussion of
the sources and magnitude of errors in SIPP-based estimates. Although it addresses both
sampling and nonsampling errors, it emphasizes the latter. This Users’ Guide provides a
summary chapter addressing nonsampling errors (Chapter 6), a chapter on sampling errors
(Chapter 7), and a chapter on the use of weights (Chapter 8). In addition, Appendix C addresses
weighting in detail.


Nonsampling Errors

All surveys—including SIPP—are subject to nonsampling errors from various sources. SIPP
contains nonsampling errors common to most surveys, as well as errors that stem from SIPP’s
longitudinal design. Undercoverage in household surveys is due primarily to within-household
omissions; the omission of entire households is less frequent. SIPP experiences some differential
undercoverage of demographic subgroups; for example, the coverage ratio of black males over
15 years of age is much lower than that for white males in the same age group. To compensate
for this differential undercoverage, the Census Bureau adjusts SIPP sample weights to population
control totals. Little is known, however, about how effective those adjustments are in reducing
biases.

Sample attrition is another major concern in SIPP because of the need to follow the same people
over time. Attrition reduces the available sample size. To the extent that those leaving the sample
are systematically different from those who remain in the sample, survey estimates could be
biased.

Response errors in SIPP take on a number of forms. Recall errors are thought to be the source of
the “seam phenomenon.” This effect results from the respondent’s tendency to project current
circumstances back onto each of the 4 prior months that constitute the SIPP reference period.
When that happens, any changes in respondent circumstances that occurred during that 4-month
period appear to have happened in the first month of the reference period. A disproportionate
number of changes appear to occur between the fourth month of one wave and the first month of
the following wave, which is the “seam” between the two waves—hence the name.
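
One way to check for the seam effect in a set of monthly histories is to count how many reported
changes fall on a seam. The Python sketch below is illustrative only. The function name and the
sample histories are hypothetical, and the sketch assumes each history starts at the first
reference month of a wave and that every wave covers 4 months.

```python
def transition_counts(histories, months_per_wave=4):
    """Count month-to-month status changes at wave seams and within waves.

    histories: list of per-person monthly status sequences (for example,
               1 = receiving a benefit, 0 = not receiving it)
    Returns (seam_changes, within_wave_changes).
    """
    seam = within = 0
    for status in histories:
        for m in range(1, len(status)):
            if status[m] != status[m - 1]:
                # Index m opens a new wave when it is a multiple of the wave
                # length, so the pair (m - 1, m) straddles a seam.
                if m % months_per_wave == 0:
                    seam += 1
                else:
                    within += 1
    return seam, within


histories = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # change reported at the seam
    [0, 0, 0, 0, 0, 0, 1, 1],  # change reported within the second wave
    [1, 1, 1, 1, 1, 1, 1, 1],  # no change
    [0, 0, 0, 0, 1, 1, 1, 1],  # change reported at the seam
]
print(transition_counts(histories))  # (2, 1)
```

With two waves there are seven month-to-month pairs per person, only one of which is a seam. If
changes were reported evenly, roughly one-seventh would fall on the seam, so a much larger share
suggests seam bias.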

Another potential source of response error is the time-in-sample effect. This effect refers to the
tendency of sample members to “learn the survey” over time. The more times a sample member
is interviewed, the better he or she learns the questionnaire. The concern is that sample members
will alter their responses to the survey questions in an effort to conceal sensitive information or
to minimize the length of the interview.


Sampling Errors

A common mistake in the estimation of sampling errors for survey estimates is to ignore the
complex survey design and treat the sample as a simple random sample (SRS) of the population.
This mistake occurs because most standard software packages for data analyses assume simple
random sampling for variance estimation. When applied to SIPP estimates, SRS formulas for
variances typically underestimate the true variances. Chapter 7 describes how to obtain
appropriate variance estimates that take into account SIPP’s complex sample design.
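
The size of the understatement can be illustrated with a design effect, the ratio of the variance
under the actual design to the variance under simple random sampling. The Python sketch below is
a rough illustration, not the method described in Chapter 7. It uses Kish's approximation for the
effect of unequal weights alone, which ignores clustering and stratification, and the sample
values are hypothetical.

```python
import math


def kish_deff(weights):
    """Kish's approximate design effect from unequal weighting only."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def se_proportion(p, n, deff=1.0):
    """Standard error of a proportion, inflated by a design effect.

    With deff=1.0 this is the usual SRS formula sqrt(p * (1 - p) / n).
    """
    return math.sqrt(deff * p * (1 - p) / n)


print(kish_deff([1, 1, 1, 1]))   # equal weights give 1.0
print(kish_deff([1, 3]))         # unequal weights give 1.25

srs_se = se_proportion(0.5, 100)                 # SRS assumption
design_se = se_proportion(0.5, 100, deff=4.0)    # hypothetical design effect
print(srs_se, design_se)
```

A design effect of 4 doubles the standard error, so confidence intervals built from the SRS
formula would be half as wide as they should be in that hypothetical case.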


Weighting

SIPP data analysts should understand the importance of using weights. The weight for a
responding unit in a survey data set is an estimate of the number of units in the target population
that the responding unit represents. In general, because population units may be sampled with
different selection probabilities, and because response and coverage rates may vary across
subpopulations, different responding units represent different numbers of units in the
population.2

The combined effects of differential response, differential coverage, and differential attrition
mean that unweighted analyses can produce biased results. Each SIPP file contains several
alternative sets of weights that address the variety of units of analysis (such as persons,
households, families, and subfamilies) and time periods for which survey estimates may be
needed. It is important to understand the different weights on the files and to use those that are
appropriate for a particular analysis.
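
A small Python sketch shows how weights change an estimate. The function names, the
participation indicator, and the weight values are hypothetical and are not taken from any SIPP
file.

```python
def weighted_total(values, weights):
    """Estimate a population total: each respondent counts `weight` times."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimate a population mean or proportion using the weights."""
    return weighted_total(values, weights) / sum(weights)


# 1 = participates in a program, 0 = does not (hypothetical respondents)
participates = [1, 0, 0, 1]
weights = [3000, 1000, 1000, 1000]  # people represented by each respondent

unweighted_rate = sum(participates) / len(participates)
print(unweighted_rate)                         # 0.5
print(weighted_mean(participates, weights))    # about 0.667
print(weighted_total(participates, weights))   # 4000 estimated participants
```

Because the first respondent represents three times as many people as the others, the weighted
participation rate differs from the unweighted one.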

The selection and use of weights in SIPP analyses are discussed in Chapter 8 and Appendix C.


2 Most SIPP panels have not sampled different subpopulations at different rates. There are two exceptions: the 1990
and 1996 Panels. Chapter 2 discusses the oversamples included in each of those panels.




SIPP Public Use Files
There are three types of SIPP microdata files available for public use: core wave files, topical
module files, and full panel files. Although content overlaps among these files, each is designed
to facilitate a different kind of analysis.


Core Wave Files

SIPP core wave files contain the core labor force, income, household and family composition,
and program participation data from one wave of interviews. Since the 1990 Panel, these files
have been issued in a person-month format, with up to four records for each sample member.
Each record contains data from one of the four reference months covered by the wave.3
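
The difference between the person-month format and a one-record-per-person format can be sketched
in Python. This is an illustration only and is not the guide's own conversion program. The
function name and the field names (unit, person, month, income) are hypothetical stand-ins for
the actual ID and content variables.

```python
def to_person_records(person_months, id_keys, month_key, items):
    """Collapse person-month records into one wide record per person.

    Each monthly item becomes a set of fields such as income_1 ... income_4.
    """
    wide = {}
    for rec in person_months:
        pid = tuple(rec[k] for k in id_keys)
        # Start a new person record holding the ID fields the first time.
        person = wide.setdefault(pid, {k: rec[k] for k in id_keys})
        for item in items:
            person[f"{item}_{rec[month_key]}"] = rec[item]
    return list(wide.values())


core_rows = [
    {"unit": "A", "person": 1, "month": 1, "income": 900},
    {"unit": "A", "person": 1, "month": 2, "income": 950},
    {"unit": "A", "person": 1, "month": 3, "income": 950},
    {"unit": "A", "person": 1, "month": 4, "income": 1000},
    {"unit": "A", "person": 2, "month": 3, "income": 400},
    {"unit": "A", "person": 2, "month": 4, "income": 400},
]
for person in to_person_records(core_rows, ["unit", "person"], "month", ["income"]):
    print(person)
```

Person 2 has only two monthly records, so the wide record carries only income_3 and income_4.
Code that reads the wide format needs to handle such absent months explicitly.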


Topical Module Files

Each topical module file contains all of the topical module subject areas that were administered
during the wave in question. The files contain one record for each person who was a sample
member at the time of the interview. When critical demographic and weight variables are
included, the topical module files can be used independently from the core wave and full panel
files. However, because topical module files contain only a small subset of the core items, users
often need to merge data from either the core wave or the full panel files.


Full Panel Files

Full panel files are released after interviewing for a panel is completed. They contain one record
for each original sample member, all children, and all adults who entered the sample after Wave
1. People who were not interviewed for 1 or more months over the course of the panel either
have their data imputed or are identified as not in the sample, although their records remain in
the file. Variables within each record correspond to the information that was collected in the core
content sections of the interviews. Different variables occur with different frequency, depending
upon how often certain questions were asked. For example, because a sample member’s sex, date
of birth, and race are unlikely to change, the variables corresponding to those attributes occur
only once in each record. On the other hand, some questions from the core content, such as those
about income and program participation, are asked for each month of the panel; the number of
corresponding variables will reflect that fact. Similarly, SIPP-generated information can occur
once (e.g., person number) or many times (e.g., monthly interview status) on each record.

3 Prior to the 1990 Panel, core wave files were issued with a single record for each person. Each record contained
data for all 4 reference months covered by the wave.




Linking Files

Before linking files, users must understand several conceptual issues: reasons for nonmatches;
handling of nonmatches; data quality of matched records containing imputed data; and design of
the linked file. There are five ways of linking SIPP data files: within a core wave file; core wave
file to core wave file; topical module file to core wave file; topical module file to full panel file;
and core wave file to full panel file. The linking process is generally the same for each type of
link. However, because variable names and file structures differ across file types, the process for
each type of linkage is described separately in Chapter 13.
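
The general shape of a link, including the bookkeeping for nonmatches, can be sketched in Python.
This is a minimal illustration, not the procedure in Chapter 13. The function name and the field
names (unit, person, month, income, net_worth) are hypothetical, and the actual ID variables
differ by file type and panel.

```python
def link_topical_to_core(core_rows, topical_rows, keys):
    """Attach one-per-person topical data to person-month core records.

    Returns (linked, core_only, topical_only) so that nonmatches on
    either side can be inspected rather than silently dropped.
    """
    topical_by_id = {tuple(r[k] for k in keys): r for r in topical_rows}
    linked, core_only, matched_ids = [], [], set()
    for row in core_rows:
        pid = tuple(row[k] for k in keys)
        topical = topical_by_id.get(pid)
        if topical is None:
            core_only.append(row)
        else:
            linked.append({**row, **topical})
            matched_ids.add(pid)
    topical_only = [r for pid, r in topical_by_id.items() if pid not in matched_ids]
    return linked, core_only, topical_only


core = [
    {"unit": "A", "person": 1, "month": 1, "income": 900},
    {"unit": "A", "person": 1, "month": 2, "income": 950},
    {"unit": "B", "person": 1, "month": 1, "income": 0},
]
topical = [
    {"unit": "A", "person": 1, "net_worth": 12000},
    {"unit": "C", "person": 2, "net_worth": 500},
]
linked, core_only, topical_only = link_topical_to_core(core, topical, ["unit", "person"])
print(len(linked), len(core_only), len(topical_only))  # 2 1 1
```

Returning the two nonmatch lists separately makes it easier to decide, case by case, whether a
nonmatch reflects a person who left or entered the sample or a problem with the ID variables.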


Comparison of SIPP with Other Surveys
Because there is some overlap in the content of SIPP and certain other surveys, the question
arises: When should an analyst use SIPP instead of the other surveys? A brief look at selected
surveys might provide some guidance (Table 1-1 compares some key points as well).


Current Population Survey

The CPS, sponsored jointly by the Census Bureau and the Bureau of Labor Statistics (BLS), is
primarily a labor force survey. It is used to compute the federal government’s official monthly
unemployment statistics, along with other estimates of labor force characteristics. In addition to
its core content, a different supplement is fielded each month. One of these, the March Annual
Demographic Supplement, is currently the official source of estimates of income and poverty in
the United States. Compared with SIPP, however, the CPS has gaps in the area of income
measurement. A yearlong reference period means that CPS respondents are more likely than
SIPP respondents to forget or misreport certain asset income or irregular income sources. The
CPS does not collect data on assets and liabilities to the same extent as SIPP. The CPS is also
less comprehensive in the area of program participation, sometimes missing partial-year data.

The CPS reporting unit is the person, but the sample covers housing units; whoever happens to
be living at the address at the time of the interview is in the sample. When residents of a CPS
housing unit move, they are not followed; instead, the new residents become sample members.
Housing units spend 4 months in the sample, 8 months out, and 4 months in again. The target
sample size for the CPS is 50,000 housing units each month. Like SIPP, the CPS sample covers
the U.S.-resident noninstitutionalized population, although, unlike SIPP, the CPS includes people
living in military barracks.
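
The 4-8-4 rotation pattern described above can be sketched in a few lines. The function name and the convention of counting months from zero at a housing unit's entry into the sample are assumptions made for this illustration.

```python
def cps_in_sample(months_since_entry):
    """Return True if a housing unit is in the CPS sample in the given month.

    months_since_entry is 0 for the unit's first month in sample. The unit is
    in sample for months 0-3, out for months 4-11, and in again for months 12-15.
    """
    return 0 <= months_since_entry <= 3 or 12 <= months_since_entry <= 15


# "I" marks a month in sample, "-" a month out of sample.
pattern = "".join("I" if cps_in_sample(m) else "-" for m in range(16))
print(pattern)  # IIII--------IIII
```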



                          Table 1-1. Comparison of SIPP, CPS, and PSID

                              Survey of Income and            CPS (March Income           Panel Study of Income
         Feature              Program Participation               Supplement)                   Dynamics
Sample size and design      1996 Panel: 40,188             50,000 households; each      9,000 families; over-
                            households; new panel          household in sample for 8    represents low-income
                            periodically; each original-   months over 2-year period;   families; continuing panel
                            sample adult in panel for      rotation group design;       with annual interviews
                            no. of months in survey;       monthly interviews
                            interviews every 4 months      (income supplement once
                                                           per year)
Sample designed to be       No                             Yes                          No
representative within
states?
Income data                Data for about 70 cash and  Data for prior calendar    Data for prior calendar
                           in-kind sources at each 4-  year for about 35 cash and year for about 25 cash and
                           month wave, with monthly    in-kind sources            in-kind sources with
                           reporting for most sources                             specific months received
Tax data                   Information to determine    None                       Information to determine
                           federal, state, and local                              federal, state, and local
                           income taxes; payroll                                  income taxes; payroll
                           taxes; property taxes                                  taxes; property taxes
Asset-holdings data        Detailed inventory of real  None, except home          Regularly, information
                           and financial assets and    ownership                  about home value and
                           liabilities once each year                             mortgage debt;
                           for panels from 1996                                   occasionally, information
                           forward and at least once                              about saving behavior and
                           per panel in prior years;                              wealth
                           more frequent measures
                           for assets relevant for
                           assistance programs
Expenditure data           Information at least once   None                       Monthly rent or mortgage
                           each panel before 1996                                 costs; annual utility costs;
                           and once a year 1996 and                               average weekly food costs;
                           beyond on previous                                     child support payments
                           month’s out-of-pocket
                           medical care costs, shelter
                           costs (mortgage or rent
                           and utilities), dependent
                           care costs, and child
                           support payments
Note: SIPP sample size and design information valid for the 1996 Panel. For information about pre-1996 SIPP
panels, see Chapter 2.
Source: Citro, C.F., Michael, R.T., and Maritano, N. (eds.) (1995). Measuring Poverty: A New Approach.
Washington, DC: National Academy Press, Appendix B.


The Panel Study of Income Dynamics

The Panel Study of Income Dynamics (PSID) was begun in 1968 as a nationally representative,
longitudinal survey of the U.S. population. It initially included about 5,000 households and now
has about 8,700. The University of Michigan conducts PSID on an annual basis; the focus of the
survey is economics and demographics, especially income sources and amounts, employment,
family composition changes, and residential location. The content is broad, however, and
includes sociological and psychological measures. As of 1995, PSID had collected information
from more than 50,000 individuals, spanning as much as 28 years of their lives. The sample
includes individuals interviewed every year since 1968, a representative national sample of 2,000
Hispanic households added in 1990, and families formed by members of the original sample
families.


Survey of Program Dynamics

The Survey of Program Dynamics (SPD) is a new longitudinal survey designed to be an annual
follow-up to the 1992 and 1993 SIPP Panels. Approximately 38,000 households were in the
initial sample; a second phase, initiated with the implementation of the core SPD questionnaire
in 1998, was projected to include approximately 18,500 households, including all sample
households with children and an overrepresentation of households in and near the poverty
threshold. SPD data for 1996–2002, along with information collected from 1992 through 1995
for SIPP, will provide a combined 10 years of data measuring program eligibility, access, and
participation. Analysts will be able to track welfare dependency, the beginning and end of
periods of welfare, factors that may be causes of such periods, and the impacts that the changes
will have on families, adults, and children over time.


Guide to This Document
The balance of this Users’ Guide is organized as follows. Chapters 1 through 5 are introductory
chapters, designed mainly for beginning SIPP users.

•   Chapter 2 discusses how the SIPP survey is designed and implemented. The chapter
    describes the structure of the survey, sample selection, and field procedures.
•   Chapter 3 examines the general nature of questions in SIPP. Discussion focuses on core and
    topical content, including brief descriptions of individual topical modules.
•   Chapter 4 describes what happens after data collection. This chapter covers all aspects of
    post-data-collection processing, including consistency checks, data editing, and procedures
    for imputing missing data.
•   Chapter 5 describes SIPP data files and supporting documentation and tells analysts where to
    find that information.
Chapters 6 through 8 provide more technical information on how to properly use the data and
interpret the results.



•   Chapter 6 discusses the types and sources of nonsampling error in SIPP, including recall
    error, the seam effect, time-in-sample effects, attrition bias, and sources of additional
    information about these topics.
•   Chapter 7 defines sampling error and discusses how to calculate sampling errors for SIPP
    estimates.
•   Chapter 8 discusses the topic of weights in SIPP, with a focus on how to choose weights.

Chapters 9 through 13 provide specific instructions for the use of the SIPP public use microdata
files.

•   Chapter 9 introduces this section by giving an overview of issues common to all of the SIPP
    data files.
•   Chapter 10 describes how to use the core wave files. The chapter describes the structure of
    the files and how to use the accompanying technical documentation. It also discusses how the
    core wave files relate to the core survey instrument. Finally, the chapter provides detailed
    descriptions of how to use the core wave files when performing common tasks.
•   Chapter 11 describes how to use the topical module files, the structure of the files, and use of
    the accompanying technical documentation. It also discusses how the topical module files
    relate to the corresponding topical module survey instruments. Finally, the chapter provides
    detailed descriptions of how to use the topical module files when performing common tasks.
•   Chapter 12 describes how to use the full panel files, the structure of the files, and use of the
    accompanying technical documentation. It also discusses how the full panel files relate to the
    core survey instruments. Finally, the chapter provides detailed descriptions of how to use the
    full panel files when performing common tasks.
•   Chapter 13 describes how to link core wave, topical module, and full panel files. The chapter
    covers both important conceptual issues and the mechanics of linking the various files.
Finally, the Users’ Guide includes the following additional information:

•   Appendixes contain in-depth discussion of weighting; tables with information about the size
    and number of waves, missing waves, oversampling, and additional information for selected
    SIPP panels; a crosswalk; and detailed information about topcoding.
•   An acronym list provides a guide to the acronyms used in this manual.
•   The glossary defines terms that may be unfamiliar to some users.
•   The references section contains references and suggested reading for all chapters in this
    guide.
•   An index helps users locate information quickly and easily.




Where to Go for More Information
The following sources provide expanded, specific information about various aspects of SIPP and
related products.


SIPP Web Site

The SIPP homepage (located at http://www.sipp.census.gov/sipp) includes, among other things,
this Users’ Guide and an online tutorial that provides a hands-on introduction to SIPP. As the
survey and data files evolve, the online documentation will be kept current. Also, users may
subscribe at the SIPP Web site to sipp-users, a listserv for SIPP Users Group members. List
members share new reports and studies, programming help, and research ideas.


SIPP Quality Profile

The SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a), summarizes what is known
about the sources and magnitude of errors in estimates based on SIPP data. It presents
information on errors associated with each phase of survey operations: frame design and
maintenance, sample selection, data collection, data processing, estimation (weighting), and data
dissemination. Some information, such as the outcome of macroevaluation studies, is addressed
outside of this framework in a separate chapter. The SIPP Quality Profile is available at the SIPP
Web site.


Bibliography

The SIPP bibliography, also available at the SIPP Web site under Publications and Analyses, is
the most comprehensive, currently available online resource of published and unpublished
documents related to SIPP. It includes substantive studies that use SIPP data, as well as citations
to methodological research about SIPP. Documents relating to the ISDP also are included. The
bibliography contains nearly 2,000 references to reports, conference papers, working papers,
journal articles, dissertations, books, and book sections. Abstracts are available for selected
publications.


Reports and Working Papers

The references cited in this report include several types of Census Bureau publications. The P-70
series (Current Population Reports, Household Economic Studies) presents tabulations and
analyses of SIPP data. SIPP working papers provide information about methodological aspects
of the survey as well as analyses of SIPP data. The working papers are not cleared for formal
publication but are readily available at the SIPP Web site. Since 1984, papers on SIPP results and
methodology presented at the annual meeting of the American Statistical Association have been
published in the working-paper series. Several important papers on SIPP methodology and
evaluation studies have been presented and published in the proceedings of the Census Bureau’s
annual research conferences, which began in 1985. In addition to those sources, papers and
reports with information about the quality of SIPP data have been published by numerous other
agencies, organizations, and professional associations.


Technical Documentation

Technical documentation accompanies the SIPP microdata files that users acquire from the U.S.
Census Bureau. The technical documentation briefly describes the contents of the particular file
and includes the following items:

•   A glossary of selected terms,
•   Lists of codes and descriptions,
•   A data dictionary and instructions on how to use it,
•   A source and accuracy statement,
•   A copy of the core questionnaire used for the panel in question,
•   User notes, and
•   File information.


2. SIPP Sample Design and
   Interview Procedures
This chapter provides new users of the Survey of Income and Program Participation (SIPP) with
basic information about the organizing principles of SIPP, sample selection, and the data
collection process. The chapter also briefly reviews interview procedures.

SIPP is a longitudinal survey that collects information on topics such as income, participation in
government transfer programs, employment, and health insurance coverage. The initial survey
design called for the introduction of a new sample, called a panel, every year; each panel was
planned to cover 32 months. In practice, a number of panels have been shorter. A result of the
initial design was that multiple SIPP panels were in the field simultaneously. A redesign
introduced with the 1996 Panel abandoned the overlapping panel structure and extended the
length of the 1996 Panel to 4 years. Subsequent panels will be 3 years in length.


Organizing Principles
SIPP is administered in panels and conducted in waves and rotation groups. Within a SIPP
panel, the entire sample is interviewed at 4-month intervals. These groups of interviews are
called waves. The first time an interviewer contacts a household, for example, is Wave 1; the
second time is Wave 2, and so forth. As discussed in Chapter 3, each wave contains core
questions that are asked each time, along with topical questions that vary from one wave to the
next.

Sample members within each panel are divided into four subsamples of roughly equal size; each
subsample is referred to as a rotation group. One rotation group is interviewed each month.1
During the interview, information is collected about the previous 4 months, which are referred to
as reference months. Thus, each sample member is interviewed every 4 months, with information
about the previous 4-month period collected in each interview (see Table 2-2).


Panels

The original design of SIPP called for an initial selection of a nationally representative sample of
households, with all adults in those households being interviewed once every 4 months over a
32-month period. In addition, interviews were to be conducted with any other adults living with
original sample members at subsequent waves. The first sample, the 1984 Panel, began
interviews in October 1983. The 1985 Panel began in February 1985. Subsequent panels began
in February of each calendar year, resulting in concurrent administration of the survey in
multiple panels. Because of budget constraints, actual panel duration has varied. The original
goal was to have panels covering eight waves (32 months). In several instances, panels were
terminated after seven waves (28 months). Two panels were terminated even earlier: 1988 (six
waves) and 1989 (three waves).

1
    The month in which the interview takes place is called the interview month.

With certain exceptions (Table 2-1), each panel overlapped part of the previous panel, with the
result that there were two or three active panels at any given time. The overlap allows analysts to
combine records from different panels, thus having larger samples (and lower standard errors)
for cross-sectional analyses.2 The overlapping feature of the SIPP design was dropped with the
1996 redesign. Standard errors have remained small since the redesign because the 1996 and
following panels each have target sample sizes of at least 37,000 interviewed households for
Wave 1, almost twice the size of two of the previous panels.

                         Table 2-1. Summary of the 1984–1996 SIPP Panels


                                                                   Number of Wave 1
           Date of First Date of Last     Number of Wave 1         Original Sample         Number      Short
Panela     Interview     Interview        Eligible Households      Members                 of Waves    Wavesb
 1984       Oct. 83       Jul. 86          20,897                   55,400                   9          2, 8
 1985       Feb. 85       Aug. 87          14,306                   37,800                   8          2
 1986       Feb. 86       Apr. 88          12,425                   32,800                   7          3
 1987       Feb. 87       May 89           12,527                   33,100                   7          -
 1988       Feb. 88       Jan. 90          12,725                   33,500                   6
 1989       Feb. 89       Jan. 90          12,867                   33,800                   3
 1990       Feb. 90       Sep. 92          23,627                   61,900                   8
 1991       Feb. 91       Sep. 93          15,626                   40,800                   8
 1992       Feb. 92       May 95           21,577                   56,300                  10          -
 1993       Feb. 93       Jan. 96          21,823                   56,800                   9
 1996       Apr. 96       Mar. 00          40,188                   95,402                  12
a No new panels in 1994 and 1995.
b Short waves contained three rotations instead of the standard four.
Source: SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a).


Although most available data predate the 1996 redesign (discussed in Chapter 1), the redesign
affected the nature of some panels. In preparation for the redesign, the Census Bureau canceled
the 1994 and 1995 Panels and extended the 1992 and 1993 Panels (Table 2-1). The last 1993
Panel interview took place in January 1996 to ensure that data would remain continuous. Also in
1996, the Census Bureau initiated the Survey of Program Dynamics (SPD) as an extension of
SIPP. For the SPD, the Census Bureau began recontacting people in the 1992 and 1993 SIPP
panels and will continue annual data collection through 2002. The plan is to yield 10 years of
data (1992–2001) for those two panels to support analyses of changes during welfare reform and
for the pre- and postreform periods (Chapter 1).

2
  Combining data across panels allows for larger sample sizes and, consequently, smaller standard errors for some
types of estimates. It also helps alleviate two types of bias common to longitudinal surveys: time-in-sample effects
and attrition bias.


Waves and Rotation Groups

One full 4-month cycle of administering the questionnaire to the entire panel is a wave. The 1984
through 1993 Panels were designed to have eight waves each, although more often than not the
number of waves actually administered was different (Table 2-1). The 1996 Panel has 12 waves.

Rotation groups are random subsamples of approximately equal size. Each month, the members
of one rotation group are interviewed; over the course of 4 months, all rotation groups are
interviewed, providing data for the full set of 4 months. For many survey items, SIPP collects
data for each of the 4 calendar months preceding the interview month. Those 4 months together
are called reference months, or the reference period. (Table 2-2 provides an illustration of the
reference months for the various rotation groups in each wave of the 1996 Panel.)
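
The relationship among waves, rotation groups, and reference months can also be expressed as a short calculation. The following is a minimal sketch that reproduces the layout of Table 2-2 for the 1996 Panel only. The function names are hypothetical, and the starting point (Wave 1, rotation group 1, first reference month December 1995) is taken from the table.

```python
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def add_months(year, month, offset):
    """Shift a (year, month) pair by `offset` months; month is 1-12."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def reference_months_1996(wave, rotation_group):
    """Return (reference months, interview month) for the 1996 Panel.

    Each wave shifts the reference period by 4 months, and each rotation
    group is offset by 1 month from the group before it.
    """
    if not (1 <= wave <= 12 and 1 <= rotation_group <= 4):
        raise ValueError("expected wave 1-12 and rotation group 1-4")
    start = (wave - 1) * 4 + (rotation_group - 1)
    months = [add_months(1995, 12, start + k) for k in range(4)]
    interview = add_months(1995, 12, start + 4)  # month after the 4th reference month
    return months, interview


def label(year_month):
    """Format a (year, month) pair, e.g. (1996, 6) -> 'Jun 1996'."""
    year, month = year_month
    return f"{MONTH_NAMES[month - 1]} {year}"


months, interview = reference_months_1996(wave=2, rotation_group=3)
print([label(m) for m in months], "interviewed in", label(interview))
```

For Wave 2, rotation group 3, the sketch returns June through September 1996 as the reference months and October 1996 as the interview month, which agrees with the corresponding cells of Table 2-2.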

The reference period length and the timing of the interviews address several concerns:
respondent recall error, which increases as the recall period lengthens; respondent burden, which
increases with the number of times respondents are interviewed; and the costs of frequent interviews. By
spreading the interviews for each wave evenly over 4 months, the rotation group structure allows
the Census Bureau to keep a skilled and experienced team of interviewers in the field year round.
This eases management burden and allows Census Bureau interviewers to master the
complexities of the SIPP questionnaire and to maintain that mastery.

Each SIPP panel prior to 1990 had fewer than eight waves or contained one wave that consisted
of fewer than four rotation groups (Table 2-1). As discussed in Chapter 3, the questionnaire
administered at each wave contains core questions, those asked at every interview, along with
sections containing topical questions that vary from one wave to the next. Respondents in the
skipped rotation groups have no gap in core data, but they do not provide core data for the full
duration of the panel, and they lack topical data for the wave in which they were skipped.
Analysts should be alert to the consequences of the skipped rotations: some topical information
is not available for the full sample, and the length of time an analyst can follow adults from the
original sample is reduced for selected rotation groups.


Reference Periods

The reference period for most core items is the 4-month period preceding the month of the
interview for the given wave. Data for most core items are collected for each of the preceding 4
months. Some data on labor force characteristics are collected with weekly resolution.
Subsequently, weekly labor force characteristics are recorded on a monthly basis.



         Table 2-2. 1996 Panel: Rotation Groups, Waves (W), and Reference Months

Reference          Rotation Group             Reference          Rotation Group
Month     1       2       3       4           Month     1       2       3       4
Dec. 95   W1 1                                Dec. 97   W7 1    W6 4    W6 3    W6 2
Jan. 96   W1 2    W1 1                        Jan. 98   W7 2    W7 1    W6 4    W6 3
Feb. 96   W1 3    W1 2    W1 1                Feb. 98   W7 3    W7 2    W7 1    W6 4
Mar. 96   W1 4    W1 3    W1 2    W1 1        Mar. 98   W7 4    W7 3    W7 2    W7 1
April 96  W2 1    W1 4    W1 3    W1 2        April 98  W8 1    W7 4    W7 3    W7 2
May 96    W2 2    W2 1    W1 4    W1 3        May 98    W8 2    W8 1    W7 4    W7 3
June 96   W2 3    W2 2    W2 1    W1 4        June 98   W8 3    W8 2    W8 1    W7 4
July 96   W2 4    W2 3    W2 2    W2 1        July 98   W8 4    W8 3    W8 2    W8 1
Aug. 96   W3 1    W2 4    W2 3    W2 2        Aug. 98   W9 1    W8 4    W8 3    W8 2
Sep. 96   W3 2    W3 1    W2 4    W2 3        Sep. 98   W9 2    W9 1    W8 4    W8 3
Oct. 96   W3 3    W3 2    W3 1    W2 4        Oct. 98   W9 3    W9 2    W9 1    W8 4
Nov. 96   W3 4    W3 3    W3 2    W3 1        Nov. 98   W9 4    W9 3    W9 2    W9 1
Dec. 96   W4 1    W3 4    W3 3    W3 2        Dec. 98   W10 1   W9 4    W9 3    W9 2
Jan. 97   W4 2    W4 1    W3 4    W3 3        Jan. 99   W10 2   W10 1   W9 4    W9 3
Feb. 97   W4 3    W4 2    W4 1    W3 4        Feb. 99   W10 3   W10 2   W10 1   W9 4
Mar. 97   W4 4    W4 3    W4 2    W4 1        Mar. 99   W10 4   W10 3   W10 2   W10 1
April 97  W5 1    W4 4    W4 3    W4 2        April 99  W11 1   W10 4   W10 3   W10 2
May 97    W5 2    W5 1    W4 4    W4 3        May 99    W11 2   W11 1   W10 4   W10 3
June 97   W5 3    W5 2    W5 1    W4 4        June 99   W11 3   W11 2   W11 1   W10 4
July 97   W5 4    W5 3    W5 2    W5 1        July 99   W11 4   W11 3   W11 2   W11 1
Aug. 97   W6 1    W5 4    W5 3    W5 2        Aug. 99   W12 1   W11 4   W11 3   W11 2
Sep. 97   W6 2    W6 1    W5 4    W5 3        Sep. 99   W12 2   W12 1   W11 4   W11 3
Oct. 97   W6 3    W6 2    W6 1    W5 4        Oct. 99   W12 3   W12 2   W12 1   W11 4
Nov. 97   W6 4    W6 3    W6 2    W6 1        Nov. 99   W12 4   W12 3   W12 2   W12 1
                                              Dec. 99           W12 4   W12 3   W12 2
                                              Jan. 00                   W12 4   W12 3
                                              Feb. 00                           W12 4
Note: The cell entry W1 1 represents Wave 1, reference month 1. The last reference month of each wave is in
boldface type. For rotation group 1, the reference months for Wave 1 were Dec. 95 through Mar. 96.


After the basic demographic information, one of the first items in the SIPP interview illustrates
the availability of time-specific data in SIPP. The respondent is asked if he or she had a health
insurance plan at any time during the previous 4 months. If the answer is yes, SIPP asks if the
respondent had coverage in each of the individual 4 months. Thus data are collected for 4
individual months at each wave. Over the course of a 12-wave panel, data are collected for 48
consecutive months for each panel member. For the 1996 Panel, the rotation groups were
interviewed in order. Specifically, for Wave 1, rotation group 1 was interviewed in April,
rotation group 2 in May, rotation group 3 in June, and rotation group 4 in July. For previous
panels, however, the specific months varied slightly among rotation groups. With the 1990
Panel, for instance, panel members in rotation group 2 were interviewed first; rotation group 1
was actually the fourth rotation group surveyed in that panel.3


Sample Design
SIPP uses a complex sample design that has important implications for the estimation of standard
errors. Because the SIPP design is not a simple random sample, the standard errors reported by
most off-the-shelf statistical software will underestimate the true standard errors of estimates
from SIPP. (See Chapter 7 for details.) A detailed description of the SIPP sample design and
standard error calculations can be found in the third edition of the SIPP Quality Profile (U.S.
Census Bureau, 1998a).
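
To see why ignoring the design tends to understate standard errors, the sketch below compares the standard error of a mean computed as if the data were a simple random sample with one computed from the variation between cluster means. It is a toy illustration with made-up data, equal-sized clusters, a single stratum, and no finite population correction. It is not the Census Bureau's variance estimation procedure, which is described in Chapter 7.

```python
import statistics
from math import sqrt

# Made-up data: four clusters of three households each. Values are similar
# within a cluster and differ across clusters, as is common in clustered samples.
clusters = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
    [40, 41, 42],
]

values = [v for cluster in clusters for v in cluster]

# Naive standard error: treats all 12 values as a simple random sample.
se_srs = statistics.stdev(values) / sqrt(len(values))

# Cluster-based standard error: treats the cluster means as the independent
# observations (valid here only because the clusters are of equal size).
cluster_means = [statistics.mean(c) for c in clusters]
se_cluster = statistics.stdev(cluster_means) / sqrt(len(clusters))

print(f"SRS SE: {se_srs:.2f}, cluster SE: {se_cluster:.2f}")
```

Because households in the same cluster resemble one another, the 12 observations carry less independent information than a simple random sample of 12 would, and the cluster-based standard error is the larger of the two.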


Selection of Sampling Units

The Census Bureau employs a two-stage sample design to select the SIPP sample. The two
stages are (1) selection of primary sampling units (PSUs) and (2) selection of address units
within sample PSUs. Census Bureau interviewers follow an established procedure to identify
sample members within the selected address units.


Primary Sampling Units

The frame for the selection of sample PSUs consists of a listing of U.S. counties and independent
cities, along with population counts and other data for those units from the most recent census of
population. Counties either are grouped with adjacent counties to form PSUs or constitute a PSU
by themselves.

Following the formation of the PSUs, the smaller ones, called non-self-representing (NSR)
PSUs, are then grouped with similar PSUs in the same region (South, Northeast, Midwest, West)
to form strata; census data for a variety of demographic and socioeconomic variables are used to
determine the optimum groupings. A sample of NSR PSUs is selected in each stratum to
represent all PSUs in the stratum. All of the larger PSUs are included in the sample and are
called self-representing (SR) PSUs.

3
  An explanation for the relabeling of rotation groups in earlier panels is provided in Chapter 2 of the 2nd edition of
the SIPP Users' Guide (U.S. Census Bureau, 1991).



Selection of Addresses in Sample PSUs

SIPP selects addresses from five separate, non-overlapping sampling frames maintained by the
Census Bureau: the unit frame (formerly called the address enumeration districts [EDs] frame);
the area frame (area EDs frame); the group quarters frame (special places frame); the housing
unit coverage improvement frame; and the new-construction (or permit) frame. The first three frames are based
on census counts from the most recent decennial census; unit and area frames are determined by
a process called “address screening,” which has been done at the block level since 1990. The unit
frame lists addresses of housing units located in census blocks in areas that issue building
permits and in which at least 96 percent of the addresses are complete (with street name and
house number). The area frame contains addresses from the remaining census blocks that are not
in permit-issuing areas, or where more than 4 percent of the addresses in the blocks are missing.
Those addresses are mostly in rural areas. The group quarters frame includes boarding houses,
hotel rooms, and institutions that are found in the decennial census but are not counted as
housing units. Together, the three frames provide almost 90 percent of the sample addresses for
each SIPP panel.
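
The rule that separates the unit frame from the area frame can be written as a simple test. The sketch below is illustrative only: the function name, argument names, and block data are hypothetical, and the rule encodes only the two conditions stated above (a permit-issuing area and at least 96 percent complete addresses).

```python
def assign_frame(in_permit_area, complete_address_share):
    """Assign a census block to the unit frame or the area frame.

    in_permit_area: True if the block lies in an area that issues building permits.
    complete_address_share: fraction (0 to 1) of the block's addresses that have
        both a street name and a house number.
    """
    if in_permit_area and complete_address_share >= 0.96:
        return "unit"
    return "area"


# Made-up blocks: (label, in permit-issuing area, share of complete addresses).
blocks = [
    ("block A", True, 0.99),   # permit area, complete addresses
    ("block B", True, 0.90),   # permit area, too many incomplete addresses
    ("block C", False, 1.00),  # not in a permit-issuing area
]
for name, permits, share in blocks:
    print(name, assign_frame(permits, share))
```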

The coverage improvement frame is used to include addresses of housing units that were missed in
the census count but were found in postenumeration surveys. The percentage of sample addresses
from this frame is typically small (0.1 percent of the sample addresses in the 1986 Panel).

The new-construction frame is used to provide coverage of new structures for which building
permits have been issued since the last decennial census in areas covered by the unit frame. This
frame is updated continually, and the percentage of addresses sampled from it increases each
year until data from another decennial census become available.

Within each sample PSU, the addresses in the sampling frames are grouped into clusters. The
clusters are then sampled, and the selected cluster of addresses is included for interviewing.4 In
the unit frame, the 1996 Panel had clusters of one housing unit; for prior panels, clusters of two
neighboring addresses were used. In the area and group quarters frames, clusters are constructed
with the expectation of four housing units or housing unit equivalents. With the area frame, the
sampled clusters are visited by SIPP interviewers prior to the scheduled interviewing. The
interviewers list all residential addresses within the selected clusters. With the new-construction
frame, the 1996 Panel has a 50-50 mixture of four- and eight-unit clusters. Previously, clusters of
four housing units were formed. No clustering is used with the coverage improvement frame.


Identifying Household Members Within Sampled Addresses

At the time of the first interview, the Census Bureau interviewer visits sampled addresses,
verifies the addresses, determines whether they contain occupied housing units, and identifies the
housing units located at each address. A housing unit is defined as a living quarters with its own
entrance and cooking facilities. The people living in a housing unit constitute a household (see
below). Interviews are conducted at all households in sampled addresses. However, SIPP does

4
  In a few cases, where the clusters contain many more housing units than expected, a subsample of addresses is
selected.



not treat the household as a continuous unit to be followed in the panel. SIPP is a person-based
survey; as discussed below, SIPP follows original sample members regardless of household
composition.

The interviewer compiles a roster for each sampled household, listing all people living or staying
at the address. Next, the interviewer identifies those who are household members by determining
if the address is their usual residence (Table 2-3; see footnote 5). SIPP designates all people who are
considered members as original sample members. Over the course of the panel, original sample members are
followed and interviewed every 4 months (see footnote 6).
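
The general usual-residence test that underlies Table 2-3 can be sketched as follows. This is a minimal illustration, assuming the general rule only (a person is a member if the sample unit is his or her usual place of residence, or if the person is staying there and has no usual place of residence elsewhere). The class and field names are hypothetical, and the special cases listed in Table 2-3 (for example, inmates of institutions or armed forces members on leave) are deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class RosterEntry:
    """One person listed on the household roster (hypothetical structure)."""
    name: str
    usual_residence_here: bool            # sample unit is the usual place of residence
    staying_here: bool                    # present at the address at interview time
    has_usual_residence_elsewhere: bool   # specific living quarters held elsewhere


def is_household_member(person: RosterEntry) -> bool:
    """Apply the general usual-residence rule; Table 2-3 exceptions are ignored."""
    if person.usual_residence_here:
        # Member whether present or temporarily absent.
        return True
    # Someone staying here with no usual residence elsewhere is also a member.
    return person.staying_here and not person.has_usual_residence_elsewhere


visitor = RosterEntry("visitor", False, True, True)
on_vacation = RosterEntry("absent resident", True, False, False)
no_other_home = RosterEntry("temporary stayer", False, True, False)
members = [p.name for p in (visitor, on_vacation, no_other_home) if is_household_member(p)]
```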

                                     Table 2-3. Household Membership

                                                                                       YES           NO
                                                                                       (Is Member of (Not Member
Question                                                                               Household)    of Household)
Person staying at SIPP address at time of interview
Members of family, visitors, etc.—ordinarily sleeps here                                Y
– here temporarily, no living quarters held elsewhere                                   Y
– here temporarily, living quarters held elsewhere                                                       N
In Armed Forces, stationed locally and sleeps here                                      Y
In Armed Forces, stationed elsewhere and here on leave                                                   N
Student temporarily attending school here, living quarters held elsewhere                                N
– married and accompanied by own family                                                 Y
– student nurse attending school nearby                                                 Y
Absent person who usually lives at SIPP address
Inmate in an institutional special place regardless of whether living quarters are                       N
being held here
Temporarily on vacation, in hospital, and living quarters held                          Y
Absent for work, living quarters held here                                              Y
Absent for work, living quarters held here and elsewhere but comes here infrequently                     N
Unmarried college student working away from home during break, living quarters          Y
held here
In Armed Forces, stationed elsewhere                                                    Y
In school elsewhere, living quarters held—not married or with own family                Y
– married and accompanied by own family                                                                  N
– attending school overseas                                                                              N
– student nurse living at school                                                                         N
Exceptions and doubtful cases
Person with two residences, sleeps most often in other location                                          N
Person with two concurrent residences, sleeps here most often                           Y
Citizen of foreign country temporarily in U.S., living on premises of an embassy,                        N
ministry, legation, chancellery, or consulate
Citizen of foreign country temporarily in U.S.—studying here and no other usual         Y
residence in U.S.
– living and working here and no other usual residence in U.S.                          Y
– visiting or traveling in U.S.                                                                          N
Source: SIPP Information Booklet, 1990 Panel (Waves 1–8) and 1991 Panel (Waves 1–8), Form SIPP-7004A (1-9-89).

5
  In most cases, a person is a member of a household if the sample unit is that person's usual place of residence at the
time of the interview. The person may be present or temporarily absent. A person staying in the sample unit who has
no usual place of residence elsewhere is a household member. A usual place of residence is the place where a person
normally lives and sleeps. This must be specific living quarters held for the person to which he or she is free to
return at any time.
6
  In the 1993 Panel only, SIPP followed all original sample members regardless of age. Previous panels, as well as
the 1996 Panel, have followed only people 15 years of age or older who were original sample members.




Oversampling

Originally, SIPP did not oversample any groups within the population. Over the years, however,
budget constraints dictated a reduction in the SIPP panel size. As a result, analysts found it
difficult to conduct meaningful analyses of government programs for the low-income population
because the sample sizes for the subpopulations were too small. In response to those concerns
about the diminished usefulness of SIPP data, the Census Bureau pursued budget initiatives to
increase the sample to its original size and to oversample the low-income population.

Oversampling occurs when certain groups or units are sampled with higher probabilities than
others. Analysts then have enough cases to complete analysis of subpopulations or subgroups of
the population. The share of an oversampled group in the resulting sample is greater than its
share in the population from which it was drawn. Although this imbalance addresses the need for
increased sample sizes for certain subpopulations, analysts looking at the entire sample will need
to use weights in their analyses to redress the imbalance (Chapter 8; see footnote 7).
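
A small numerical sketch may help show why weights are needed when a group is oversampled. The population, sample counts, and selection probabilities below are invented for illustration, and the weights are simple inverse selection probabilities. Actual SIPP weights involve further adjustments (Chapter 8) that are not modeled here.

```python
def weighted_mean(values, weights):
    """Weighted mean of values, using one weight per sampled case."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical population: 200 low-income people, all program participants
# (coded 1), and 800 other people, none participating (coded 0).
# True participation rate = 200 / 1000 = 0.2.
# Hypothetical sample: 100 cases from each group, so the low-income group
# is sampled at 4 times the rate of the other group.
sample_values = [1] * 100 + [0] * 100
selection_probs = [100 / 200] * 100 + [100 / 800] * 100
base_weights = [1 / p for p in selection_probs]  # inverse selection probability

unweighted_rate = sum(sample_values) / len(sample_values)   # overstates the rate
weighted_rate = weighted_mean(sample_values, base_weights)  # recovers the population rate
```

In this constructed example the unweighted estimate is 0.5, while the weighted estimate equals the population rate of 0.2.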


Oversampling in the 1990 Panel

As detailed in the SIPP Quality Profile and discussed in Allen et al. (1993), oversampling was
used with the 1990 Panel, which included about 3,900 predominantly low-income households
from the truncated 1989 Panel (see Tables 2-1 and 2-4). In the 1990 Panel, the Census Bureau
included all housing units from Wave 1 of the 1989 Panel in which the head of household was
black, Hispanic, or female with no spouse present living with relatives (FHNSP). Such
households tend to have higher poverty rates than the general population. The 1990 Panel also
included a small sample of other housing units from the 1989 Panel. Table 2-4 shows the
components of the 1990 Panel.

                                 Table 2-4. Composition of the 1990 Panel

Components                                                                            Number of Eligible Households
Households in addresses originally to be interviewed first in the 1990 Panel                     19,700
Households associated with sample addresses first interviewed in February through
May 1989 (in the 1989 Panel) and at the time headed by a black, Hispanic, or FHNSP(a)             2,700
Households in one-ninth of all other 1989 Panel sample addresses                                  1,200
(a) Female head of household with no spouse present living with relatives.
Source: Allen, Petroni, Singh, 1993.
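
The figures in Table 2-4 can be tied back to the text with a little arithmetic. The snippet below is only a check on the published (rounded) counts; the dictionary keys are shorthand labels chosen for this illustration.

```python
# Eligible households by component, as published (rounded) in Table 2-4.
components_1990 = {
    "new 1990 Panel addresses": 19_700,
    "1989 Panel, black/Hispanic/FHNSP head": 2_700,
    "1989 Panel, one-ninth of other addresses": 1_200,
}

total_households = sum(components_1990.values())
from_1989_panel = (
    components_1990["1989 Panel, black/Hispanic/FHNSP head"]
    + components_1990["1989 Panel, one-ninth of other addresses"]
)
share_from_1989 = from_1989_panel / total_households  # roughly one household in six
```

The two 1989 Panel components sum to the "about 3,900" households cited in the text.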


Oversampling in the 1996 Panel

The Census Bureau also oversampled the low-income population for the 1996 Panel (see footnote 8), using 1990
decennial census information. Housing units within each PSU were split into high- and low-
7
    Weights are needed even if there is no oversampling. See Chapter 8.
8
    For a more detailed discussion of the 1996 oversample design, see Huggins and King (1997).



poverty strata. If the housing unit received the Census long form that included income questions,
the unit’s poverty status was determined directly; for other housing units, poverty status was
assumed on the basis of responses to Census short-form items predictive of poverty rates. The
Census Bureau then sampled the high-poverty (low-income) stratum at 1.66 times the rate of the
low-poverty (high-income) stratum in each PSU. Compared with the number of cases produced without
oversampling, this design produced an 18 percent increase in the number of cases in and near poverty
at Wave 1 (see footnote 9). Even greater gains occurred in some subgroups, such as blacks and Hispanics
in poverty, with gains in the number of sample cases as high as 24 percent. However, the increases
in effective sample sizes were somewhat smaller after allowance was made for the increased
variance associated with differential weighting. Also, the sample sizes for the higher income and
higher age groups were reduced.
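
The oversampling ratio and the loss of effective sample size from unequal weights can be illustrated with a short calculation. The two sampling rates are those reported for the 1996 Panel strata. The stratum sample counts are hypothetical, and Kish's approximation for effective sample size is a standard textbook formula used here for illustration, not necessarily the method the Census Bureau used.

```python
LOW_INCOME_RATE = 0.00062389    # sampling rate, low-income strata (1996 Panel)
HIGH_INCOME_RATE = 0.00037489   # sampling rate, high-income strata (1996 Panel)

oversampling_ratio = LOW_INCOME_RATE / HIGH_INCOME_RATE  # about 1.66


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Hypothetical sample: 3,000 cases from low-income strata and 7,000 cases
# from high-income strata, each weighted by the inverse of its sampling rate.
weights = [1 / LOW_INCOME_RATE] * 3000 + [1 / HIGH_INCOME_RATE] * 7000
effective_n = kish_effective_n(weights)  # smaller than the 10,000 actual cases
```

With equal weights the effective sample size would equal the actual number of cases; unequal weights reduce it, which is why the gains from oversampling are somewhat smaller than the raw case counts suggest.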


Following Rules
SIPP is a true longitudinal survey that tracks people over time. With few exceptions, original
sample members are interviewed every 4 months over the duration of the panel. When original
sample members move to new addresses, interviewers attempt to locate them and continue to
interview them every 4 months.

The SIPP rules call for following original sample members who move, provided they are not
institutionalized, do not live in military barracks, and do not move abroad. Prior to the 1993 Panel,
and resuming with the 1996 Panel, original sample members under age 15 who moved were not
followed. Thus, data were collected for them in subsequent waves only if they either continued
to live with an original sample member 15 years or older or were age 15 by the last day of the
reference period in which they moved. With Wave 4 of the 1993 Panel, SIPP began following all
children who were in original sampled households (SIPP Quality Profile, 1998, pp. 3–6),
including babies born to sample members during the panel.

When original sample members move into households with other individuals not previously in
the survey, the new individuals become part of the SIPP sample for as long as they continue to
live with an original sample member. Similarly, when new individuals move in with original
sample members after the first interview, they too become part of the SIPP sample for as long as
they continue to live with an original sample member. If no original sample members live at an
address where a previous interview was conducted, SIPP does not collect information from the
new occupants of that address.
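
The following rules described above can be summarized in a short decision function. This is a simplified sketch under stated assumptions: the class, field, and function names are hypothetical; age is taken as of the end of the reference period; and situations such as temporary absences or people returning from institutions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Person:
    original_sample_member: bool
    age: int                          # age at the end of the reference period
    lives_with_osm: bool = False      # lives with an original sample member
                                      # (for children: one aged 15 or older)
    institutionalized: bool = False
    in_military_barracks: bool = False
    living_abroad: bool = False


def children_followed(panel_year: int, wave: int) -> bool:
    """Children under 15 who moved were followed only in Waves 4+ of the 1993 Panel."""
    return panel_year == 1993 and wave >= 4


def data_collected(person: Person, panel_year: int, wave: int) -> bool:
    """Return True if SIPP would collect data for this person in this wave."""
    if person.institutionalized or person.in_military_barracks or person.living_abroad:
        return False
    if not person.original_sample_member:
        # Other people are in the sample only while living with an original sample member.
        return person.lives_with_osm
    if person.age >= 15:
        return True
    return person.lives_with_osm or children_followed(panel_year, wave)


# A 10-year-old original sample member who moved away from all original
# sample members aged 15 or older:
followed_1996 = data_collected(Person(True, 10), 1996, 5)   # not followed
followed_1993 = data_collected(Person(True, 10), 1993, 5)   # followed
```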

Figure 2-1 illustrates the following rules in practice.


9
  Low-income strata were sampled at a rate of 0.00062389. High-income strata were sampled at a rate of
0.00037489. The oversampling rate therefore comes to 1.6642.



                    Figure 2-1. Following Rules


                                      Demolished address unit – no interview.


                                      Vacant address unit – no interview.


                                      Five people (mom, dad, son, daughter, and
                                      cousin) reside at this address and thus
                                      constitute a household. Wave 1 interview
                                      conducted for all five people.


                                      Son joined Army and is living in barracks.
                                      He is not followed because military bases
                                      are outside the scope of the SIPP sample.
                                      However, a record exists in the Wave 2
                                      interview reflecting proxy responses by
                                      another member of the household.
                                      Interviewer takes data on the four people
                                      who remain at this address.




                        Daughter got married; she and husband live
                        with her parents and cousin at time of Wave
                        3 interview. The husband is interviewed at
                        the same time that others in the house are
                        interviewed. There is no further information
                        taken on the son (who joined the Army and
                        is living in barracks, which is outside the
                        SIPP universe).


                        Daughter and her husband moved to a new
                        address and formed their own household at
                        the time of Wave 4. The interviewer takes
                        data on mom, dad, and cousin in the first
                        household; and daughter and daughter’s
                        husband in the second household.




                                            The cousin, who is over 15 (see note a), moved and
                                            now lives with her mother and father, who
                                            were not in the sample originally. Therefore,
                                            for this Wave 5 interview, the interviewer
                                            takes data from seven people (mom and dad
                                            in the first household; daughter and
                                            daughter’s husband in the second household;
                                            and cousin, cousin’s mother, and cousin’s
                                            father in the third household).


                                            In Wave 6, there is no change from the
                                            previous wave.

                                            a
                                              For Waves 4+ of the 1993 Panel only, SIPP
                                            followed original sample persons under 15 years old
                                            who moved to other households with or without
                                            another original SIPP panel member over 15. In all
                                            other panel years, SIPP did not follow original
                                            sample persons under 15 years old who moved to
                                            other households with or without another original
                                            SIPP panel member over 15. In this example,
                                            therefore, the cousin is followed because she is over
                                            15. In the 1993 Panel, the cousin would have been
                                            followed without regard to age.




                        At the time of Wave 7, the interviewer
                        discovers that mom and dad have moved out
                        of their old home.


                        The interviewer locates mom and dad and
                        interviews them at their new address. The
                        daughter and her husband are interviewed at
                        their previous address, as are the cousin and
                        the cousin’s parents. Altogether, the
                        interviewer takes data from seven people
                        (mom, dad, daughter, daughter’s husband,
                        cousin, cousin’s mother, and cousin’s father)
                        in three households.




                                            Mom and dad have separated at the time of
                                            Wave 8. Mom is in the same address as in
                                            the previous wave, but dad is in a new
                                            location; thus they form separate
                                            households. Meanwhile, the daughter and
                                            husband now have a baby and the cousin’s
                                            household has remained the same. The
                                            interviewer takes data for eight people
                                            (mom, dad, daughter, daughter’s husband,
                                            daughter’s baby, cousin, cousin’s mother,
                                            and cousin’s father) in four households.



Interviewers rely on several sources of information to locate movers. At the first interview, the
interviewer obtains the name, address, and telephone number of a person who could furnish the
new address should the entire household move. If necessary, interviewers may contact neighbors,
employers, mail carriers, real estate companies, rental agents, or postal supervisors to locate
original sample members who have moved.

If an entire household moves, the interviewer tries to find the original sample members and
interview them at their new address(es) if they remain in the locality. If the household relocates
into or close to a different PSU, a SIPP interviewer in that area may interview them. For
example, if a couple moves from Boston to Seattle, a SIPP interviewer in the Seattle area will
likely interview the couple for the remaining waves of their panel. Should the entire household
move more than 100 miles away from a SIPP PSU, attempts will be made to interview by
telephone. If the household cannot be reached, the sample members will be dropped from the
survey. Specifically, they will be treated as Type D noninterviews (Type D noninterviews are
discussed later in the chapter).

If only some original sample members move, the interviewer completes interviews with all
eligible household members at both the original address and the address(es) of those who have
moved. If an original sample member leaves a SIPP household and the remaining original
sample members cannot provide a new address, the interviewer will try to find the person
through the means discussed above. Similar to what happens with a household, if an individual
original sample member moves within the United States but more than 100 miles away from a
SIPP PSU, a telephone interview will be attempted. When that is not possible, the person is
treated as a Type D noninterview.
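
The handling of movers described above can be sketched as a simple decision rule. The function and argument names are hypothetical, and the sketch covers only the cases stated in the text: movers who cannot be located, movers within 100 miles of a SIPP PSU, and movers farther away.

```python
def mover_contact_plan(located: bool,
                       miles_from_nearest_psu: float,
                       phone_interview_possible: bool) -> str:
    """Return the expected handling of an original sample member who moved.

    Assumes the person remains in the United States and within the
    survey universe (not institutionalized or living in barracks).
    """
    if not located:
        return "Type D noninterview"
    if miles_from_nearest_psu <= 100:
        return "interview by a SIPP interviewer in that area"
    if phone_interview_possible:
        return "telephone interview"
    return "Type D noninterview"


# A mover 250 miles from the nearest SIPP PSU who cannot be reached by telephone:
plan = mover_contact_plan(located=True, miles_from_nearest_psu=250,
                          phone_interview_possible=False)
```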

SIPP does not interview original sample members if they move outside the United States,
become members of the military living in barracks, or become institutionalized (e.g., nursing
home residents, prison inmates). The Census Bureau attempts to track such individuals, however.
Should they return to the noninstitutionalized resident U.S. population, the Census Bureau will
resume trying to interview them (see footnote 10).


Difference Between Movers and Those Who Are
Temporarily Away

There is an important difference between a mover and a person who is temporarily away. A
mover no longer lives at the sample address. On the other hand, a person is temporarily away if
the household is that person’s usual place of residence, according to the membership rules given
in Table 2-3, and specific living quarters are held for the person to which he or she is free to
return at any time. The following two examples may help to illustrate the distinction:


10
  A member of the armed forces who lives in a barracks is not eligible for an interview; a member of the armed
forces who lives elsewhere is eligible.



•    A college student living on campus with a room held at home is still a household member at
     the sample address. In this case, the interviewer would try to interview that student or obtain
     a proxy interview with the household reference person. If the hypothetical college student
     originally lived in New York and, upon graduation, moved to Los Angeles to live on his or
     her own, the student would be considered to have moved as of the graduation date. The
     student’s new address in Los Angeles would become his or her new household, and, if the
     student was an original sample member, he or she would be treated in the same way as any
     other original sample member who moved to the new address.
•    If a household member is in the hospital following an operation but is expected to come
     home, that person is still a household member at the original address. If an individual
     interview is not feasible, the interviewer might do a proxy interview for that person. If,
     however, the person moved into a nursing home, he or she would not be eligible for a SIPP
     interview, whether individual or proxy. At each interview, the interviewer asks the status of
     any primary sample member who entered an institution between Wave 1 and the current
     wave. If the interviewer learns that the person has returned to the noninstitutionalized
     population, an interview is attempted.


Interview Procedures
At Wave 1, interviews are attempted for all members of selected housing units who are 15 years
of age or older (see footnote 11). The Census Bureau prefers that all SIPP sample members 15 years of age or
older who are present at the time of the interview answer for themselves unless they are
physically or mentally unable to do so. For those who are absent or incapable of responding,
SIPP will accept a proxy interview, usually with another household respondent.

After Wave 1, the interviewer compiles (or updates) a separate household roster for each housing
unit, listing all people living or staying at the unit, including anyone who may have joined the
household, such as a new spouse or baby, and the dates they entered the household. The
interviewer then decides whether each person is a household member by using rules that
determine whether the person is a usual resident of the unit (Table 2-3).

Key to SIPP data collection is identification of a reference person for the household, an owner or
renter of record. The interviewer lists other people in the household according to their
relationship to the reference person.

Also noted are people who left the household and their dates of departure. If some—but not all—
sample members have moved since the last interview, the interviewer completes interviews at the
original address and also obtains the new address(es) of the individuals who moved. For those
remaining at the same address, the interviewer verifies that certain previously collected
information still applies, completes the questionnaire for each person 15 years of age or older,

11
   Detailed information about interview procedures is available from the Census Bureau in the SIPP interviewer's
instruction manual (U.S. Census Bureau, 1993).



and collects certain information for children under age 15. Information is also collected for all
new household members. Movers are interviewed at their new addresses, along with other
household members they are living or staying with at the time.

Most interviews conducted through 1991 were in the form of personal visits. In 1992, SIPP
switched to maximum telephone interviewing to reduce costs. Wave 1, 2, and 6 interviews were
still conducted in person, but other interviews were conducted by telephone to the extent
possible. SIPP telephone interviews and personal visits are carried out by the same interviewer
interacting with the same respondents. Interviewers typically make phone calls from their homes.
For security and confidentiality reasons, they are not allowed to use cellular or cordless
telephones in the interviews. If a standard telephone is not available, the interviews must be
conducted face-to-face. Repeated failure to reach a respondent by telephone may also require an
in-person visit to the listed address.

When respondents are not able to furnish all requested information at the interview, interviewers
arrange to get the answers by telephone if the respondents are willing. Callbacks can also help
correct inconsistencies found during questionnaire editing. With the 1996 redesign, computer-
assisted interviewing (CAI) was begun. Thus, automatic consistency checks for selected data
occur during the interview. (For more on editing and imputation, see Chapter 4.)

The 1996 redesign included a change in the method of data collection. Prior to 1996,
interviewers used a paper questionnaire. Starting in 1996, however, interviewers began
conducting interviews with a laptop computer. Both the paper survey and the CAI instrument
have skip patterns that help the interviewer avoid asking irrelevant questions (see Chapter 3 for
more on skip patterns). In the paper survey, interviewers would encounter points at which they
had to look at previously given answers before deciding whether or not to ask certain questions.
With CAI, the instrument skips directly to the next applicable question.
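
The way a computer-assisted instrument applies skip patterns can be sketched as follows. The questions, their keys, and the routing conditions are invented for illustration and are not taken from the SIPP instrument; the point is only that each question carries a condition on earlier answers, so the software rather than the interviewer decides which question comes next.

```python
# Each entry: (key, question text, condition on earlier answers or None).
# All items below are hypothetical examples, not actual SIPP questions.
QUESTIONS = [
    ("had_job", "Did you have a job during the reference period?", None),
    ("hours", "How many hours per week did you usually work?",
     lambda answers: answers.get("had_job") == "yes"),
    ("looked", "Did you look for work during the reference period?",
     lambda answers: answers.get("had_job") == "no"),
]


def next_question(answers: dict):
    """Return the key of the next applicable unanswered question, or None."""
    for key, _text, condition in QUESTIONS:
        if key in answers:
            continue  # already asked
        if condition is None or condition(answers):
            return key
    return None  # interview section complete


# A respondent without a job is routed past the hours question.
routed_to = next_question({"had_job": "no"})
```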


Nonresponse
All surveys experience some degree of nonresponse. As discussed in Chapter 6, in a longitudinal
survey such as SIPP, as the number of waves increases, nonresponse may result in a
corresponding increase in bias. Since nonrespondents may differ from respondents in terms of
the variables collected in the survey, the occurrence of nonresponse gives rise to concerns about
bias in the survey results. Weighting adjustments are made in an attempt to reduce or eliminate
bias (Chapter 8), but concerns about nonresponse bias remain.

The rate of sample loss (see footnote 12) in SIPP generally declines from one wave to the next. The total
number of sample members lost, also known as total sample attrition, always increases over time.
Wave 1 nonresponse rates for SIPP have been about 7.7 percent (see footnote 13). There is usually a sizable
12
   The accumulation of cases that are no longer being interviewed because of as yet unrecovered refusals or as yet
unfound movers.
13
   Nonresponse rates have not been stable, ranging from 6.70 percent for the 1984 through 1990 Panels to 8.48
percent for the 1991 through 1996 Panels.



sample loss at Wave 2, with a lower rate of additional attrition occurring at each subsequent
wave. Prior to the 1992 Panel, SIPP lost roughly 20 percent of the original sample by the panel’s
completion. The sample loss rate for the 1996 Panel was 35.5 percent by the end of the 12th, or
final, wave. Chapter 6 in this volume and the SIPP Quality Profile provide more detailed
discussions of the implications of nonresponse for data quality. SIPP deals with the various types
of nonresponse by weighting adjustments or imputation (Chapters 8 and 4). Table 2-5 shows
cumulative loss rates for two types of nonresponse, discussed below.
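
The pattern of a large loss at Wave 2 followed by smaller additional losses can be seen by differencing the cumulative sample loss rates. The snippet below uses the 1996 Panel loss column from Table 2-5. Because the published rates are rounded and adjusted, the wave-to-wave differences should be read as approximate.

```python
# Cumulative sample loss rates (percent), 1996 Panel, Waves 1-12 (Table 2-5).
loss_1996 = [8.4, 14.5, 17.8, 20.9, 24.6, 27.4, 29.9, 31.3, 32.8, 34.0, 35.1, 35.5]

# Additional loss (percentage points) at each wave after Wave 1.
additional_loss = [round(later - earlier, 1)
                   for earlier, later in zip(loss_1996, loss_1996[1:])]

largest_increment_wave = additional_loss.index(max(additional_loss)) + 2  # Wave 2
```

The increments fall from 6.1 points at Wave 2 to 0.4 points at Wave 12, although the decline is not strictly monotonic (for example, the Wave 5 increment exceeds the Wave 4 increment).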

The Census Bureau distinguishes between household and person nonresponse. Household
nonresponse occurs either when the interviewer cannot locate the household or when the
interviewer locates the household but cannot interview any adult household members.
Person-level nonresponse occurs when at least one person in the household is interviewed and at
least one other person is not, usually because that person refuses to answer the questions or is
unavailable and no proxy is taken. The Census Bureau categorizes household nonresponse as
Types A and D (detailed definitions and discussion of rates follow; see footnote 14) and
person-level nonresponse as Type Z.
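
The distinction among the nonresponse types can be sketched as a classification rule. The function and its arguments are hypothetical, and the sketch is a simplification: it treats a household that cannot be located as Type D and ignores Type B and Type C noninterviews.

```python
def classify_household(located: bool, n_eligible: int, n_interviewed: int) -> str:
    """Classify the interview outcome for one household in one wave.

    n_eligible counts household members eligible for interview;
    n_interviewed counts those interviewed in person or by proxy.
    """
    if not located:
        return "Type D household nonresponse"
    if n_interviewed == 0:
        return "Type A household nonresponse"
    if n_interviewed < n_eligible:
        return "interviewed household with Type Z person nonresponse"
    return "fully interviewed household"


# Three eligible adults, two interviewed, no proxy taken for the third:
outcome = classify_household(located=True, n_eligible=3, n_interviewed=2)
```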


Household Nonresponse

Type A household nonresponse occurs when the interviewer finds the household’s address, but
obtains no interviews. Those households contain people eligible for SIPP interviews, but every
eligible member of the household is a noninterview. Examples of Type A nonresponse include
the following:

•    The interviewer finds no one at home despite repeated visits.
•    All eligible household members are away during the entire interview period (e.g., an
     extended vacation).
•    Household members refuse to participate in the survey.
•    The interviewer cannot reach the housing unit because of impassable roads, such as from a
     natural disaster.
•    Interviews cannot be taken because of serious illness or death in the household.

When this type of household nonresponse occurs in Wave 1, SIPP makes no attempt to interview
the household members at subsequent waves. For Type A nonresponse that occurs in subsequent
waves, however, interviewers try to obtain interviews on the following wave. New Type A
noninterviews represent the first time a Type A household nonresponse occurred. Old Type A

14
  The Census Bureau recognizes two other types of household noninterviews. Type B occurs in Wave 1 when the
address unit is vacant or in some way unfit for residence; in subsequent waves, Type B occurs when people enter
institutions. Type C occurs in Wave 1 when the housing unit has been demolished or converted to some other use; in
subsequent waves, Type C occurs when all sample members in a household are outside the scope of the survey, e.g.,
deceased, living abroad, or living in armed forces barracks.


                                                      2-18
Table 2-5. Household Noninterview and Sample Loss Rates: 1990–1996 Panels

           1990 Panel              1991 Panel              1992 Panel              1993 Panel              1996 Panel
Wave  Type A  Type D    Loss  Type A  Type D    Loss  Type A  Type D    Loss  Type A  Type D    Loss  Type A  Type D    Loss
   1     7.3       —     7.3     8.4       —     8.4     9.3       —     9.3     8.9       —     8.9     8.4       —     8.4
   2    10.9     1.5    12.6    12.3     1.5    13.9    12.8     1.7    14.6    12.4     1.7    14.2    13.1     1.3    14.5
   3    11.5     2.6    14.4    13.1     2.7    16.1    13.1     2.8    16.4    12.9     2.9    16.2    15.6     1.9    17.8
   4    12.5     3.4    16.5    13.6     3.6    17.7    13.8     3.6    18.0    13.9     3.8    18.2    17.6     3.1    20.9
   5    13.6     4.6    18.8    14.5     4.2    19.3    14.9     4.7    20.3    14.9     4.7    20.2    20.4     3.8    24.6
   6    14.1     5.3    20.2    14.4     5.1    20.3    15.3     5.4    21.6    15.9     5.5    22.2    22.2     4.4    27.4
   7    14.3     5.9    21.1    14.7     5.6    21.0    16.0     5.9    23.0    17.2     6.2    24.3    23.8     4.8    29.9
   8    14.4     5.9    21.3    14.5     5.9    21.4    16.9     6.7    24.7    17.5     6.9    25.5    24.2     5.4    31.3
   9       —       —       —       —       —       —    17.7     7.3    26.2    18.2     7.5    26.9    25.0     5.6    32.8
  10       —       —       —       —       —       —    17.5     7.6    26.6       —       —       —    26.1     6.0    34.0
  11       —       —       —       —       —       —       —       —       —       —       —       —    25.5     6.2    35.1
  12       —       —       —       —       —       —       —       —       —       —       —       —       —     6.2    35.5

Note: The sample loss rate is the cumulative noninterview rate adjusted for unobserved growth in the Type A noninterview units (created by splits).
Source: SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a).

nonresponse represents unsuccessful attempts to convert a Type A noninterview from the
previous wave. Two consecutive Type A noninterviews render the case ineligible for interviews
at the following wave.15

Type D household nonresponse concerns original sample members who move to an unknown or
uninterviewable address; it applies only to Wave 2 and beyond. Those noninterviews occur when
a household or some members of a household are living at an unknown new address or at an
address located more than 100 miles from a SIPP sample area and cannot be contacted by
telephone.16 For the 1996 Panel, Type D noninterviews are attempted three times before they are
dropped.


Person Nonresponse

There are two forms of person-level, or Type Z, nonresponse. The first applies to those instances
in which a sample person was in the household during part (or all) of the reference period and
was part of the household on the date of the interview but refused to answer, or was not available
for the interview and a proxy interview was not obtained. The second form of Type Z
noninterview occurs when a person was part of the household during part of the 4-month
reference period but then moved and was no longer a household member on the date of the
interview.17 While household nonresponse is usually handled by weighting adjustments, Type Z
cases are handled by imputation (i.e., they are matched to donors, and data from the donor case
are substituted for the missing interview—see discussion of imputation and weighting in
Chapters 4 and 8). Nearly half of SIPP Type Z nonrespondents are not interviewed at any of the
waves.


Item Nonresponse

Item nonresponse is an additional source of missing data; it occurs when a respondent does not
answer one or more questions, even though most of the questionnaire is completed. Respondents
might refuse to answer a particular question or set of questions. Sometimes, item nonresponse
occurs when respondents do not have the information requested.18 Although interviewers are
trained to attempt to persuade respondents to answer all applicable questions, and will call back
if a respondent can provide data at a later time, those efforts are not always successful. Item
nonresponse can also result from the postinterview data editing process when respondents
provide inconsistent information or when an interviewer incorrectly records a response. In many
cases, the Census Bureau handles item nonresponse by imputation, that is, by assigning values
for the missing items (Chapter 4).


15
   For each wave, the rate of Type A nonresponse is calculated by adding the number of Type A noninterviews for
the wave to the number of Type A noninterviews dropped from the sample in prior waves and dividing that sum by
the total of the number of interviewed households plus all Type A and Type D noninterviews.
16
   For each wave, the rate of Type D nonresponse is calculated by adding the number of Type D noninterviews for
the wave to the number of Type D noninterviews dropped from the sample in prior waves, and dividing that sum by
the total of the number of interviewed households plus all Type A and Type D noninterviews.
17
   If the person was an original sample member, information will be taken for the portion of the reference period in
which he or she was still at the address, and an effort will be made to locate the person. If the person was not an
original sample member, information will be taken for the portion of the reference period in which he or she was
still at the address, after which the person will not be pursued.
18
   The information provided may also be inconsistent with edit specifications, and the response is thus deleted
during the processing stage. Or, interviewers may forget to ask for the information or record it incorrectly, resulting
in an edit failure. See Chapter 4 on editing and imputation.
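
The rate definitions in footnotes 15 and 16 can be illustrated with a short calculation. The Python
sketch below is illustrative only: the function and argument names are hypothetical, the counts are
invented rather than taken from SIPP, and the phrase "all Type A and Type D noninterviews" in the
denominator is read here as covering both current-wave cases and cases dropped in prior waves,
which is an assumption. Rates are scaled to percentages for readability.

```python
def noninterview_rates(interviewed, type_a_now, type_a_dropped,
                       type_d_now, type_d_dropped):
    """Return (type_a_rate, type_d_rate) for one wave, as percentages.

    Hypothetical helper following the wording of footnotes 15 and 16.
    """
    # Numerators: current-wave noninterviews plus cases dropped in prior waves.
    type_a_total = type_a_now + type_a_dropped
    type_d_total = type_d_now + type_d_dropped

    # Shared denominator (assumed reading): interviewed households plus
    # all Type A and Type D noninterviews, current and previously dropped.
    denominator = interviewed + type_a_total + type_d_total
    if denominator == 0:
        raise ValueError("no households in the denominator")

    return (100.0 * type_a_total / denominator,
            100.0 * type_d_total / denominator)


# Invented counts, chosen only to make the arithmetic easy to follow.
rate_a, rate_d = noninterview_rates(
    interviewed=850,
    type_a_now=60, type_a_dropped=40,
    type_d_now=30, type_d_dropped=20,
)
print(f"Type A: {rate_a:.1f}%  Type D: {rate_d:.1f}%")  # Type A: 10.0%  Type D: 5.0%
```

Because both rates share one denominator, they can be compared directly within a wave.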


3. Survey Content
This chapter provides analysts using the Survey of Income and Program Participation (SIPP)
with an overview of the survey content. SIPP is a longitudinal survey that collects information on
topics such as poverty, income, employment, and health insurance coverage. SIPP core content
covers demographic characteristics, work experience, earnings, program participation, transfer
income, and asset income. Each interview wave contains additional topical content, including
one or more topical modules, allowing the Census Bureau to address a range of subjects.1


The SIPP Interview
With the 1996 Panel, computer-assisted interviewing (CAI) was introduced. SIPP interviewers
began using a laptop computer to collect survey data.2 CAI presents a number of advantages over
interviewing with a paper instrument, the method used in previous panels (Chapter 2). Survey
elements appear seamless to both the interviewer and the respondent. In addition, the CAI
instrument makes certain decisions about which questions to ask, whom to ask, and so forth, that
were once left to the discretion of the interviewer. CAI also allows much of the core content
from prior waves to be referenced in each interview. The CAI instrument carries responses from
one part of the interview into subsequent parts and applies complex logic to them, which
permits checking for consistency and accuracy in the data while the interviewer is still in contact
with the household.

This chapter will associate the word core with items in the survey that remain constant from one
wave to the next, and the word topical with items that do not appear in every wave. For both the
CAI instrument and the pre-1996 paper survey, data gathered every time the survey is conducted
are referred to as core content. The core questionnaire collects critical labor force, income, and
program participation data and is repeated at each interview. Questions asked periodically and
targeted to specific topics outside the range of the core content provide topical content and are
referred to as topical modules.

Cooperative, available respondents 15 years of age and older answer questions for themselves, to
the extent possible. While questionnaires are not completed for household members under age
15, information is collected about them so that they are fully represented in the SIPP
sample. When necessary, information in the CAI instrument is used to
determine the next best person in the household with whom a dependent or proxy interview
should be conducted; that is often, but not always, the reference person (Chapter 2).
1
  Analysts should consult the actual survey instrument for answers to specific questions about the ordering and
wording of survey items. The technical documentation can be ordered separately (Chapter 5). The SIPP Interviewer
Procedures Manual also can be ordered from the Census Bureau.
2
  Although all interviews were conducted using an automated survey instrument residing on a laptop, not all
interviews were done in person. In some cases, interviews were conducted by phone from the interviewer’s home.



Skip patterns within SIPP control which questions are asked of each respondent. Skip patterns
tailor the questions to the circumstances of the respondent and bypass irrelevant questions. For
example, if a respondent has already said that he or she did not work during the reference period,
the skip pattern will prevent the interviewer from asking the person what kind of job was held
during that time. The CAI instrument automatically calls up the next relevant question, making
the skip patterns transparent to both interviewers and respondents. Before the introduction of
CAI, interviewers followed instructions on the paper survey in order to skip inappropriate
questions. Figure 3-1 illustrates the way in which skip patterns worked in the paper survey. Since
CAI handles skip patterns from “behind the scenes,” Figure 3-1 might also be viewed as showing
what is invisible in CAI.

                                      Figure 3-1. Skip Pattern Example

    7c. Could . . . have taken a job during those weeks if one had been offered?
            __ Yes – Skip to 7e
            __ No

    7d. What was the main reason . . . could not take a job during those weeks?
        Mark (x) only one.
            __ Already had a job
            __ Temporary illness
            __ School
            __ Other (Specify) _____
[Notes to interviewers are italicized; respondent’s name is filled in; and statements read to respondents are in bold.]
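
The routing in Figure 3-1 can be sketched in a few lines of Python. This is a hypothetical encoding
for illustration, not the logic of the actual CAI instrument: the data structures and the function
name are invented, item 7e appears only as a skip target because its content is not shown in the
figure, and the sketch assumes items are asked in listed order whenever no skip applies.

```python
# Items in the order they appear on the form (7e is included only as a target).
QUESTION_ORDER = ["7c", "7d", "7e"]

# For each item, map every allowed answer to a skip target.
# None means "no skip instruction: continue with the next item in order".
SKIP_RULES = {
    "7c": {"Yes": "7e", "No": None},
    "7d": {
        "Already had a job": None,
        "Temporary illness": None,
        "School": None,
        "Other": None,
    },
}


def next_question(current, answer):
    """Return the ID of the next item to ask, or None after the last item."""
    rules = SKIP_RULES.get(current, {})
    if answer not in rules:
        raise ValueError(f"unexpected answer {answer!r} for item {current}")

    target = rules[answer]
    if target is not None:
        return target  # explicit skip instruction, e.g. "Skip to 7e"

    # No skip: fall through to the next item in form order.
    position = QUESTION_ORDER.index(current)
    if position + 1 < len(QUESTION_ORDER):
        return QUESTION_ORDER[position + 1]
    return None


print(next_question("7c", "Yes"))  # 7e
print(next_question("7c", "No"))   # 7d
```

Under this encoding, a "Yes" at item 7c routes directly to 7e, while a "No" falls through to 7d,
which is the decision a paper-survey interviewer made by reading the printed instruction.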


Core Content
Core questions are typically asked at the start of the interview. At the beginning of each
household visit, the Census Bureau interviewer completes or updates a roster listing all
household members, verifies basic demographic information about each person, and checks
certain facts about the household. The CAI instrument performs “behind the scenes” case
management functions at the same time. Prior to the advent of CAI, that information was
contained on the control card, which provided a mechanism for carrying information forward
from one wave to the next for each sample member. Core questions covering key areas of SIPP
follow the initial questions. For the most part, the 1996 Panel and prior panels cover the same
content; however, the organization of the content within the 1996 CAI instrument is somewhat
different.




Core Content for 1996 and Subsequent Panels

SIPP core content covers a variety of topics, including labor force status and employment,
earnings, business ownership, assets, income, program participation, child support collection,
health insurance, and education, among others. While CAI allows the SIPP interview to proceed
seamlessly, analysts will perceive distinct sections within the core data.


Employment and Earnings

The first group of survey questions addresses employment and earnings. This section collects
information about the respondent’s labor force status for each week of the reference period;
identifies characteristics of employers, self-employment, and businesses the respondent might
own; and gathers data about earnings, whether from a job or from self-employment. Respondents
are asked about their labor force status and any unemployment compensation for a time period
covering the beginning of the 4-month reference period up through the date of the interview. The
type of work performed and dates of employment are also noted. The interviewer asks
respondents who own a business whether they are active in its management, own it as an
investment, or are involved in some combination thereof. The survey also collects data on time
spent looking for work, moonlighting, and the current employment situation for up to two jobs
and two businesses. Employment status is derived from information about specific jobs.

The flow of the survey is such that questions about employment and job characteristics are asked
first, with amounts collected separately. Probes ensure that amounts are reasonable and that gross
amounts are obtained. Respondents are asked to refer to records whenever possible.


Program, General, and Asset Income

These questions focus on income from sources other than the respondent’s work. Many
of the questions address income or benefits from programs such as Social Security or Food
Stamps (and in 1996 have been adapted to capture postreform welfare benefits); the survey also
collects information about retirement, disability and survivors’ income, unemployment insurance
and workers’ compensation as well as severance pay, lump-sum payments from pension or
retirement plans, child support, and alimony payments. A set of general income questions takes
information collected previously and obtains more details about who is covered, how payments
are received, reasons for receiving government transfer income, and other data having to do with
program participation. SIPP also collects information on amounts of “roll over” retirement
accounts.

To obtain information on asset income, interviewers ask respondents which assets they own,
prompting the respondent from a list including U.S. savings bonds, 401(k) plans, stocks, rental
property, and the like. Respondents are also asked if they have received any lump-sum or regular
payments from an IRA, Keogh, 401(k), or thrift plan. Other questions address income received
from assets owned, other than retirement accounts. Income for some assets is collected and
recorded within preset ranges. Most asset income, however, is recorded in exact amounts
whenever possible. The issue of joint ownership of assets is also addressed.

Additional Questions

SIPP core content also includes small sections that deal with health insurance ownership and
coverage (Medicare coverage, Medicaid, private and employer-provided health insurance, and
reasons for noncoverage), education (educational attainment, adult school enrollment, and
educational assistance), and energy assistance and school lunch program participation.

Table 3-1 lists possible income and benefit sources, along with some special indicators.

Core Content for Pre-1996 Panels

Core content in the paper surveys used before the 1996 Panel was structured differently, in four
very distinct sections that are described below.


Labor Force and Recipiency

The first set of survey questions addressed the respondent’s labor force status, sources of any
income received, participation in government transfer programs, and health insurance coverage
during the 4-month reference period. Respondents were asked about any employment during
each of the 4 months prior to the interview month, although detailed information about their
specific jobs was not collected here. Respondents who were employed were asked about the
number of hours they worked during a typical week and the number of weeks they worked. For
those who did not work, SIPP interviewers asked if they were on layoff or had looked for a job.
These survey questions also elicited whether any income had been received from a list of
potential sources, including government programs. Respondents were asked about their
ownership of assets, although this section of the interview did not include questions about
amounts earned from those assets.


Earnings and Employment

This section of the SIPP core asked respondents who reported any employment during the
4-month reference period covered by the interview a more detailed series of questions about the
jobs they held. Interviewers collected information for up to two different “wage and salary” jobs
in each wave. For each job, data were collected on occupation, industry, and work activities and
duties. Several questions aimed to determine the total pay from each job for each month of the
reference period. Similar information was collected for up to two different “self-employment”
jobs in each wave.



                             Table 3-1. Types of Income Recorded in SIPP

Wage or Salary Income                                        Asset Income (General Amounts Type 2)
Income from job 1                                            Regular/passbook savings accounts in a bank, savings
Income from job 2                                               and loan, or credit union
Income from business 1                                       Money market deposit accounts
Income from business 2                                       Certificates of Deposit or other savings certificates
                                                             NOW, Super NOW, or other interest-earning checking
Program and Miscellaneous Income (General                       accounts
    Amounts Type 1)                                          Money market funds
Social Security                                              U.S. government securities
U.S. Government Railroad Retirement payments                 U.S. Government Savings Bonds (E, EE)
Federal Supplemental Security Income                         Municipal or corporate bonds
State Supplemental Security Income                           IRA or Keogh account
State unemployment compensation                              Other interest-earning assets
Supplemental Unemployment Benefits                           Stocks or mutual fund shares
Other unemployment compensation                              Rental property
Veterans compensation or pensions                            Mortgages from which payments are received
Black Lung payments                                          Royalties
Worker’s Compensation                                        Other financial investments not already mentioned
State temporary sickness or disability benefits
Employer or union temporary sickness benefits                Noncash Income (other than WIC and Food Stamps)
Employer disability payments                                 Public housing occupancy
Severance pay                                                Rent subsidies
Payments from a sickness, accident, or disability            Energy assistance
    insurance policy purchased on your own                   Subsidized school lunches or breakfasts
Aid to Families with Dependent Children/Temporary
    Assistance for Needy Families                            Special Indicators
General Assistance or General Relief                         Worked
Foster child care payments                                   Disabled
Other welfare                                                VA disability rating of 100%
Women, Infants and Children nutrition programs               VA disability of less than 100%
Pass through child support payments                          Medicare
Food Stamps                                                  Medicaid
Child support payments
Alimony payments                                             Educational Assistance
Pension from company or union                                College work study
Federal Civil Service or other federal civilian employee     Health or Nursing Grant, ROTC, NSF Grant
    pensions                                                 Stafford Grant
U.S. military retirement pay                                 Perkins Grant
National Guard or Reserve Forces retirement                  SLS Grant
State government pensions                                    Grant, scholarship, tuition reimbursement from school
Local government pensions                                        attended
Income—paid-up life insurance policies or annuities          Teaching or research assistantship from school attended
Estates and trusts                                           Grant or scholarship from the state, such as SSIGP,
Other payments for retirement, disability, or survivor           Douglas scholarships
GI Bill/VEAP education benefits                              Grant or scholarship from some other source, such as
Other VA educational assistance                                  foundation, corporation, community group, National
Draw from IRA/Keogh 401(k) or thrift plan                        Merit scholarships
Income assistance from a charitable group                    PELL Grant
Money from relatives or friends                              Supplemental Educational Opportunity Grants
Lump-sum payments                                            National Direct Student Loan
Income from roomers or boarders                              Guaranteed Student Loan
National Guard or Reserve pay                                JTPA training
Incidental or casual earnings                                Employer assistance
Other cash income not included elsewhere                     Fellowship/scholarship
                                                             Other financial aid



Amounts of Income Received

The third group of core questions addressed the amounts of income or benefits received from
sources other than earnings.3 Detailed information was also collected about participation in
government transfer programs. For each nongovernment, nonasset source reported (e.g., alimony
payments), respondents were asked the amount of income received during each of the prior 4
months. If benefits were received from government programs, respondents were asked the reason
for program participation and who within the household was covered. Questions about asset
income, from sources such as interest, dividends, rents, and royalties, sought only the total
amount for the 4-month reference period. Examples of assets include money market funds,
stocks, rental property, and other financial investments. An example of income earned from an
asset would be the interest from a savings account.


Program Questions

The final section of the SIPP core included questions about participation in programs that
provide subsidized housing, energy assistance, and school meals.


Topical Content
Topical questions are those that are not repeated in each wave. These questions usually appear in
separate topical modules that follow the core questions. Topical modules are designed to gather
specific information on a wide variety of subjects. They provide a broader picture of the types of
individuals who are responding to the survey and give SIPP some flexibility in collecting data on
emerging issues. Some topical modules are included in each panel but, unlike the core content,
are not in each wave. The frequency and timing of these modules may vary. For example, each
personal history topical module is always administered just once per panel, in Wave 1 or Wave 2. Other topical
modules are asked multiple times within the same panel; the Assets and Liabilities module, for
example, is included four times within the 1996 Panel.

In some instances, the interview flows more smoothly if topical questions are placed with core
questions that relate to the same topic. For example, questions on assets are divided between
items included in the core questionnaire and items included in a separate topical module: SIPP
asks about asset ownership and income amounts in the core, while questions relating to asset
balances appear in the asset topical module. Similarly, home-based-employment
and size-of-firm data collected in the 1992 and 1993 Panels (Waves 6 and 3, respectively) are
incorporated into the core questionnaire. The term topical module, therefore, actually refers to all
topical items of the same theme, instead of those that are grouped together into a distinct module,
because the frequency with which the item appears is more important than its location.


3
  As with all of SIPP, respondents include all people 15 years old and over. When children under 15 have their own
income, it is recorded as having been received by an adult on their behalf.



Reference periods for items in topical modules vary widely, ranging from the respondent’s status
at the time of the interview to the respondent’s experience over his or her entire life. When
working with data from the SIPP topical modules, analysts should check question wording
and concepts carefully to ascertain the reference period. They should also check the universe for each
question, because topical modules are not uniformly asked of all respondents. For example, only
people 25 years of age or older are asked topical module questions about their retirement and
pension accounts. Questions on shelter costs and energy usage are asked only of the reference
person. In other modules, a screening question will determine who is and is not asked the
remainder of the module—in the case of the Work Schedule module, for example, only those
who worked during the previous month answer the entire set of questions.
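
The universe restrictions named above can be sketched as simple filters. The Python example below
is a hypothetical illustration: the `Person` fields and the rule keys are invented names, not
variables from any SIPP file, and it encodes only the three restrictions mentioned in this
paragraph. The actual universe for each item should be taken from the technical documentation.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical fields, not SIPP variable names.
    age: int
    is_reference_person: bool
    worked_last_month: bool


# One predicate per restriction described in the text above.
UNIVERSE_RULES = {
    # Only people 25 years of age or older are asked these questions.
    "retirement_and_pension_accounts": lambda p: p.age >= 25,
    # Asked only of the reference person.
    "shelter_costs_and_energy_usage": lambda p: p.is_reference_person,
    # A screening question limits the full module to those who worked
    # during the previous month.
    "work_schedule_full_module": lambda p: p.worked_last_month,
}


def in_universe(module, person):
    """Return True if the person falls inside the module's universe."""
    return UNIVERSE_RULES[module](person)


respondent = Person(age=22, is_reference_person=True, worked_last_month=False)
asked = [name for name in UNIVERSE_RULES if in_universe(name, respondent)]
print(asked)  # ['shelter_costs_and_energy_usage']
```

Applying a check of this kind before tabulating helps avoid counting people outside a question's
universe as if they had been asked it and gave no answer.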

The relationship between topical module titles and content is not perfectly consistent. Over the
history of SIPP, there have been situations in which either the topical module content changed
with no change in title or the topical module title changed with little change in content. In a few
situations, content has “floated” from one topical module to another. And sometimes there has
been significant overlap in content between two topical modules with different titles.

The actual questions are provided with the microdata technical documentation. Specific topical
modules are discussed below, with the panels and waves listed in brackets (e.g., [93-3, 96-6] for
a module asked in the third wave of the 1993 Panel and the sixth wave of the 1996 Panel).
Chapter 5 lists topical modules and the panels and waves in which they were included in the
survey. Table 3-2 groups topical modules thematically (modules may appear in more than one
category).

                        Table 3-2. Topical Modules Grouped Thematically

Category                Topical Module
Health, Disability, &   Adult Well-Being; Children’s Well-Being; Functional Limitations and Disability; Health
Physical Well-Being     and Disability; Health Status and Utilization of Health Care Services; Long-Term Care;
                        Medical Expenses and Work Disability; Work Disability History
Financial               Annual Income and Retirement Accounts; Assets and Liabilities; Real Estate Property and
                        Vehicles; Recipiency History; Retirement Expectations and Pension Plan Coverage;
                        School Enrollment and Financing; Selected Financial Assets; Shelter Costs and Energy
                        Usage; Support for Nonhousehold Members; Taxes
Child Care &            Child Care; Child Support Agreements; Child Support Paid; Support for Nonhousehold
Financial Support       Members
Education &             Education and Training History; Employment History; Job Offers; School Enrollment and
Employment              Financing; Work-Related Expenses; Work Schedule
Family & Household      Extended Measures of Well-Being; Family Background; Fertility History; Household
Characteristics &       Relationships; Marital History
Living Conditions
Personal History        Education and Training History; Employment History; Fertility History; Marital History;
                        Migration History; Recipiency History; Work Disability History
Welfare Reform          Eligibility for and Recipiency of Public Assistance; Benefits; Job Search and Training
                        Assistance; Job Subsidies; Transportation Assistance; Health Care; Food Assistance;
                        Electronic Transfer of Benefits; Denial of Benefits
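
The bracketed panel-wave notation described above (for example, [93-3, 96-6]) is easy to convert
into structured form. The Python sketch below is a hypothetical helper, not part of any SIPP
software; it assumes that every two-digit panel year belongs to the 1900s, which holds for the
1984 through 1996 Panels cited in this guide.

```python
import re

# One entry looks like "93-3": two-digit panel year, hyphen, wave number.
_ENTRY = re.compile(r"^(\d{2})-(\d{1,2})$")


def parse_panel_waves(text):
    """Parse a string such as "[93-3, 96-6]" into [(1993, 3), (1996, 6)]."""
    inner = text.strip().strip("[]")
    pairs = []
    for entry in inner.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a stray or trailing comma
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognized panel-wave entry: {entry!r}")
        year, wave = match.groups()
        # Assumption: all panels fall in the 1900s (1984-1996 in this guide).
        pairs.append((1900 + int(year), int(wave)))
    return pairs


print(parse_panel_waves("[93-3, 96-6]"))  # [(1993, 3), (1996, 6)]
```

A list of (panel, wave) pairs in this form can then be used to look up which files contain a
given topical module.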




Specific Topical Modules

Adult Well-Being. Asks the reference person about consumer durables, living conditions,
crime, neighborhood conditions, community services, basic needs, and food adequacy. This
topical module assesses the standard of living of SIPP respondents. It is similar to Extended
Measures of Well-Being and incorporates Basic Needs information that was asked as a separate
module in 93-9. [93-9, 96-8]

Annual Earnings and Benefits. Includes questions that ask people about their calendar-year
wages and salaries and income from their own businesses, as well as the receipt of certain
employer-provided benefits not covered elsewhere in SIPP, such as the use of a company car or
truck, an expense account, or the provision of free meals and lodging. In addition, persons who
left a job during the calendar year are asked a series of questions about their reasons for
leaving. Questions about calendar-year earnings, taxes, health and life insurance
deductions, and retirement contributions are designed to obtain the most accurate data available,
and respondents are encouraged to refer to W-2 forms and other records. This module is
administered twice per panel. [84-6]

Annual Income and Retirement Accounts. Obtains respondent estimates of calendar-year
business income and respondents’ personal retirement plans. The module asks about businesses
owned by respondents; the gross income, expenses, and net income of those businesses;
retirement accounts, including IRA, Keogh, and 401(k); and respondent participation
in those retirement plans. [84-9, 85-5, 85-8, 86-5, 87-5, 88-5, 90-5, 90-8, 91-5, 91-8, 92-5, 92-8,
93-5, 93-8, 96-4, 96-7, 96-10]

Assets, Liabilities, and Eligibility. Collects information about the value of assets and debt
on assets and expands on data gathered in the core questions. The intent of this topical module is
to derive a comprehensive measure of household net worth and to collect information used to
determine eligibility for federal assistance programs. To that end, the topical module includes
selected additional questions needed to determine program eligibility. Some of the assets
included are savings accounts, stocks, mutual funds, and bonds. Data on unsecured liabilities
such as loans, credit cards, and medical bills are also gathered. Assets and liabilities that are held
jointly are identified to prevent double-counting. The 1996 version of this module has seven
sections: value of business; interest earning accounts; stocks and mutual funds; mortgages; other
assets; assets and liabilities; and real estate, shelter costs, dependent care, and vehicle ownership.
(Also asked as Assets and Liabilities.) [84-4, 84-7, 85-3, 85-7, 86-4, 86-7, 87-4, 90-4, 91-7, 92-4,
93-7, 96-3, 96-6, 96-9, 96-12]

Child Care. Collects information about all child care arrangements, for all children under 15,
from mothers, single fathers, or guardians, regardless of labor force status. Those with children
under age 15 are asked about the type of child care arrangements, who provides the care, the
number of hours of care per week, where the care is provided, and the cost of the care. The
module asks whether a relative or nonrelative cared for the child, and if the child was in school.
Before the 1993 Panel, the module collected information about only one to two child care
arrangements from mothers, single fathers, or guardians who were either working, in school, or
looking for a job during the 4-month reference period. [84-5, 85-6, 86-3, 86-6, 87-3, 87-6, 88-3,
88-6, 89-3, 90-3, 91-3, 92-6, 92-9, 93-3, 93-6, 96-4, 96-10]

Child Support Agreements. Helps determine whether money received as child support
affects participation in government programs and whether lack of support from one parent causes
the other parent to need government assistance. The module collects information about
characteristics of child support agreements, the annual amount and frequency of payments, and
provisions for health care costs. Additional questions cover custodial arrangements, contact with
public agencies for assistance in collection of child support, frequency of contact with the absent
parent, current place of residence of the absent parent, and reasons for nonaward of child
support. Questions about paternity establishment status are also asked about children of women
with nonwritten agreements and all never-married women. [85-6, 86-3, 86-6, 87-3, 87-6, 88-3,
88-6, 89-3, 90-3, 90-6, 91-3, 92-6, 92-9, 93-3, 93-6, 93-9, 96-5, 96-11]

Child Support Paid. Serves as a counterpart to the Child Support Agreements module. It
seeks information about support for children of the respondent who are under 21 years old and
who live with another parent or guardian at any time during the module’s reference period of 4
months. [96-3, 96-6, 96-9, 96-12]

Children’s Well-Being. Asks the designated parent or guardian about the health of children in
the household, care of the child by nonfamily members, activities the family does with the
children (such as reading and outings), lessons and activities outside of school, rules for
children’s TV viewing, and the respondent’s opinion about the quality of the neighborhood. The
module obtains information about children in three age groups—under 6 years old, ages 6–11,
and ages 12–17—for as many as seven children in each category. Certain questions target fathers
or stepfathers who are not designated parents; other questions address whether the child attends a
public or private school. Content of this module varies across different panels and waves;
analysts should check the documentation for exact content. [92-9, 93-6, 93-9, 96-6, 96-11]

Education and Training History. Collects information about respondent’s highest level of
school completed or degree received, courses or programs studied, and dates of receipt of high
school and postsecondary degrees or diplomas. The module determines if the respondent
attended a public or a private high school. Job-related-training questions address training
designed to help find or develop skills for a new job as well as to improve skills at the current or
most recent job. People 15 years of age and older are asked whether they have received job
training; if they have, they are asked about the duration of the training, how it was used, how it
was paid for, and if it was federally sponsored.4 (Variations are also asked as Education and
Work History [84-3] and Education and Training [84-6].) [86-2, 87-2, 88-2, 89-2, 90-2, 91-2,
92-2, 93-2, 96-2]

4
  All of the “History” topical modules are designed to collect information about the respondent’s experiences prior
to the beginning of the SIPP panel. This information is most useful in combination with the more current
longitudinal information collected during the panel.

Employer-Provided Health Benefits. Collects data on the availability of health care
benefits from employers and the demographics of workers with and without employer-provided
health coverage. The module asks whether the plan restricts the respondent to specified doctors,
if family members are covered, and whether any family members have pre-existing conditions
not covered by the plan. The module also asks about long-term health care options. [96-5]

Employment History. Identifies patterns of employment, length of employment at certain
jobs, and reasons for any periods of unemployment subsequent to the respondent’s first job.
Beginning with the 1996 Panel, specific questions that address type of work done, job duties, and
the industry in which the respondent works were moved into the core content; previously, such
questions had been part of this module. [86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-1, 93-1, 96-1]

Extended Measures of Well-Being. Assesses the standard of living of SIPP respondents.
Three types of questions address the objective physical conditions in which the respondents live,
respondents’ ability to meet specified basic needs during the reference period, and respondents’
subjective assessments of the quality of their living situations. Included under the first category
are questions about the presence and condition of specified consumer durable goods in the home
(e.g., clothes washers, refrigerators, air conditioners) and the physical condition of the home
itself (e.g., condition of the roof and walls, state of the home’s electrical wiring and plumbing).
Another series of questions concerns conditions in the respondent’s neighborhood, such as
safety, cleanliness, and traffic. The second group of questions concerns whether members of the
respondent’s household had sufficient food to eat during the 4-month reference period and
whether they were able to pay rent and other bills or to obtain medical care when needed.
Respondents are also asked about the sources of help available when the respondent is in need
(e.g., family, friends, or community). Finally, respondents rate their satisfaction with the quality
of different aspects of their living conditions. Included are items such as the quality of the
furnishings, convenience of the home to shopping, and the general state of repair of their home.
(Some of those questions have been asked as a Basic Needs module [93-9].) [91-6, 92-3]

Family Background. Asked of people between ages 25 and 64. Obtains family characteristics
at the time of the respondent’s 16th birthday, including how many brothers and sisters the person
had, with whom the person lived, the highest grade of school completed by the parents, and the
occupations of the parents. [86-2, 87-2, 88-2]

Fertility History. Asked only of females 15 years of age and older and males 18 and older.
Men are asked about the number of children they have fathered, and women are asked about
their birth histories. Interviewers ask women who have had children when their first and last
children were born, along with questions about their employment status during pregnancy and
prior to the birth of their first child, circumstances of any absence from work before and after the
first birth, and the maternity leave policies of their employers. Postbirth employment is also
covered. [84-8, 85-4, 86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2]

Functional Limitations and Disability. Provides data that can be used to evaluate links
between types of disability, the family financial situation, and program participation. This
module is asked in three variations: overall, adult, and children. Adults are asked the standard
Activities of Daily Living (ADL) and Instrumental Activities of Daily Living (IADL) battery of
questions. Questions address physical and mental conditions affecting the respondent, the use of
mobility aids, vision and hearing impairments, speech difficulties, lifting and aerobic difficulties,
and the ability to function independently within the home. For those under age 22, the questions
are modified, referring to age-appropriate activities (e.g., questions about work activities are
recast to ask about analogous school activities). Questions about children also address the use of
special education services. For those under age 15, the interviewer asks the questions of the
designated parent or guardian. [90-3, 90-6, 91-3, 92-6, 93-3 for overall module; 92-9, 93-6, 96-5,
96-11 for separate children and adults modules]

Health and Disability. Gathers data for all sample members about their general health,
functional limitations (using the standard ADL battery of questions), work disability, and the
need for personal assistance. Respondents are asked about any hospital stays during the reference
period, other periods of illness, other health facilities used, and their health insurance coverage.
Information on children is collected from a designated parent or guardian. (Variations are also
asked as Functional Activities, Disability Status of Children, and Disability Questions.) [84-3 for
Health and Disability; 88-6, 89-3 for Functional Activities; 85-6, 86-3, 87-6, 88-3, 88-6, 89-3 for
Disability Status of Children; 96-4 for Disability Questions]

Health Status and Utilization of Health Care Services. Asks about hospital stays,
including any in psychiatric institutions; other illnesses or injuries that left the respondent
bedridden for at least most of 1 day; doctor visits and frequency of visits, dental visits and
frequency of visits; where the respondent seeks health advice (doctor’s office, clinic, hospital);
and health insurance coverage. (Also asked as Utilization of Health Care Services.) [85-6, 86-3,
87-6, 88-3, 88-6, 89-3, 90-3, 90-6, 91-3, 92-6, 92-9, 93-3, 93-6, 96-3, 96-6, 96-9, 96-12]

Home Health Care. Asks about the type and sources of help given to respondents who needed
help with their personal care, household activities, and basic errands because of a health
condition. Respondents are asked if caregivers were relatives or nonrelatives, and whether or not
the caregivers were household members. This module also asks about members of the household
who might have given such care, on a nonprofessional level, to a person outside the household.
Questions determine the relationship of the caregiver and recipient(s) and the kind of care given.
[88-6, 89-3]

Household Relationships. Collects information about relationships among household
members. The SIPP core questions gather extensive information about household composition
for each month of the panel. This information allows for the identification of families and
subfamilies and details each household member’s relationship to the household reference
person.5 As extensive as this information is, it does not cover the interrelationships of all
household members. For example, the SIPP core provides no information about the relationships
between members of two different unrelated (to the household reference person) subfamilies
residing in the same household. This topical module fills that gap by collecting complete
information about how each member of the household is related to every other member of the
household. Relationships are specified in detail; for example, a brother is a full brother, half
brother, stepbrother, or adoptive brother. In-law relationships are also identified. [84-8, 85-4,
86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2]

5
  The family is defined by the Census Bureau as two or more people who are living together and are related by
blood, marriage, or adoption. A primary family is the family containing the household reference person; an unrelated
subfamily is a family that does not contain the reference person or anyone related to the reference person. Related
subfamilies are families within the primary family. A daughter and husband living with the daughter’s parents would
constitute a related subfamily. The reference person is the person in whose name the home is owned or rented. If the
house is owned jointly by a married couple, either the husband or the wife may be listed as the reference person.

Housing Costs, Conditions, and Energy Usage. Collects information on mortgage
payments, real estate taxes, fire insurance, principal owed, when the mortgage was obtained,
and interest rates; rent; type of fuel used and heating facilities; appliances; and vehicles.6
Questions on value of home and automobile are used in conjunction with assets and liabilities
reported in the Assets and Liabilities Topical Module to calculate each individual’s net worth.
This topical module also helps to fulfill a need for information concerning energy usage that has
resulted from increased interest in recent years over the rising costs of energy and concerns about
conservation. The information can be used in analysis of the requirements of individuals and
households who participate in energy assistance programs. [84-4]

Job Offers. Asks about any job offers received by respondents who were looking for work or
who were on layoff during the reference period. If the respondent was offered a job and did not
accept it, questions probe the reason for rejecting the job and the amount of money that was
offered. [85-6, 86-3]

Long-Term Care. Focuses on health-related conditions that might cause a person to need help
around the home. Specific questions address the ability of people in the household to manage
their personal care, housework, meal preparation, and basic errands outside the home. The
module ascertains whether or not individuals providing such assistance are household members.
Additional questions ask about community services and the financial burden of acquiring
assistance. The module also asks about the activities of respondents who themselves provided
such assistance on a nonprofessional basis to individuals outside the household. (Also asked as
Home Health Care.) [85-6, 86-3, 87-6, 88-3, 88-6, 89-3]

Marital History. Asks questions of all respondents aged 15 and older who have ever been
married. The date of the present marriage is determined; for those married more than once, SIPP
records the dates of their first two marriages and their last marriage, if married more than twice.
If appropriate, respondents are asked when their previous marriages ended and whether they
were widowed or divorced at the end of their marriages. [84-8, 85-4, 86-2, 87-2, 88-2, 89-2,
90-2, 91-2, 92-2, 93-2, 96-2]

Medical Expenses and Work Disability. Gathers data about out-of-pocket medical
expenses, health services, doctor visits, prescription drugs, insurance reimbursement, and health
and physical conditions that might affect the respondent’s ability to work. The reasons for and
length of any hospitalizations are determined, and respondents are asked about the types of
medical professionals who delivered care. Most questions apply to both children and adults.
(Also asked as Medical Expenses.) [87-7, 88-4, 89-4, 90-7, 91-4, 92-7, 93-4, 93-7, 96-3, 96-6,
96-9, 96-12]

Migration History. Asks respondents aged 15 and older where they were born, where they
have lived, and how long they have lived in those places. Respondents born in a foreign country
are asked about their citizenship status and when they came to the United States to stay. [84-8,
85-4, 86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2]

6
 Subsequent to the 1984 Panel, questions on energy usage were combined into a separate module. Vehicles and
housing values are retained together in a module entitled “Real Estate and Vehicles.”

Property Income and Taxes. Collects information on rental income received during the
calendar year and on interest earned and/or dividends from assets such as savings accounts,
money market deposit accounts, interest-earning checking accounts, bonds, or stocks. Respondents
are also asked about federal and state income tax liabilities and certain other tax information such as
type of return, use of selected schedules (for example, Schedule A, Itemized Deductions;
Schedule B, Interest or Dividends; or Form 4835, Farm Rental Income), and number of
exemptions. The tax questions are asked in order to develop better estimates of the distribution of
after-tax income and to help build better microsimulation models of the tax and transfer system.
This module is administered twice per panel. [84-6]

Real Estate Property and Vehicles. Gathers information about housing tenure and
financing, other real estate ownership, and automobile ownership. Home owners are asked a
series of questions that allow the estimation of net real estate equity. Questions about vehicles
address ownership, type of vehicle (i.e., car, truck, motorcycle), value, and amount owed. Those
questions are also used in program eligibility simulations. (A variation of this module is asked as
Real Estate, Shelter Costs, Dependent Care, and Vehicles.) [84-7, 85-3, 85-7, 86-4, 86-7, 87-4,
87-7, 88-4, 90-4, 90-7, 91-4, 91-7, 92-4, 92-7, 93-4, 93-7]

Reasons for Not Working/Reservation Wage. Ascertains the reasons that persons are not
in the labor force and the conditions under which persons might want to join the labor force. The
reservation wage questions ask about the pay rate that a person would require in order to begin
working (Ryscavage, 1987). Questions are also asked about job search and, if people have been
offered but did not accept a job, the reason they refused it. This module was discontinued after
the 1985 Panel. [84-5]

Recipiency History. Obtains a profile of a respondent’s pattern of participation in certain
government programs prior to the beginning of the SIPP panel. Specific questions address the
first time a respondent participated in a particular program, the length of participation, and the
number of times the respondent has been in the program. [86-2, 87-2, 88-2, 89-2, 90-2, 91-2,
92-1, 93-1, 96-1]

Retirement Expectations and Pension Plan Coverage. Obtains information about the
respondent’s pension plan coverage for the most important current job or business, and
information from persons currently receiving retirement benefits from a former job or business.
Respondents are asked about their coverage and vesting in pension plans, types of plans, the
reasons they are not covered by or do not participate in plans, current contributions and amounts
of money in their accounts if applicable, and how the money in their own plans is invested. Other
questions concern loans from pension accounts and treatment of lump sums received from prior
job pension plans.

Respondents currently receiving pension income are asked about the types of pension they
receive, provisions for cost-of-living adjustments, and health benefits. Respondents are also
asked for industry and occupation data about the job or business from which their pensions are
received. (Also asked as Pension Plan Coverage [84-7].) [84-4, 85-7, 86-4, 86-7, 87-4, 90-4,
91-7, 92-4, 93-9, 96-7]

School Enrollment and Financing. Seeks information about basic educational attainment,
enrollment in public and private schools, and whether those in government programs differ from
others in terms of financing their education and their sources of educational assistance. Asked of
people aged 15 and older, the module includes questions to pinpoint the grade level of people
enrolled in a general, technical, or business school; their pattern of full- or part-time enrollment;
amount of tuition and fees; costs of room and board; and books and supplies. Specific sources of
educational assistance, such as the GI Bill or employer assistance, are also determined. (Also
asked as Education Financing and Enrollment.) [84-9, 85-5, 85-8, 86-5, 87-5, 88-5, 90-5, 90-8,
91-5, 91-8, 92-5, 92-8, 93-5, 93-8, 96-5]

Selected Financial Assets. Focuses on the value of such assets as savings bonds, checking
accounts, retirement accounts, life insurance, and the number of years respondents have held
certain assets. [87-7, 88-4, 90-7, 91-4, 92-7, 93-4]

Shelter Costs and Energy Usage. Collects information on rent or mortgages, real estate
taxes, and insurance; energy costs; and motor vehicles. The information is pertinent to the
determination of eligibility for a number of federal assistance programs. (Also asked as Housing
Costs, Conditions, and Energy Usage.) [84-4, 86-6, 87-3]

Support for Nonhousehold Members. Provides information about respondents’ routine
payments supporting people who are not current household members. Includes both child
support payments for own children under 21 years of age and payments made to (or for) people
who are not children of the respondents—for example, an elderly parent in a nursing home or an
adult child living away from home and in an entry-level job. Questions about child support
include number of children supported, type and year of agreement, annual amount and method of
payment, health care provisions and custodial arrangements, and amount of contact with the
absent children. Questions about support for other persons outside the household include their
relationship to the respondent, living arrangement, and annual amount of support paid. [84-5,
84-8, 85-4, 85-6, 86-3, 86-6, 87-3, 87-6, 88-3, 88-6, 89-3, 90-3, 90-6, 91-3, 92-6, 92-9, 93-3, 93-6,
93-9, 96-5]

Taxes. Includes questions about exemptions, calendar-year wages and salaries, income from
businesses, itemized deductions, and earned income credits. Respondents are asked about federal
and state income tax liabilities, exemptions, amounts owed for federal and property taxes, and
amounts from a variety of tax schedules. To help ensure accuracy, interviewers encourage
respondents to refer to income tax returns and other records. Historically, this module has been
administered at least twice per panel, generally in the spring when respondents were likely to be
preparing their tax returns for the prior year. (Also asked as Earnings and Benefits, and Property
Income and Taxes.) [84-6, 84-9, 85-5, 85-8, 86-5, 87-5, 88-5, 90-5, 90-8, 91-5, 91-8, 92-5, 92-8,
93-5, 93-8, 96-4, 96-7, 96-10]


Time Spent Outside Work Force. Collects information about work history and reasons for
not working. Asked of people 21 or older, this short module addresses up to four periods of 6
months or longer in which the respondent did not work at a paid job or business. [90-6]

Welfare History and Child Support. Collects information on how long individuals may
have received aid from specific welfare programs and on child support agreements and their
fulfillment. The data from the welfare history questions will be used to measure the extent to
which persons and households have been dependent upon government transfer programs in their
general finances and will be helpful in evaluating the effectiveness of the programs.

One series of questions in the module concerns the Food Stamp, AFDC/Temporary Assistance
for Needy Families (TANF), and SSI programs. Current recipients are asked how long they have
been receiving, or have been authorized to receive, these benefits. Recipients and nonrecipients
are asked whether they had at any previous time applied for benefits, whether they received
them, and, if so, when and for how long. This module was incorporated into a series of history
modules, collectively called the Personal History Topical Module, beginning with the 1986
Panel.

The Child Support Topical Module attempts to determine whether those entitled to receive child
support payments have in fact received them. The module asks whether the child support
agreement was court ordered or arranged otherwise and how the payments were to be made. It
also asks for the amount and regularity of payment and whether a child support enforcement
office has provided any help. [84-5]

Welfare Reform. Seeks information about eligibility for and recipiency of public assistance.
Specific questions address benefits, assistance that supports a respondent seeking work or
acquiring training, requirements for receiving benefits (such as job hunting, drug testing, etc.),
job subsidies, transportation assistance, health care, and food assistance. This module also
gathers information about electronic transfer of benefits and denial of benefits to the respondent.
[96-8]

Work Disability History. Asks a series of questions about chronic health conditions that may
affect the amount or type of work a respondent can do. Included are any such physical, mental,
or other health conditions that interfere with the respondent’s ability to work for at least 3
months. Questions are asked about when the limiting condition first became an issue, whether
the person was working at the time, whether the condition resulted from an accident or injury,
and if so, where the accident or injury occurred. Shorter-term conditions (including pregnancy)
are not included as limiting conditions. [86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2]

Work-Related Expenses. Asks about work-related expenses for each employer the respondent
had during the reference period. Questions address various costs of working, such as union dues,
licenses, special tools, and uniforms. Mode of transportation and mileage driven to and from work
are determined, along with any parking or mass transit fees. (Also asked as Work-Related
Expenses and Child Support Paid.) [84-5, 84-8, 85-4, 86-6, 87-3, 96-3, 96-6, 96-9, 96-12]


Work Schedule. Collects information about the number of hours and days worked during a
typical week in the fourth reference month. Questions about whether or not the respondent
worked only at home on any days are included. [87-6, 88-3, 88-6, 89-3, 90-3, 91-3, 92-6, 92-9,
93-3, 93-6, 93-9, 96-4, 96-10]


4. Data Editing and Imputation
This chapter describes the data editing and imputation procedures applied to data from the
Survey of Income and Program Participation (SIPP) after completion of the interviews. Three
different approaches are used for dealing with missing data in SIPP:

•   Weighting adjustments are used for some types of noninterviews;
•   Data editing (also referred to as logical imputation) is used for some types of item
    nonresponse; and
•   Statistical (or stochastic) imputation is used for some types of unit nonresponse and some
    types of item nonresponse.

Weighting is discussed in Chapter 8.

The chapter begins with a brief discussion of the types of missing data and the goals of
imputation in SIPP. It then presents an overview of the editing and imputation procedures used to
deal with missing and inconsistent data. Next, the chapter provides a detailed description of each
of the major steps used by the Census Bureau when creating its internal files and the files that are
released for public use. Prior to the 1996 Panel, the development of cross-sectional wave files
involved mainly cross-sectional editing and imputation. The longitudinal files involved longitudinal
editing. Beginning with the 1996 Panel, the processing procedures for the wave files were
replaced with methods that use prior wave information to inform the editing and imputation of a
current wave (after wave 1). The generic imputation technique, that is, the hot-deck method, is
still used in the 1996 and later panels, but the donors are now chosen on the basis of similarities in
reported prior wave information when that reported information exists.
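
The idea of a hot deck that conditions on prior wave information can be illustrated with a
minimal Python sketch. It is not the Census Bureau's production procedure. The field names
(age_group, prior_wave_band, earnings), the choice of imputation cells, and the random
selection of a donor within a cell are all assumptions made for illustration only.

```python
import random


def hot_deck_impute(records, item, cell_keys, seed=0):
    """Fill missing `item` values from a donor in the same imputation cell.

    A cell is the tuple of values a record has for `cell_keys`. All names
    here are hypothetical; this is not the Census Bureau's production code.
    """
    rng = random.Random(seed)

    # Collect reported (nonmissing) values by cell; these are the donors.
    donors = {}
    for rec in records:
        if rec[item] is not None:
            cell = tuple(rec[k] for k in cell_keys)
            donors.setdefault(cell, []).append(rec[item])

    completed = []
    for rec in records:
        out = dict(rec)
        out["imputed"] = False
        if out[item] is None:
            pool = donors.get(tuple(rec[k] for k in cell_keys))
            if pool:  # leave the item missing if the cell has no donor
                out[item] = rng.choice(pool)
                out["imputed"] = True
        completed.append(out)
    return completed


# Hypothetical wave 2 records; prior_wave_band is the earnings band
# reported in wave 1.
wave2 = [
    {"id": 1, "age_group": "25-44", "prior_wave_band": "low", "earnings": 1200},
    {"id": 2, "age_group": "25-44", "prior_wave_band": "low", "earnings": None},
    {"id": 3, "age_group": "25-44", "prior_wave_band": "high", "earnings": 5200},
    {"id": 4, "age_group": "25-44", "prior_wave_band": "high", "earnings": None},
]
completed = hot_deck_impute(wave2, "earnings", ["age_group", "prior_wave_band"])
```

Because each cell in this toy file has exactly one donor, person 2 receives 1200 and person 4
receives 5200. Without the prior wave band in the cell definition, either donor could have been
assigned to either recipient.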

The SIPP Web site (http://www.sipp.census.gov/sipp/) supplements the information in this
chapter with detailed information about all variables on the public use files.


Types of Missing Data
As in all surveys, there are two general types of missing data in SIPP: unit nonresponse and item
nonresponse. Unit nonresponse occurs in SIPP when one or more of the people residing at a
sample address are not interviewed and no proxy interview is obtained. This can happen for a
number of reasons, described in Chapter 2. Most types of unit nonresponse are dealt with
through weighting adjustments (see Chapters 2 and 8). However, the data editing and statistical
imputation procedures described in this chapter are used with one type of unit nonresponse: Type
Z noninterviews, which occur when an interview is obtained from at least one household
member but interviews are not obtained from one or more other sample persons in that
household.1 Prior to the 1996 Panel and in some instances in the 1996 Panel, the method used to
adjust for person-level noninterviews in the core wave files is known as Type Z imputation,
which is discussed below.

Item nonresponse occurs when a respondent completes most of the questionnaire but does not
answer one or more individual questions. Item nonresponse in SIPP occurs under the
following circumstances:

•   Responding sample persons refuse or are unable to provide requested information;
•   Interviewers fail to ask a question or incorrectly record a response;
•   A response is inconsistent with related responses or is incompatible with response categories;
    and
•   Interviewers make an error when recording or keying in the data.2

Missing item data are generally imputed for core items, as well as for many topical module
items.


Goals of Imputation
Missing data cause a number of problems: analyses of data sets with missing data are more
problematic than analyses of complete data sets; there is a lack of consistency among analyses
because analysts compensate for missing data in different ways and their analyses may be based
on different subsets of data; and, in the presence of nonresponse that is unlikely to be completely
random, estimates of population parameters are biased.

Because missing data are always present to some degree, analyses of survey data must be based
on assumptions about patterns of missing data. When missing data are not imputed or otherwise
accounted for in the model being estimated, the implicit assumption is that data are missing at
random after controlling for other variables in the model. The imputation procedures used for
SIPP are based on the assumption that data are missing at random within subgroups of the
population (as defined by the cells of the imputation matrices described later in this chapter).

The statistical goal of imputation is to reduce the bias of survey estimates. This goal is achieved
to the extent that systematic patterns of item nonresponse are correctly identified and modeled.
In SIPP, the statistical goals of imputation are general, rather than specific. Instead of addressing
the estimation of specific parameters, SIPP procedures are designed to provide reasonable
estimates for a variety of analytical purposes.
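
A small worked example may help show why nonrandom nonresponse biases an estimate and how
imputing within subgroups can reduce that bias. The sketch below uses invented values for two
hypothetical cells ("low" and "high") and fills each missing value with the respondent mean of
its cell. Cell-mean imputation is used here only because it keeps the arithmetic transparent;
it is an assumption of this example and is not the hot-deck procedure used for SIPP.

```python
from statistics import mean

# (cell, true_value, responded) for eight hypothetical sample persons.
# Nonresponse is concentrated in the "low" cell, so it is not completely
# random, but respondents resemble nonrespondents within each cell.
people = [
    ("low", 9, True), ("low", 11, True), ("low", 10, False), ("low", 10, False),
    ("high", 28, True), ("high", 32, True), ("high", 30, True), ("high", 30, True),
]

# Mean that would be obtained with no missing data.
true_mean = mean(v for _, v, _ in people)

# Mean based on respondents only (missing values simply dropped).
respondent_mean = mean(v for _, v, r in people if r)

# Respondent mean within each cell, used to fill the missing values.
cell_means = {
    cell: mean(v for c, v, r in people if c == cell and r)
    for cell in {c for c, _, _ in people}
}
completed = [v if r else cell_means[c] for c, v, r in people]
imputed_mean = mean(completed)
```

In this constructed case the true mean is 20, the respondent-only mean is about 23.3, and the
mean after within-cell imputation is 20. The improvement depends entirely on the assumption
that data are missing at random within the cells.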


1
  That can happen either because people refuse to be interviewed or because they are unavailable for the interview
and a proxy interview is not obtained.
2
  Prior to the 1996 Panel, errors could also occur when data-entry workers were keying in results from the paper
survey.


Data editing is generally preferred over statistical imputation, and it is used whenever a missing
item can be logically inferred from other data that have been provided. When information exists
on the same record from which missing information can logically be inferred, that information is
used to replace the missing information. The advantage of data editing is that it avoids the
increase in variance that occurs when missing items on one record are imputed with nonmissing
responses from other records.


Assessing the Influence of Imputed Data on
Analysis
Users of SIPP data interested in assessing the influence of imputed data on their analyses should
consider whether SIPP imputation procedures have properties that affect their specific analytical
requirements. A general discussion of the treatment of missing data in sample surveys is given in
Kalton and Kasprzyk (1986). Sedransk (1985), Little (1986), and Jinn and Sedransk (1987)
discuss properties of commonly used imputation processes. An example of the impact of
imputation procedures on the distributional characteristics of a low-income population is
discussed in Doyle and Dalrymple (1987).

An evaluation of the effects of imputed data should include a review of rates of unit nonresponse
and an assessment of the extent of item nonresponse. Unit nonresponse tends to increase over the
life of a panel, as does the likelihood that nonresponse is not random. As the
percentage of eligible sample members re-interviewed decreases, the pool from which donors³
are selected shrinks accordingly. This smaller pool of donors increases the likelihood that
individual donors will be used more than once, which in turn increases the variance of an
estimate.

The effects of imputation will likely be small for items with low rates of missing data as long as
rates of item nonresponse are not high among important subclasses. Lepkowski et al. (1987),
using data from a large federal survey, provide a framework for evaluating the effect of imputed
values on analyses. This framework can be readily adapted to SIPP analyses.


An Overview of the Process
There are two phases to the processing of SIPP data. At the conclusion of each wave of
interviewing, the data collected during that wave are processed, creating the core wave and
topical module files. That is the first phase of processing. Then, at the conclusion of the final
wave of interviews, core data from all waves are linked and a new set of edit and imputation
procedures is applied to the resulting full panel file. That is the second phase of processing.


³ Cases with complete data that are the source of the imputed values placed on the records with missing data.



Figure 4-1 illustrates the steps that generate the Census Bureau’s internal core wave and full
panel files.

               Figure 4-1. Sequence of Cross-Sectional Imputation and Longitudinal
                                        Editing Procedures

Imputation of Sample Unit Characteristics (Tenure, etc.)                    Imputation of Item
Imputation of Personal Demographic Characteristics (Age, Race,              Missing Data for Sample
  Marital Status)                                                           Unit Characteristics and
                                                                            Personal Demographic
                                                                            Characteristics

Type Z Imputationsᵃ                                                         Imputation of Person-
                                                                            Level Noninterviews

Imputation of Labor Force Items and Recipiency of Income and Assets         Imputation of Item
Imputation for Item Nonresponse in Records for “Other” Cash Income          Nonresponse in Core
Imputation for Item Nonresponse in Self-Employment Identification           Questions
  Sections
Imputation for Item Nonresponse in Asset Sections (Property Income)
Imputation for Item Nonresponse for Household Program Information

Editing for Demographic and Household Variables, Employment                 Editing of Longitudinal
  Variables, General Amount Variables, and Other Variables                  Record

(Sequence is Repeated for Each Wave in a Panel)

ᵃ Most Type Z records in the 1996 Panel were not handled in a separate process.


Phase 1 Summary

There are six steps in the first phase of SIPP data processing:

1. As each wave of interviewing is completed, core data collected during the wave are edited
   for internal consistency.
2. Following data editing, the statistical matching and hot-deck procedures described later in
   this chapter are used to impute missing data from the core wave file.
3. A public use version of the core wave file is then created from the resulting internal core
   wave file. The public use file is the same as the Census Bureau’s internal file except that it
   has certain information suppressed or topcoded to protect the confidentiality of survey
   respondents (see sections on Topcoding and Suppression of Geographic Information, at the
   end of this chapter).
4. On a separate production track from the core data, data from the topical module file
   administered with the wave are edited for internal consistency. The extent of data editing
   varies across the topical modules, and some topical modules receive almost no editing.



5. Next, hot-deck procedures are used to impute missing data in the topical module. The extent
   of imputation varies across the topical modules; some topical modules have no missing data
   imputed.
6. A public use version of the topical module file is created from the resulting internal file. As
   with the public use core wave files, the public use topical module files have certain
   information suppressed to protect the confidentiality of survey respondents.

These steps are repeated at the conclusion of each wave of interviews. Prior to the 1996 Panel,
each wave was processed independently of other waves of data. Thus, when multiple core wave
files are linked, apparent changes in a respondent’s status could be due to different applications
of data edits and imputations to the files being combined (file linkage is the subject of Chapter
13). Beginning with the 1996 Panel, the hot-deck procedure was redesigned to rely on historical
information reported in prior waves. In addition, other forms of longitudinal imputation, such as
carryover methods, were adapted.


Phase 2 Summary

At the conclusion of the panel, the Census Bureau creates a full panel file containing core data
from all waves. There are four steps to this process.

1. Core data from all waves are linked. Those data have already been subjected to the Phase 1
   edit and imputation procedures.
2. A series of longitudinal edits is applied to the full panel file. Unlike the core wave edit
   procedures, these edits are designed to create longitudinally consistent records for each
   person. Both reported values and values that were imputed during the first phase of
   processing are subject to change. Thus, the data in a full panel file may differ from the data in
   the core wave files from which the full panel file was constructed.
3. A missing wave imputation procedure is then applied. Data are imputed when a sample
   member was absent for one or two consecutive waves but was present for the two adjacent
   waves. Data for the missing wave(s) are interpolated on the basis of information from the
   fourth month of the prior wave and the first month of the subsequent wave. The missing
   wave imputation procedure was introduced with the 1991 Panel. Earlier panels were not
   subjected to this procedure.
4. A public use version of the full panel file is created from the resulting internal file. The
   public use file has certain information suppressed to protect the confidentiality of survey
   respondents.

The balance of this chapter describes in greater detail the full sequence of data edit and
imputation procedures applied to SIPP data files. Most of the material contained in this chapter is
taken from Pennell (1993).
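The eligibility rule for the missing wave imputation in step 3 of the Phase 2 summary can be illustrated with a minimal Python sketch. The function names are hypothetical, and the linear interpolation is an assumption made only for illustration: the Guide states that values are "interpolated" between the fourth month of the prior wave and the first month of the subsequent wave, but does not give the formula.

```python
from typing import List, Tuple


def fillable_gaps(present: List[bool]) -> List[Tuple[int, int]]:
    """Return (first, last) wave indexes of gaps eligible for imputation.

    A gap qualifies when it covers one or two consecutive missing waves
    and the waves immediately before and after it were both interviewed.
    """
    gaps = []
    i, n = 0, len(present)
    while i < n:
        if present[i]:
            i += 1
            continue
        start = i
        while i < n and not present[i]:
            i += 1
        end = i - 1                      # last missing wave in this run
        bounded = start > 0 and i < n    # interviewed waves on both sides
        if bounded and (end - start + 1) <= 2:
            gaps.append((start, end))
    return gaps


def interpolate_months(prev_month4: float, next_month1: float,
                       n_missing_waves: int) -> List[float]:
    """Fill the 4 months of each missing wave between two reported values.

    ASSUMPTION: straight-line interpolation, chosen only for illustration.
    """
    n_months = 4 * n_missing_waves
    step = (next_month1 - prev_month4) / (n_months + 1)
    return [prev_month4 + step * k for k in range(1, n_months + 1)]
```

In this sketch a three-wave gap, or a gap at the start or end of the panel, is left unfilled, which mirrors the "one or two consecutive waves" and "two adjacent waves" conditions stated above.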




Phase 1: Data Editing and Imputation
Procedures for the Core Wave Files
The data processing sequence for each wave is detailed below.


Data Entry and Initial Editing

Beginning with the 1996 Panel (see Chapter 2), all of the data entry and some of the initial data
editing are performed by computer-assisted interviewing while the interview is in progress.
Before the 1996 Panel, the first stages of data processing involved editing the paper
questionnaires for completeness, reasonableness, and consistency. Those data checks were
conducted first by field representatives before they submitted their questionnaires to the regional
offices and then by the regional and central offices of the Census Bureau. The next step was data
entry, in which clerks keyed in the information from control cards and questionnaires. Edits were
built into the data-entry program to ensure that the data were keyed in the proper sequence and
that certain key identifiers, such as control number, name, and relationship to householder, were
present. Following this step, the data files were transmitted electronically to Census Bureau
headquarters.


Imputation for Sample Unit Characteristics and
Personal Demographic Characteristics

Items in this category, including housing tenure (owned or rented), age, race, marital status, and
so forth, must be present for any further data processing to take place. If these values cannot be
logically derived, they are imputed. The imputation procedure is a modified version of the
sequential hot-deck procedure described below.


Type Z Imputation for Core Items in the Core Wave Files

Pre-1996 Panels. Type Z imputation was the method used in the pre-1996 panels to impute core
items for person-level noninterviews. There are two categories of person-level noninterviews
subject to imputation for the core questions. The first category includes individuals 15 years of
age and older who were members of interviewed households at the beginning of the 4-month
reference period but were not original sample members or members of any SIPP-interviewed
household on the date of the interview—that is, people not interviewed because they moved out
of the sample household between the beginning of the reference period and the interview date.
Had these people been original sample members, they would have been interviewed at their new
address. Rather, these are all people who entered the SIPP sample after the first wave and were in
the sample because at some point they were living with an original sample member.

The second category of imputed noninterview includes people 15 years of age or older who were
members of SIPP-interviewed households on the date of the interview and during all or a portion
of the 4-month reference period but who were not interviewed because they refused to cooperate
or were unavailable for the interview and a proxy interview was not obtained.

The Type Z imputation procedure is based on a hierarchical sorting and merging operation that
matches noninterviews with respondents on socioeconomic characteristics available for both.
The variables used to match noninterviews with respondents are age, race, gender, marital status,
household relationship, education, veteran status, parent/guardian status, and income and asset
sources. Pennell (1993, Figure C-1) provides a table of variables used to match recipients with
donors. The Type Z imputation procedure is designed to always find a match. Type Z
noninterviews are imputed by assigning values from the matching donor to the noninterview
record. The donor values are assigned in full, except for identification variables or other
variables not relevant for the household in which the noninterview occurred. Pennell (1993)
gives a complete account of Type Z imputation, including detailed descriptions of matching
operations.
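The idea of a match that "is designed to always find a match" can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names, the priority order of the match variables, and the rule of dropping the last variable at each level are hypothetical, and the actual sorting and merging operations are those documented in Pennell (1993).

```python
from typing import Dict, Sequence

Record = Dict[str, object]

# A few of the match variables named in the text. The field names and the
# priority ORDER are hypothetical.
MATCH_KEYS = ["age_group", "race", "gender", "marital_status"]


def find_donor(recipient: Record, donors: Sequence[Record],
               keys: Sequence[str] = MATCH_KEYS) -> Record:
    """Return the first donor agreeing with the recipient on the longest
    leading subset of `keys`; with zero keys, any donor matches."""
    if not donors:
        raise ValueError("at least one donor is required")
    for n_keys in range(len(keys), -1, -1):
        active = keys[:n_keys]
        for donor in donors:
            if all(donor[k] == recipient[k] for k in active):
                return donor
    raise AssertionError("unreachable: zero keys match every donor")


def impute_type_z(recipient: Record, donor: Record,
                  keys: Sequence[str] = MATCH_KEYS,
                  id_fields: Sequence[str] = ("person_id",)) -> Record:
    """Copy donor values onto the noninterview record, leaving the
    recipient's identifiers and match variables untouched."""
    out = dict(recipient)
    for field, value in donor.items():
        if field not in id_fields and field not in keys:
            out[field] = value
    return out
```

Because the final level uses no match variables at all, the search cannot fail as long as one donor exists, which is the property the text describes.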

1996 Panel. In Waves 2–12 of the 1996 Panel, the general imputation procedure (the sequential
hot-deck procedure described in the following pages) is being used to impute core items for most
person-level noninterviews. That is, these types of noninterviews are no longer set aside—in the
1996 and later panels—for the specialized Type Z imputation procedure. However, the Type Z
imputation procedure is still used in Wave 1 of the 1996 Panel (because there is no prior wave
information to inform the imputation process) and for noninterviews for persons in Waves 2–12
for whom there is no prior wave information (because they are new to the sample).


Imputation of Item Nonresponse in Core Questions

SIPP core items are imputed in the following order:

1. Labor force participation, recipiency of income, and asset holdings;
2. Other cash income;
3. Wage, salary, and self-employment income amounts;
4. Asset income amounts; and
5. Program participation and benefits.




The Sequential Hot-Deck Imputation Procedure

The statistical imputation method used to impute missing items from the core questions and
topical modules is known as a sequential hot-deck procedure.⁴ In a general sense, the sequential
hot-deck procedure, like the Type Z imputation procedure, matches a record with missing data to
that of a donor with similar background characteristics and uses the donor’s values. This
procedure differs from data editing, which replaces missing data with inferred values based on
nonmissing data from the same case.

The sequential hot-deck procedure used in SIPP involves five key steps:

1. Specifying cold-deck or initial donor values;
2. Sorting the sample cases;
3. Identifying records with no item nonresponse and updating hot-deck values;
4. Classifying cases into subclasses of the population, referred to as imputation classes or
   adjustment cells, according to values on a set of classification or auxiliary variables that are
   nonmissing for all cases (this step is omitted in the initial processing of the key demographic
   items—race, gender, etc.); and
5. Selecting replacement values from donor cases to impute item-missing data on recipient
   records.

Two types of sequential hot-deck imputation are used to provide values for missing items. In
Wave 1 and for each sample member who is new to a subsequent wave, the hot deck is
cross-sectional; only values from current wave responses are used in the definition of the
hot-deck cells. Beginning with Wave 2, previous wave values are included in the definition of the
hot-deck cells. In both instances, however, only current wave values from selected donors are used to
replace missing items (with several exceptions, described below). Longitudinal (or “previous
wave”) hot-deck imputation was not performed prior to the 1996 Panel. Each wave received only
the cross-sectional hot-deck imputation.

For example, the item indicating whether a person worked part-time in the reference period for
the wave (a dichotomous item) uses the longitudinal hot deck for “old” sample members and the
cross-sectional hot deck for new sample members. The 1996 Panel cross-sectional hot-deck
imputation is based on a cell structure with 288 cells that are based on cross-classifications of sex
(two categories), race (two categories), age (six categories), marital status (three categories),
disability status (two categories), and presence of own children (two categories). On the basis of
his or her current wave values for those categories, each new sample member in any later wave is
assigned to a cell; then the donor’s value in that cell is used to impute a value to the new sample
member.
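The 288-cell structure follows directly from the category counts given above (2 x 2 x 6 x 3 x 2 x 2). A minimal Python sketch of assigning a sample member to a cell is shown here; the field names and the 0-based integer codings are hypothetical, and only the category counts come from the text.

```python
from math import prod

# Category counts are from the text; field names and codings are assumed.
CROSS_SECTIONAL_LEVELS = {
    "sex": 2,
    "race": 2,
    "age_group": 6,
    "marital_status": 3,
    "disability": 2,
    "own_children": 2,
}


def cell_index(person, levels=CROSS_SECTIONAL_LEVELS):
    """Map a person's category codes to one cell number (mixed radix)."""
    index = 0
    for name, n_levels in levels.items():
        code = person[name]
        if not 0 <= code < n_levels:
            raise ValueError(f"{name}={code} is outside 0..{n_levels - 1}")
        index = index * n_levels + code
    return index


# Number of cells is the product of the category counts.
n_cells = prod(CROSS_SECTIONAL_LEVELS.values())
```

Every combination of category codes maps to a distinct index between 0 and `n_cells - 1`, so two sample members share a cell exactly when they agree on all six classification variables.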


⁴ The hot-deck procedure used in SIPP for the core questions and topical module items is sequential because the
selection of replacement values is implemented one record at a time from an ordered file.



The longitudinal hot-deck imputation for the part-time work item for old sample members in
Waves 2 and later is based on a cell structure with 576 cells, defined by the same classification
variables described above plus one additional two-category variable: whether or not the person
worked part-time in the previous wave. A donor is selected from the sample member’s cell, and the
donor’s value is imputed. The actual item is
imputed from a donor’s value of the item in the current wave; the previous wave value is used
only in the assignment of the cell. That procedure guarantees that the sample member is matched
to the donor who had the same value for the item in the previous wave. Therefore, sample
members who worked part-time in the previous wave will be matched only to donors who also
worked part-time in the previous wave. However, the actual hot-deck imputation comes from the
donor’s value in the current wave, which may or may not include part-time work.
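The distinction between the value used to assign the cell (previous wave) and the value that is actually imputed (current wave) can be made concrete with a short Python sketch. The record layout and function names are hypothetical; donor selection here simply takes the first donor in the cell, whereas the production procedure uses the sequential hot deck described later in this chapter.

```python
CLASSIFIERS = ("sex", "race", "age_group", "marital_status",
               "disability", "own_children")


def longitudinal_cell(person: dict, prev_part_time: bool) -> tuple:
    """Cell key: the cross-sectional classifiers plus the previous wave
    value of the item, which doubles 288 cells to 576."""
    return tuple(person[name] for name in CLASSIFIERS) + (prev_part_time,)


def impute_part_time(recipient: dict, recipient_prev: bool, donors: list):
    """`donors` holds (person, prev_part_time, current_part_time) tuples.

    The previous wave value is used only to pick the cell; the value
    returned is the donor's CURRENT wave value.
    """
    target = longitudinal_cell(recipient, recipient_prev)
    for person, prev_value, current_value in donors:
        if longitudinal_cell(person, prev_value) == target:
            return current_value
    return None  # no donor in this cell (not expected in production)
```

In this sketch a recipient who worked part-time in the previous wave can still receive "did not work part-time" for the current wave, provided the matched donor made that transition.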

Imputed values for the sample member are allowed in assigning the cell for some items. If a
sample member had an imputation for part-time work in the previous wave, that imputation is
used to define the cell for the longitudinal hot-deck imputation, even though it is an imputation
itself. That is not done for other items, such as asset items. Only a nonimputed or logically
imputed value “counts” toward the longitudinal hot deck for those items.

The part-time item is dichotomous, so the previous wave imputation matrix is essentially the
current wave imputation matrix with the previous wave’s value of the item added to the matrix.
In many cases, the differences between the two imputation matrices will be more pronounced,
especially for items with several categories of answers. An example of this is the item “reasons
why person worked less than 35 hours in the reference period.” There are 12 categories for that
item. The previous wave hot-deck imputation matrix uses the following characteristics to define
cells:

•   Previous wave value for item (12 categories);
•   Sex (two categories);
•   Race (two categories);
•   Age (six categories).

The current wave imputation matrix uses the following characteristics to define cells:

•   Sex (two categories);
•   Race (two categories);
•   Age (six categories);
•   Marital status (three categories);
•   Disability status (two categories);
•   Presence of own children (two categories).

A different type of example is the item gross pay in the first month of the reference period. For
new SIPP sample members, a cross-sectional hot-deck imputation is carried out by using the
following characteristics to generate cells:


•   Industry and occupation category (16 categories);
•   Sex (two categories);
•   Hours worked (three categories);
•   Education level (three categories).

For old sample members, a longitudinal hot-deck imputation is carried out by using the previous
wave value for the item gross pay in the fourth month of the preceding wave’s reference period.⁵
This continuous value is divided into 138 categories, ranging from $1 to $100 at the low end to
over $50,000 at the high end. Sample members are matched to donors by using the previous wave
values of those categories.

For labor force items, the Census Bureau uses the following special imputation procedures when
a person has no current wave information indicating whether or not he or she worked during the
reference period. If the Census Bureau can infer from what it knows about the previous reference
period whether the person had a job or business at the start of the current period, the Census
Bureau carries out the following procedure:

1. If the person was working at the end of the prior wave, then labor force participation is
   imputed from a single donor for the complete current wave.
2. The Census Bureau then projects job characteristics for the person from the person’s prior
   wave through the current wave.
3. Finally, the Census Bureau edits the job characteristics for consistency with the imputed
   labor force participation variables.

This procedure is known as an EPPFLAG imputation, after the name of the variable that
indicates its use.

If a person was a nonworker in the prior wave or the Census Bureau cannot infer work status on
the basis of prior wave data, then the person’s work status is imputed. If the person is imputed as
a worker in the reference period, the Census Bureau imputes the complete set of job/business
characteristics variables and labor force participation variables to the person from one donor, in
order to maintain consistency among the fields. That procedure is called a “little Type Z”
imputation.
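The choice between the EPPFLAG and little Type Z procedures, as described in the preceding paragraphs, can be summarized as a small decision function. This Python sketch is illustrative only: the function name, argument names, and the label for the nonworker outcome are hypothetical.

```python
from typing import Optional


def labor_force_route(prior_wave_worker: Optional[bool],
                      imputed_as_worker: bool) -> str:
    """Pick the imputation route for a person with no current wave
    information on whether he or she worked.

    prior_wave_worker: True or False when work status at the end of the
        prior wave can be inferred; None when it cannot.
    imputed_as_worker: result of imputing work status, consulted only
        when the person is not known to have been working.
    """
    if prior_wave_worker is True:
        # Participation from a single donor; job characteristics
        # projected forward from the prior wave, then edited.
        return "EPPFLAG"
    if imputed_as_worker:
        # Job/business characteristics and participation variables are
        # all taken from one donor to keep the fields consistent.
        return "little Type Z"
    return "work status imputed as nonworker"  # hypothetical label
```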

For some items in some cases, a direct logical or carryover imputation is made. The carryover
imputation takes the previous wave’s value for the item for the sample member and imputes it to
the current wave. That imputation is done particularly for items that rarely (or never) change for
a sample member across waves (such as sex and race) or for items that change in predictable
ways (such as age).


⁵ The second month of the reference period actually uses as the “previous wave value” the first month value, with
the third month using the second month, and so forth, so that these imputations are really previous month rather than
previous wave.



SIPP hot-deck procedures are designed to preserve the univariate distribution of each variable
subjected to imputation. These procedures do not, in general, preserve the covariances among
variables. Although some of those interrelationships might be preserved to a certain extent, that
is not the primary intent of the hot-deck imputation procedures used by the Census Bureau. One
consequence is that imputation can introduce inconsistencies into the data. For example, if a
respondent has reported program participation, but his or her income is too high for that
program, it is possible that the income data have been imputed. Whenever users detect
inconsistencies, it is wise to check the allocation (imputation) flag to see if the inconsistent data
might have been imputed. The discussion of allocation (imputation) flags later in this chapter
provides more information.


Starting or Cold-Deck Values

In other surveys, cold-deck values in a sequential hot-deck procedure historically served as the
initial set of replacement values for missing items in the first record processed; missing items in
subsequent records typically received replacement (hot-deck) values from the current data set. In
SIPP, however, cold-deck values are seldom used as replacement values for either the first or
subsequent records processed. During later stages of processing, as the cold-deck values are
replaced with information from the current wave, the array of cells is referred to as the hot-deck
matrix. The cells in the matrix are defined by the cross-classification of auxiliary variables
(Pennell, 1993, Figure 3.3). Each cell in the matrix corresponds to respondent cases with the
same set of values on the classification variables. Many different matrices are defined in SIPP,
and each matrix corresponds to one or more variables subject to imputation.


Sorting the Sample Cases

The records in the sample file are sorted by three geographic variables prior to imputing item-
missing data. The three geographic sort variables are primary sampling unit, segment number,
and serial number. The cases are sorted prior to processing and are not re-sorted at any other time
during the imputation process. The sorting operation creates a file in which neighboring records
represent geographically proximate households.


Preprocessing the Sample File: Initial Updating of Cold-Deck Values

Once the cases have been sorted, they are processed through a series of programs. During the
first pass through the programs, the cold-deck values are updated with information from the
current wave; missing data are not imputed. The initial processing is done separately for each of
the five groups of related core variables listed above. During the first pass, the first record in the
sorted file with consistent and nonmissing data for a particular group of variables is identified
and the values from that case replace the cold-deck values for that section in the matrix. The
values for each subsequent record with consistent and nonmissing information update the
previous set of consistent and nonmissing values written to the matrix. The checking and
updating operation continues until all records in the data file have been processed. The last
values written to the matrix serve as the starting values in the subsequent sequential hot-deck
procedure. In this way, cold-deck values are rarely used as replacement values in SIPP because
the initial processing usually replaces all starting values with values from the current wave of
data.
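The sorting step and the first pass can be sketched together in Python. The field names (`psu`, `segment`, `serial`) and the convention that `None` marks a missing item are assumptions for illustration; the sketch handles a single item with one value per cell, whereas the production system processes groups of related variables.

```python
def sort_key(record):
    """Geographic sort: primary sampling unit, segment, serial number."""
    return (record["psu"], record["segment"], record["serial"])


def first_pass(sorted_records, cold_deck, cell_of, item):
    """Return starting hot-deck values for `item`.

    Each cell starts at its cold-deck value and is overwritten by every
    record in the cell that reports the item, so the LAST reporting
    record in sort order supplies the starting value. Nothing is
    imputed during this pass.
    """
    hot_deck = dict(cold_deck)
    for record in sorted_records:
        value = record.get(item)
        if value is not None:
            hot_deck[cell_of(record)] = value
    return hot_deck
```

A cell keeps its cold-deck value only when no record in that cell reports the item, which is why cold-deck values rarely survive into the imputation pass.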


Allocating Cases into Imputation Classes

In the next step of the imputation procedure, each respondent record or noninterview record in
the sorted file is allocated to one of the imputation classes or adjustment cells according to its
values on the set of classification, or auxiliary, variables.⁶

1. The auxiliary variables are chosen for each item or set of related items on the basis of their
   level of correlation with the item receiving the imputation (i.e., classification variables are
   chosen on the basis of their ability to explain the variability of the item or set of related
   items); Census Bureau researchers assign different sets of classification variables to different
   sets of items.
2. The auxiliary variables are either dichotomous or polychotomous categorical variables (e.g.,
   sex, race); if they are continuous, they are categorized into a parsimonious number of levels
   (e.g., income, asset levels).
3. The levels of the auxiliary variables then define a matrix, with the number of cells in this
   matrix being the product of the numbers of levels of the auxiliary variables. For example, an
   imputation matrix defined by five variables, each with three levels, has a total of 243 cells.
   Any given item or set of related items may have imputation matrices with the numbers of cells
   ranging from under 100 to well over 1,000, depending on the matrix.

Auxiliary variables such as sex, race, and categorizations of age (with different categorizations
for different items) are used frequently in the matrices, as are more specialized auxiliary
variables that are relevant for particular items (such as industry and occupation category for the
monthly gross pay item). Pennell (1993) gives examples of the different sets of classification
variables for previous panel years.
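The cell-count arithmetic in item 3 above can be checked by enumerating the cells directly. In this Python sketch the function name is hypothetical and each cell is represented as a tuple of 0-based level codes.

```python
from itertools import product
from math import prod


def build_matrix(levels_per_variable):
    """Return a dict with one empty slot per cell of the matrix.

    Each key is a tuple of level codes, one code per auxiliary variable.
    """
    cells = product(*(range(n) for n in levels_per_variable))
    return {cell: None for cell in cells}


# Five auxiliary variables with three levels each: 3**5 cells.
matrix = build_matrix([3, 3, 3, 3, 3])
assert len(matrix) == prod([3] * 5)
```

The same function applied to the category counts given earlier for the "reasons worked less than 35 hours" item (12, 2, 2, and 6) yields 288 cells for the previous wave matrix.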

The allocation of sample cases into imputation classes (also known as subclasses or strata)
according to a set of classification variables serves several purposes. Ideally, the set of
classification variables should account for a large proportion of the variance in the variable being
imputed and should be associated with variations in response rates. To the extent that this is
accomplished, the classification procedure creates homogeneous adjustment cells containing
similar cases. In this way, donors and recipients are similar under the assumption that the
nonresponse mechanism within the imputation class is not related to the item being imputed; that
is, an underlying assumption is made that item nonresponse data are distributed randomly within
the subclass defined by the cross-classification of the auxiliary variables. The selection of
classification variables may also place bounds on the range of values that can be imputed and
implicitly satisfy edit constraints. The implicit stratification created by the sort order of the file
further improves the opportunity for better imputation to the extent that nearby cases are more
similar to each other than cases that are farther apart in the file.

⁶ This step is omitted for the imputation of the primary demographic values that are imputed before the person-level
noninterviews.


Imputing for Missing Data and Updating of Hot-Deck Values

The selection of replacement values for missing items is restricted to donor and recipient records
within each particular cell; that is, records allocated to one cell never donate information to
records in another cell with missing items. As the file is processed through the set of programs
the second time, the imputations are performed and the set of hot-deck values is updated once
again.

The records are processed sequentially, according to the sort order of the file. A missing item is
given the value of the last corresponding item that is nonmissing from a record in that imputation
class. If the value of an item in the current record is nonmissing, it replaces the previous hot-deck
value for that imputation class. In this way, the hot-deck value for each imputation class is
constantly being updated with the value of the last nonmissing case.

The updating is done item by item. Missing items in one record receive the current set of
replacement values. Then the nonmissing values in that record are used to update the hot deck in
preparation for the next record. At any point during the process, the donated values in the hot
deck likely come from many different respondents, even within imputation classes. That is why
this imputation procedure does not preserve covariances among the variables being imputed.
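The second pass described in this section can be sketched in a few lines of Python. This is a simplified illustration, not the Census Bureau's production code: it handles a single item, treats `None` as missing, and writes a hypothetical `<item>_imputed` indicator in place of the actual allocation flags.

```python
def sequential_hot_deck(sorted_records, starting_values, cell_of, item):
    """Impute `item` one record at a time, in file order.

    starting_values: hot-deck value per cell from the first pass.
    cell_of: function mapping a record to its imputation class.
    Returns new records; the input records are not modified.
    """
    hot_deck = dict(starting_values)
    result = []
    for record in sorted_records:
        cell = cell_of(record)
        out = dict(record)
        if out.get(item) is None:
            # Recipient: take the last nonmissing value seen in this cell.
            out[item] = hot_deck[cell]
            out[item + "_imputed"] = True
        else:
            # Donor: the reported value becomes the cell's hot-deck value.
            hot_deck[cell] = out[item]
            out[item + "_imputed"] = False
        result.append(out)
    return result
```

Because each item keeps its own hot-deck value per cell, two imputed items on the same record can come from different donors, which is the reason given above for the loss of covariances. The sketch raises `KeyError` if a cell has no starting value, so `starting_values` must cover every cell that occurs in the file.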


Allocation (Imputation) Flags

An allocation (imputation) flag is associated with each core item subject to imputation. When an
item has been imputed, an allocation (imputation) flag for that item is set. Beginning with the
1996 Panel, allocation flags denoting either data edits or statistical imputations for all variables
are included on the core wave files. For core wave files from earlier panels, imputation flags are
included for most items subject to imputation.

An allocation (imputation) flag with the value 0 indicates no imputation, a value of 1 or 2
indicates a hot-deck imputation that uses only current quarter values, a value of 3 indicates a
logical imputation, and a value of 4 indicates a dependent imputation. This last category includes
imputations in which data have been carried over from the sample unit’s previous wave data and
imputations in which previous wave data are used as control variables. For detailed
documentation about the coding of allocation (imputation) flags for specific variables, analysts
can refer to the data dictionary for the data file with which they are working.
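For analysts who want to tabulate imputation rates, the general flag coding described above can be captured in a small lookup. This Python sketch uses hypothetical names, and the coding for any specific variable should be confirmed against the data dictionary.

```python
# General coding as described in the text; individual variables may differ.
ALLOCATION_FLAG_MEANING = {
    0: "not imputed",
    1: "hot-deck imputation",
    2: "hot-deck imputation",
    3: "logical imputation",
    4: "dependent imputation",
}


def describe_flag(value):
    """Return the general meaning of an allocation (imputation) flag."""
    return ALLOCATION_FLAG_MEANING.get(
        value, "unknown code; check the data dictionary")


def share_imputed(flags):
    """Fraction of records whose flag is nonzero (0.0 for no records)."""
    flags = list(flags)
    if not flags:
        return 0.0
    return sum(1 for f in flags if f != 0) / len(flags)
```

Computing `share_imputed` separately for important subclasses is one simple way to carry out the assessment of item nonresponse recommended earlier in this chapter.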

For items that receive Type Z imputations (in both the pre-1996 panels and the 1996 Panel) and
items receiving EPPFLAG and little Type Z imputations in the 1996 Panel, the allocation
(imputation) flag for a particular imputed item will not indicate by itself the imputation status of
the item. For Type Z imputations, the EPPINTVW field in the 1996 Panel and the person-level
INTVW field in the pre-1996 panels will indicate whether the Type Z procedure was used to
impute all items for the sample person (in these cases, EPPINTVW = 3 or 4 or INTVW = 3 or
4).7,8 The individual imputation flag for each item indicates whether or not that item was imputed
during the processing of the donor’s fields.

For EPPFLAG imputations, the EPPFLAG field will equal 1. When this is true, all labor force
participation and job/business characteristics fields are imputed via the EPPFLAG procedure,
whether or not the individual items indicate an imputation. As with the Type Z procedure, an
allocation (imputation) flag with a value greater than zero for any of the labor force participation
items means that the values of these items are not the original values from the donor but are
processed values that are consistent with the sample person’s demographics and household
composition; for the job/business characteristics fields, an allocation flag with a value of “4”
indicates that the sample person’s values in these fields have been projected forward from the
person’s values for these fields in the previous wave.

To find little Type Z imputations, check the allocation (imputation) flag of the variable
EPDJBTHN. If (a) EPDJBTHN = 1 (indicating that the person was a worker), (b) this item’s
allocation (imputation) flag is 1 or 4, and (c) EPPFLAG is not 1, then a little Type Z imputation
has taken place for all of the labor force participation and job/business characteristics fields. As
with the Type Z procedures, the allocation (imputation) flag for an individual item only indicates
whether the item was imputed when the donor’s fields were processed.
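
The rules above for the 1996 Panel can be collected into one small function. The sketch below is illustrative only. EPPINTVW, EPPFLAG, and EPDJBTHN are the fields named in the text; the argument name epdjbthn_flag (standing for the allocation flag of EPDJBTHN), the returned labels, and the order in which the checks are applied are assumptions made for the example.

```python
# Meaning of an individual allocation (imputation) flag, as described in the text.
FLAG_MEANING = {
    0: "not imputed",
    1: "hot-deck imputation",
    2: "hot-deck imputation",
    3: "logical imputation",
    4: "dependent imputation (uses previous wave data)",
}


def imputation_procedure_1996(eppintvw, eppflag, epdjbthn, epdjbthn_flag):
    """Return which record-level procedure, if any, applies to a person's
    labor force and job/business fields in a 1996 Panel core wave record.

    Checking Type Z first is an assumption of this sketch.
    """
    if eppintvw in (3, 4):
        return "Type Z"            # all items for the person were imputed
    if eppflag == 1:
        return "EPPFLAG"
    if epdjbthn == 1 and epdjbthn_flag in (1, 4):
        return "little Type Z"     # reached only when EPPFLAG is not 1
    return "item-level flags only"


# A self-interviewed worker whose EPDJBTHN flag is 4 and whose EPPFLAG is not 1.
example = imputation_procedure_1996(
    eppintvw=1, eppflag=0, epdjbthn=1, epdjbthn_flag=4
)
```

When the function returns anything other than "item-level flags only", the individual flags should not be read on their own as the imputation status of those fields.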

The full panel files carry only a subset of the allocation (imputation) flags carried on the core
wave files. The value of an allocation (imputation) flag is set during wave processing, and,
usually, it is not modified to reflect any changes in value resulting from the longitudinal editing
discussed below. The Census Bureau does reset the values of some allocation flags to indicate
that a longitudinal imputation has occurred.


Topical Module Imputation Procedures

When item-missing data in topical modules are imputed, the same sequential hot-deck procedure
used to impute item-missing data in the SIPP core is used. Topical module data for Type Z
noninterviews are also imputed item by item with the sequential hot deck. Those cases are not
subjected to the Type Z imputation procedure that was used for core items in the pre-1996
panels.


7
  The codes for EPPINTVW and INTVW differ. In the 1996 Panel, EPPINTVW is coded as follows: 1 = Interview
(self), 2 = Interview (proxy), 3 = Noninterview—Type Z, 4 = Noninterview—pseudo Type Z (left sample during the
reference period), and 5 = Children under 15 during the reference period. In the pre-1996 panels, INTVW for person
is coded as follows: 0 = Not applicable (children under 15), 1 = Interview (self), 2 = Interview (proxy), 3 =
Noninterview—Type Z refusal, and 4 = Noninterview—Type Z other.
8
  Note that for the 1990–1993 Panels, INTVW can equal 5 on the core wave files (this value is not documented in
the codebook). A value of 5 denotes persons in the sample early in the wave who were not in the sample at the time
of interview. Such persons are processed as if they are a Type Z nonrespondent. Prior to the 1990 Panel, such
persons are identified as those with PP-MIS5 ( 1 but PP-MISj ≠ 1 for j = 1, 2, 3, or 4.




Phase 2: Data Editing Procedures for the Full Panel Files
At the conclusion of each SIPP panel, core data from all waves are assembled into the full panel
file. That assembly is done after all waves have been processed separately, producing the core
wave files. Once all waves are linked, longitudinal edits are applied to the SIPP full panel files to
ensure that the data for each respondent are consistent over time. Although the core wave files
are edited for consistency, some types of inconsistencies become apparent only when looking at
the data over multiple waves. Starting with the 1996 Panel, some longitudinal editing has been
built into the CAI instrument. The ability to carry data across waves in the CAI environment is
expected to result in better cross-wave consistency in the core wave files and in less need for
subsequent longitudinal editing.9


Pre-1996 Full Panel Files

Because the specifications for editing the 1996 full panel files differ from those for the pre-1996
files, the following discussion refers only to pre-1996 procedures. Longitudinal edits in the pre-
1996 panels were applied for selected variables. The edits were designed (1) to correct cross-
wave inconsistencies, which become apparent only when multiple waves are examined together,
and (2) to honor the preference to replace imputed values from one wave with reported values
from another wave.

Unlike the hot-deck imputation procedures used with the core wave files, the longitudinal edits
in the pre-1996 files did not replace missing data for one person with reported data from another
person. When a data value was modified during longitudinal editing, the replacement value was
obtained from the same record either directly (by copying a reported value from a different
month) or indirectly (using some form of interpolation or extrapolation from reported values in
other months). Those procedures could cause modifications both in reported and imputed values.
When a data value was modified during longitudinal editing, the associated imputation flag was
not changed. In addition, the core wave files were not revised to reflect changes made during
longitudinal editing. Thus, the data for any given respondent may differ between the core wave
files and the full panel file, and estimates based on the full panel file may differ from those based
on the core wave files.


9
  Prior to CAI, a control file was developed at Wave 1 that contained a unique identifier for each sample person, as
well as that person's age, sex, and race. In subsequent waves, the control file provided a means of detecting
inconsistencies in age, sex, and race across waves. As each wave of data was received, the reported age, sex, and
race of the sample person were checked against the control file and corrections were made. Also prior to CAI,
income recipiency was brought forward to the subsequent wave.



The longitudinal edits in the pre-1996 files were performed independently on four groups of
variables:

1. Demographic and household composition variables;
2. Earned income variables;
3. Other income variables, Food Stamp variables, WIC variables, and program coverage
   variables; and
4. Medical insurance variables.

In most cases, the values reported during Wave 1 were used as the standard against which
inconsistencies were judged. Pennell (1993) provides detailed information about longitudinal
consistency edits for specific variables.


1996 Full Panel File

The specifications for editing the 1996 full panel file are not yet complete. The basic difference
between the pre-1996 and the 1996 full panel files is that the editing procedures for the 1996
panel incorporate longitudinal imputation based on prior wave information.


Missing Wave Imputation

There are many instances in which data are missing for a person in one or two consecutive waves
but are present for that same person in the two adjacent waves. For example, a person may be
missing in Wave 5 but have complete data for Waves 4 and 6. Beginning with the 1991 Panel,
the Census Bureau has imputed those missing waves in the full panel files. Missing wave
imputation is performed only when one or two consecutive missing waves are bounded on both
sides by waves in which the sample member was present. If a respondent has missing data for
more than two consecutive waves, the imputation is not performed.

For missing waves that are bounded on each side by interviewed waves, data are interpolated
using a random carryover procedure. A value r is randomly assigned to each nonrespondent’s
household for each missing wave, where r = 0, 1, 2, 3, or 4. The first r reference months within
the missing wave receive their imputed values from the fourth month of the preceding wave, and
the remaining 4 – r reference months receive their imputed amounts from the first month of the
subsequent wave.
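
The random carryover rule can be illustrated with a short Python sketch. It is a simplified illustration rather than the Census Bureau's implementation: it handles a single item for a single missing wave, and drawing r with equal probability from 0 through 4 is an assumption, since the text does not state how r is distributed.

```python
import random


def random_carryover(prev_month4, next_month1, r):
    """Return the four monthly values imputed for one missing wave.

    The first r reference months take the value from month 4 of the
    preceding wave; the remaining 4 - r months take the value from
    month 1 of the subsequent wave.
    """
    if r not in (0, 1, 2, 3, 4):
        raise ValueError("r must be 0, 1, 2, 3, or 4")
    return [prev_month4] * r + [next_month1] * (4 - r)


def impute_missing_wave(prev_month4, next_month1, rng):
    """Draw r at random (uniformly, by assumption) and apply the carryover."""
    r = rng.randint(0, 4)  # randint includes both endpoints
    return r, random_carryover(prev_month4, next_month1, r)


# Hypothetical monthly amounts: 500 in the last month before the gap,
# 650 in the first month after it, and r = 3.
imputed = random_carryover(500, 650, 3)   # [500, 500, 500, 650]

rng = random.Random(0)
r, values = impute_missing_wave(500, 650, rng)
```

Every imputed month repeats one of the two bounding values, and the only change within the imputed wave occurs between months r and r + 1.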

Although this procedure yields data suitable for many analytic purposes, the random
carryover forces stability in responses for wave nonrespondents. That stability could result in
underestimation of between-wave changes. The procedure also results in imputed waves that do
not exhibit the seam effect common to waves of reported data (Chapter 6). Williams and Bailey
(1996) provide a complete account of the handling of missing wave data in SIPP.




Confidentiality Procedures for the Public Use Files
All of the editing and imputation procedures described in the preceding sections are part of the
process of preparing the data for internal Census Bureau use. Before the files are released for
public use, they undergo additional editing to protect the confidentiality of respondents. Two
procedures are used: topcoding of selected variables (income, assets, and age) and suppression of
geographic information. As a result of these procedures, estimates based on data from the public
use files will differ slightly from the Census Bureau’s published estimates.


Topcoding

One piece of information that might reveal a respondent’s identity is a very high income. For that
reason, the Census Bureau topcodes income before making that information publicly available,
recoding any income amounts over a certain maximum value to that maximum. In other words,
income on the public use data files has a ceiling value. Although income is the primary variable
that is topcoded, other variables that may disclose a respondent’s identity, such as age, are also
topcoded. A few variables, such as starting dates for employment, may be bottomcoded if they
pose a disclosure risk. Chapter 10 and Appendix B provide a thorough discussion of topcoding
methods and procedures in SIPP.
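
As a simple illustration of the recoding, the Python sketch below applies a ceiling to a list of income amounts. The ceiling of 100,000, the income amounts, and the year floor are hypothetical values chosen for the example; the actual limits are documented in Chapter 10 and Appendix B.

```python
def topcode(value, ceiling):
    """Recode any amount above the ceiling to the ceiling itself."""
    return min(value, ceiling)


def bottomcode(value, floor):
    """Recode any value below the floor to the floor itself."""
    return max(value, floor)


CEILING = 100_000  # hypothetical ceiling, not an actual SIPP topcode

internal_incomes = [20_000, 45_000, 250_000]   # hypothetical internal-file amounts
public_incomes = [topcode(x, CEILING) for x in internal_incomes]

internal_mean = sum(internal_incomes) / len(internal_incomes)
public_mean = sum(public_incomes) / len(public_incomes)
```

Because amounts above the ceiling are pulled down to it, a mean computed from the public use values cannot exceed the mean computed from the internal values. That is one reason public use estimates can differ from published ones.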


Suppression of Geographic Information

Geographic information that can be used to directly identify survey respondents, such as an
address, is removed from the public use files. In addition, states and metropolitan areas with
populations less than 250,000 are not identified. Specific nonmetropolitan areas (such as counties
outside of metropolitan areas) are never identified. In certain states, when the nonmetropolitan
population is small enough to present a disclosure risk, a fraction of that state’s metropolitan
sample is recoded to nonmetropolitan status. For that reason, the SIPP data cannot be used to
estimate characteristics of the population residing outside metropolitan areas. Chapter 10
provides details.

For the 1996 Panel, state-level geography is shown for 45 states and the District of Columbia.
The remaining five states are combined as follows:

1. Maine, Vermont; and
2. North Dakota, South Dakota, Wyoming.



For the 1984 through 1993 Panels, state-level geography is shown for 41 individual states and
the District of Columbia; the nine other states are combined into three groups:

1. Maine, Vermont;
2. Iowa, North Dakota, South Dakota; and
3. Alaska, Idaho, Montana, Wyoming.


5. Finding SIPP Information
Both the data collected in SIPP and supporting documentation are available in various forms.
They include published estimates based on those data, microdata in several formats,
documentation for each of the microdata files, and more general documentation about
methodological issues in SIPP. The latter includes the SIPP Quality Profile, a series of working
papers distributed by the Census Bureau, articles published in academic journals, and conference
proceedings. This chapter discusses SIPP published estimates, briefly describes the data files and
supporting documentation, and provides information on how to obtain them.


Published Estimates from SIPP
Published estimates from SIPP data are useful to data analysts in a number of ways. First, Census
Bureau publications may already contain the estimates needed for the research project at hand,
thus saving users the need to generate those estimates themselves. Second, published estimates
can often provide a useful cross-check for closely related estimates prepared by analysts.

Published estimates are based on the Census Bureau’s internal data files, and it is often
impossible to replicate published estimates exactly. That is because the internal files have not
been subjected to topcoding and other data-suppression techniques that are necessary to protect
confidentiality on the public use microdata files. Chapter 4 provides information on data editing
and imputation.

The Census Bureau’s P-70 series of publications is the primary source for published estimates
from SIPP. Table 5-1 displays the titles and publication numbers of reports in the series that are
currently available from the Census Bureau. Copies of those reports can be obtained from the
U.S. Government Printing Office, Washington, DC 20402. For telephone orders, users can call
(202) 783-3238, or they can fax orders to (202) 783-3236. An updated list of P-70 series reports
can be obtained from the SIPP Web site (http://www.bls.census.gov/sipp/); each of the reports
contains a phone number the reader can call for further information or clarification. Users can
reach the population division staff for demographics questions at (301) 457-2422, or they can
call the SIPP information phone number: (301) 457-3242.


                             Table 5-1. Publications in the P-70 Series

Publication
Number         Title
P-70-1         Economic Characteristics of Households in the U.S. Third Quarter, 1983
P-70-2         Economic Characteristics of Households in the U.S. Fourth Quarter, 1983
P-70-3         Economic Characteristics of Households in the U.S. First Quarter, 1984
P-70-4         Economic Characteristics of Households in the U.S. Second Quarter, 1984
P-70-5         Economic Characteristics of Households in the U.S. Third Quarter, 1984
P-70-6         Economic Characteristics of Households in the U.S. Fourth Quarter, 1984
P-70-7         Household Wealth and Asset Ownership, 1984
P-70-8         Disability, Functional Limitations, and Health Insurance Coverage: 1984-1985
P-70-9         Who’s Minding the Kids? Child Care Arrangements: Winter 1984-1985
P-70-10        Male-Female Differences in Work Experience, Occupation, and Earnings: 1984
P-70-11        What’s It Worth? Educational Background and Economic Status: Spring 1984
P-70-12        Pensions: Workers Coverage and Retirement Income, 1984
P-70-13        Who’s Helping Out? Support Network Among American Families
P-70-14        Characteristics of Persons Receiving Benefits from Major Assistance Programs
P-70-15-RD-1   Transitions in Income and Poverty Status: 1984-1985
P-70-16-RD-2   Spells of Job Search and Layoff...and Their Outcomes
P-70-17        Health Insurance Coverage, 1986-1988
P-70-18        Transitions in Income and Poverty Status: 1985-1986
P-70-19        The Need for Personal Assistance with Everyday Activities: Recipients and Caregivers
P-70-20        Who’s Minding the Kids? Child Care Arrangements: Winter 1986-1987
P-70-21        What’s It Worth? Educational Background and Economic Status: Spring 1987
P-70-22        Household Wealth and Asset Ownership: 1988
P-70-23        Family Disruption and Economic Hardship: The Short-Run Picture for Children
P-70-24        Transitions in Income and Poverty Status: 1987-1988
P-70-25        Pensions: Worker Coverage and Retirement Benefits, 1987
P-70-26        Extended Measures of Well-Being: 1984
P-70-27        Job Creation During Late 1980’s: Dynamic Aspects of Employment Growth
P-70-28        Who’s Helping Out? Support Network Among American Families
P-70-29        Health Insurance Coverage: 1987 to 1990
P-70-30        Who’s Minding the Kids? Child Care Arrangements: Fall 1988
P-70-31        Characteristics of Recipients and the Dynamics of Program Participation: 1987-1988
P-70-32        What’s It Worth? Educational Background and Economic Status: Spring 1990
P-70-33        Americans with Disabilities: 1991-1992
P-70-34        Household Wealth and Asset Ownership: 1991
P-70-35        Monitoring the Economic Health of American Households: Average Monthly Estimates of
                   Income, Labor Force Activity, Program Participation and Health Insurance, First Quarter 1984
                   to Third Quarter 1991
P-70-36        Who’s Minding the Kids? Child Care Arrangements: Fall 1991
P-70-37        Dynamics of Economic Well-Being: Health Insurance, 1990-1992
P-70-38        The Diverse Living Arrangements of Children: Summer 1991
P-70-39        Dollars for Scholars: Postsecondary Costs and Financing, 1990-1991
P-70-40        Dynamics of Economic Well-Being: Labor Force and Income: 1990-1992
P-70-41        Dynamics of Economic Well-Being: Program Participation: 1990-1992
P-70-42        Dynamics of Economic Well-Being: Poverty: 1990
P-70-43        Dynamics of Economic Well-Being: Health Insurance: 1991-1993
P-70-44        The Effect of Health Insurance Coverage on Doctor and Hospital Visits: 1990-1992
P-70-45        Dynamics of Economic Well-Being: Poverty: 1991-1993
P-70-46        Dynamics of Economic Well-Being: Program Participation: 1991-1993
P-70-47        Asset Ownership of Households: 1993
P-70-48        Dynamics of Economic Well-Being: Labor Force: 1991-1993
P-70-49        Dynamics of Economic Well-Being: Income: 1991-1992
P-70-50        Beyond Poverty, Extended Measures of Well-Being: 1992
P-70-51        What’s It Worth? Field of Training and Economic Status: 1993
P-70-52        What Does it Cost to Mind Our Preschoolers?
P-70-53        Who’s Minding Our Preschoolers?
P-70-54        Who Loses Coverage and for How Long?
P-70-55        Dynamics of Economic Well-Being: Poverty: 1992-1993, Who Stays Poor? Who Doesn’t?
P-70-56        Dynamics of Economic Well-Being: Income, 1992-1993, Moving Up and Down the Income
                 Ladder
P-70-57        Dynamics of Economic Well-Being: Labor Force, 1992-1993—A Perspective on Low-Wage
                 Workers
P-70-58        Dynamics of Economic Well-Being: Program Participation, 1992-1993—Who Gets Assistance?
P-70-59        My Daddy Takes Care of Me! Fathers as Care Providers
P-70-60        Financing the Future: Postsecondary Students, Costs, and Financial Aid
P-70-61        Americans with Disabilities: 1994-95
P-70-62        Who’s Minding Our Preschoolers – Fall 1994 Update
P-70-63        Dynamics of Economic Well-Being: Poverty, 1993-94
P-70-64        Who Loses Coverage, and For How Long?
P-70-65        Moving Up and Down the Income Ladder
P-70-66        Seasonality of Moves and Duration of Residence
P-70-67        Extended Measures of Well-Being: Meeting Basic Needs
P-70-69        Dynamics of Economic Well-Being: Program Participation, Who Gets Assistance?
P-70-70        Who’s Minding the Kids? Child Care Arrangements
P-70-71        Household Net Worth and Asset Ownership, 1995
P-70-73        Americans With Disabilities: 1997


SIPP Public Use Microdata Files
Following data collection as described in Chapter 2 and postcollection processing as described in
Chapter 4, the Census Bureau prepares data files in formats compatible with the most common
methods of analysis. Those microdata are available in several file formats and can be obtained on
a variety of media. The following sections describe the file formats currently in use, each of
which is used for somewhat different SIPP data. Information is also provided about how to
obtain those data and supporting documentation.


Formats and Contents of SIPP Microdata Files

SIPP public use microdata are available in four types of files: core wave files, topical module
files, and full and partial panel files. The files vary in content and structure, so the files an
analyst needs depend on the particular application.



Data files are available through the Customer Services Branch, Administrative and Customer
Services Division, at (301) 457-4100. Users can also extract data files by using on-line data
access tools, as described later in this chapter in “Sources for Obtaining SIPP Microdata.”


Core Wave Files

Core wave files contain the core labor force, income, household and family composition, and
program participation data from one wave of interviews. The core wave files are currently
available in person-month format, containing, for every person who was a member of a SIPP
household for at least 1 month during the 4-month reference period for that wave, one record for
each month that person was in-sample.1 In other words, a person who was in-sample for all 4
reference months has four records—one for each reference month. A person who was in-sample
for only 1 month would have just one record. The core wave files were designed to be used for
cross-sectional analyses. Analysts who do not wish to wait for the release of certain files can link
one or more core wave files to make their own longitudinal files. Chapter 13 discusses linking
files. Table 5-2 illustrates the structure of the person-month format for core wave files.
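
The person-month layout can be sketched in Python as follows. The sketch uses the three persons of sample unit 1 in Table 5-2 and follows the description in the text, with one record for each month a person was in-sample. The dictionary keys suid, person, and month are hypothetical names chosen for the example, not the variable names on the files.

```python
from collections import defaultdict

# Months in sample for the three persons in sample unit 1 of Table 5-2.
in_sample = {
    (1, 1): [1, 2, 3, 4],   # in sample for all 4 reference months
    (1, 2): [1, 2],         # left the sample after month 2
    (1, 3): [1, 2, 4],      # out of sample in month 3 only
}


def to_person_months(in_sample):
    """Build one record per person per in-sample reference month."""
    records = []
    for (suid, person), months in sorted(in_sample.items()):
        for month in months:
            records.append({"suid": suid, "person": person, "month": month})
    return records


def months_by_person(records):
    """Collapse person-month records back to a list of months per person."""
    months = defaultdict(list)
    for rec in records:
        months[(rec["suid"], rec["person"])].append(rec["month"])
    return {key: sorted(vals) for key, vals in months.items()}


person_months = to_person_months(in_sample)
coverage = months_by_person(person_months)

# Persons present in all 4 reference months of the wave.
full_wave = [key for key, m in coverage.items() if m == [1, 2, 3, 4]]
```

Grouping the records by sample unit and person, as in months_by_person, is usually the first step before any person-level summary is computed from a core wave file.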

The core wave files are the only source of monthly cross-sectional weights. When using data
drawn from the full panel files for cross-sectional analyses, users must merge weights from the
core wave files. Chapter 8 explains how to select and merge weights.


Topical Module Files

Each topical module file contains selected core information along with the data from the topical
module administered in a given wave. As described in Chapter 2, different topical modules are
administered in each wave of a SIPP panel. Table 5-3 shows which topical modules were
administered for each wave of each SIPP panel. Table 5-4 lists topical areas along with the
panels and waves in which they were administered. Topical module files are issued in person-
record format; there is one record for each person who was a member of a SIPP household at the
time of the interview for that wave. Table 5-5 illustrates the structure of a topical module file.
Some people on the topical module files have no topical module information. Chapter 2
describes how the interviews are conducted and how topical module information is collected;
Chapter 4 explains how missing data are handled in the files. In the 1996 Panel, the month that
determines the universe for the topical module files changed to month 4.


1
 Prior to the 1990 Panel, the Census Bureau issued core wave files in a format with a single record for each person.
Those files are described in earlier editions of the SIPP Users' Guide.



              Table 5-2. Structure of the Person-Month Format Core Wave Files


                               Household        Family         Subfamily         Sample         Other Person
SUIDa    Person     Month      Vars             Vars           Vars              Status         Vars
1        1          1                                                            Yes
                    2                                                            Yes
                    3                                                            Yes
                    4                                                            Yes
          2         1                                                            Yes
                    2                                                            Yes
                    3          Missing          Missing        Missing           No             Missing
                    4          Missing          Missing        Missing           No             Missing
          3         1                                                            Yes
                    2                                                            Yes
                    3          Missing          Missing        Missing           No             Missing
                    4                                                            Yes
2         1         1                                                            Yes
                    2                                                            Yes
                    3                                                            Yes
                    4                                                            Yes
          2         1          Missing          Missing        Missing           No             Missing
                    2                                                            Yes
                    3                                                            Yes
                    4                                                            Yes
3         1         1                                                            Yes
                    2                                                            Yes
                    3                                                            Yes
                    4                                                            Yes
          2         1                                                            Yes
                    2                                                            Yes
                    3          Missing          Missing        Missing           No             Missing
                    4          Missing          Missing        Missing           No             Missing
4         1         1                                                            Yes
                    2                                                            Yes
                    3                                                            Yes
                    4                                                            Yes
a
  Sample unit ID number. Chapter 4 provides more information about identification numbers in SIPP.



                         Table 5-3. Topical Modules, by Panel and Wave

Wave   Subject Areas
                                                     1996 Panel
 1     Recipiency History, Employment History
 2     Work Disability History, Education and Training History, Marital History, Migration History, Fertility
       History, Household Relationships
 3     Assets, Liabilities, and Eligibility; Medical Expenses/Utilization of Health Care–Adults; Medical
       Expenses/Utilization of Health Care–Children; Work-Related Expenses; Child Support Paid
 4     Annual Income and Retirement Accounts, Taxes, Work Schedule, Child Care, Disability Questions
 5     School Enrollment and Financing, Child Support Agreements, Support for Nonhousehold Members,
       Functional Limitations and Disability–Adults, Functional Limitations and Disability–Children,
       Employer-Provided Health Benefits
 6     Children’s Well-Being; Assets, Liabilities, and Eligibility; Medical Expenses/Utilization of Health
       Care–Adults; Medical Expenses/Utilization of Health Care–Children; Work-Related Expenses; Child
       Support Paid
 7     Annual Income and Retirement Accounts, Taxes, Retirement and Pension Plan Coverage, Home Health
       Care
 8     Adult Well-Being, Welfare Reform
 9     Assets, Liabilities, and Eligibility; Medical Expenses/Utilization of Health Care–Adults; Medical
       Expenses/Utilization of Health Care–Children; Work-Related Expenses; Child Support Paid
10     Annual Income and Retirement Accounts, Taxes, Work Schedule, Child Care
11     Child Support Agreements, Support for Nonhousehold Members, Functional Limitations and
       Disability–Adults, Functional Limitations and Disability–Children
12     Assets, Liabilities, and Eligibility; Medical Expenses/Utilization of Health Care–Adults; Medical
       Expenses/Utilization of Health Care–Children; Work-Related Expenses; Child Support Paid;
       Children’s Well-Being
                                                     1993 Panel
 1     Recipiency History, Employment History
 2     Work Disability History, Education and Training History, Marital History, Migration History, Fertility
       History, Household Relationships
 3     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Functional
       Limitations and Disability, Utilization of Health Care Services
 4     Selected Financial Assets; Medical Expenses and Work Disability; Real Estate, Shelter Costs, Dependent
       Care, and Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Functional
       Limitations and Disability–Adults, Utilization of Health Care Services–Adults, Functional Limitations
       and Disability–Children, Utilization of Health Care Services–Children, Children’s Well-Being
 7     Assets and Liabilities; Real Estate, Shelter Costs, Dependent Care, and Vehicles; Medical Expenses and
       Work Disability
 8     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 9     Retirement Expectations and Pension Plan Coverage, Child Support Agreements, Child Care, Support for
       Nonhousehold Members, Work Schedule, Children’s Well-Being, Basic Needs
                                                   1992 Panel
 1     Recipiency History, Employment History
 2     Work Disability History, Education and Training History, Marital History, Migration History, Fertility
       History, Household Relationships
 3     Extended Measures of Well-Being (Consumer Durables, Living Conditions, Basic Needs)
 4     Assets and Liabilities, Retirement Expectations and Pension Plan Coverage, Real Estate Property and
       Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Functional
       Limitations and Disability, Utilization of Health Care Services
 7     Selected Financial Assets; Medical Expenses and Work Disability; Real Estate, Shelter Costs, Dependent
       Care, and Vehicles
 8     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 9     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Functional
       Limitations and Disability–Adults, Utilization of Health Care Services–Adults, Functional Limitations
       and Disability–Children, Utilization of Health Care Services–Children, Children’s Well-Being
10     No Topical Modules
                                                   1991 Panel
 1     No Topical Modules
 2     Recipiency History, Employment History, Work Disability History, Education and Training History,
       Marital History, Migration History, Fertility History, Household Relationships
 3     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Functional
       Limitations and Disability, Utilization of Health Care Services
 4     Selected Financial Assets; Medical Expenses and Work Disability; Real Estate, Shelter Costs, Dependent
       Care, and Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Extended Measures of Well-Being (Consumer Durables, Living Conditions, Basic Needs)
 7     Assets and Liabilities, Retirement Expectations and Pension Plan Coverage, Real Estate Property and
       Vehicles
 8     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
                                                   1990 Panel
 1     No Topical Modules
 2     Recipiency History, Employment History, Work Disability History, Education and Training History,
       Marital History, Migration History, Fertility History, Household Relationships
 3     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Functional
       Limitations and Disability, Utilization of Health Care Services
 4     Assets and Liabilities, Retirement Expectations and Pension Plan Coverage, Real Estate Property and
       Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Time Spent Outside Work Force, Child Support Agreements, Support for Nonhousehold Members,
       Functional Limitations and Disability, Utilization of Health Care Services
 7     Selected Financial Assets; Medical Expenses and Work Disability; Real Estate, Shelter Costs, Dependent
       Care, and Vehicles
 8     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
                                                   1989 Panel
 1     No Topical Modules
 2     Recipiency History, Employment History, Work Disability History, Education and Training History,
       Marital History, Migration History, Fertility History, Household Relationships
 3     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Home
       Health Care, Disability Status and Utilization of Health Care Services, Functional Activities
 4     The 1989 Panel was terminated following Wave 3.
                                                   1988 Panel
 1     No Topical Modules
 2     Recipiency History, Employment History, Work Disability History, Education and Training History,
       Family Background, Marital History, Migration History, Fertility History, Household Relationships
 3     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Long-Term
       Care, Disability Status of Children, Health Status and Utilization of Health Care Services
 4     Selected Financial Assets; Medical Expenses and Work Disability; Real Estate, Shelter Costs, Dependent
       Care, and Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Home
       Health Care, Disability Status of Children, Health Status and Utilization of Health Care Services,
       Functional Activities
 7     No Wave 7
 8     No Wave 8
                                                  1987 Panel
 1     No Topical Modules
 2     Recipiency History, Employment History, Work Disability History, Education and Training History,
       Family Background, Marital History, Migration History, Fertility History, Household Relationships
 3     Child Care Arrangements/Child Support Agreements, Support for Nonhousehold Members, Work-Related
       Expenses, Shelter Costs/Energy Usage
 4     Assets and Liabilities, Real Estate Properties and Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Work Schedule, Child Care, Child Support Agreements, Support for Nonhousehold Members, Long-Term
       Care, Disability Status of Children, Health Status and Utilization of Health Care Services
 7     Selected Financial Assets; Medical Expenses and Work Disability; Real Estate, Shelter Costs, Dependent
       Care, and Vehicles
 8     No Wave 8
                                                1986 Panel
 1     No Topical Modules
 2     Recipiency History, Employment History, Work Disability History, Education and Training History,
       Family Background, Marital History, Migration History, Fertility History, Household Relationships
 3     Child Care Arrangements/Child Support Agreements, Support for Nonhousehold Members, Job Offers,
       Health Status and Utilization of Health Care Services, Long-Term Care, Disability Status of Children
 4     Assets and Liabilities, Retirement Expectations and Pension Plan Coverage, Real Estate Property and
       Vehicles
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Child Care Arrangements/Child Support Agreements, Support for Nonhousehold Members, Work-Related
       Expenses, Shelter Costs/Energy Usage
 7     Assets and Liabilities, Pension Plan Coverage, Real Estate Property and Vehicles
 8     No Wave 8
                                                   1985 Panel
 1     No Topical Modules
 2     No Topical Modules
 3     Assets and Liabilities, Real Estate Property and Vehicles
 4     Support for Nonhousehold Members/Work-Related Expenses, Marital History, Migration History, Fertility
       History, Household Relationships
 5     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
 6     Child Care Arrangements/Child Support Agreements, Support for Nonhousehold Members, Job Offers,
       Health Status and Utilization of Health Care Services, Long-Term Care, Disability Status of Children
 7     Assets and Liabilities, Retirement Expectations and Pension Plan Coverage, Real Estate Property and
       Vehicles
 8     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing
                                                  1984 Panel
 1     No Topical Modules
 2     No Topical Modules
 3     Education and Work History, Health and Disability
 4     Assets and Liabilities; Retirement and Pension Coverage; Housing Costs, Conditions, and Energy Usage
 5     Child Care, Welfare History and Child Support, Reasons for Not Working/Reservation Wage, Support for
       Nonhousehold Members/Work-Related Expenses
 6     Earnings and Benefits, Property Income and Taxes, Education and Training
 7     Assets and Liabilities, Pension Plan Coverage, Real Estate Property and Vehicles
 8     Support for Nonhousehold Members/Work-Related Expenses, Marital History, Migration History, Fertility
       History, Household Relationships
 9     Annual Income and Retirement Accounts, Taxes, School Enrollment and Financing



                                 Table 5-4. Topical Modules, by Subject

Subject Areas                                    Panel and Wave[a]
Marital History                                  84-8, 85-4, 86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2
Fertility History                                84-8, 85-4, 86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2
Household Relationships                          84-8, 85-4, 86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2
Migration History                                84-8, 85-4, 86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2
Family Background                                86-2, 87-2, 88-2
Annual Income and Retirement Accounts            84-9, 85-5, 85-8, 86-5, 87-5, 88-5, 90-5, 90-8, 91-5, 91-8, 92-5,
                                                 92-8, 93-5, 93-8, 96-4, 96-7, 96-10
Taxes                                            84-9, 85-5, 85-8, 86-5, 87-5, 88-5, 90-5, 90-8, 91-5, 91-8, 92-5,
                                                 92-8, 93-5, 93-8, 96-4, 96-7, 96-10
Assets and Liabilities                           84-4, 84-7, 85-3, 85-7, 86-4, 86-7, 87-4, 90-4, 91-7, 92-4, 93-7,
                                                 96-3, 96-6, 96-9, 96-12
Selected Financial Assets                        87-7, 88-4, 90-7, 91-4, 92-7, 93-4
Retirement Expectations and Pension Plan         84-4, 85-7, 86-4, 86-7, 87-4, 90-4, 91-7, 92-4, 93-9, 96-7
Coverage
Pension Plan Coverage                           84-7, 86-7
Earnings and Benefits                           84-6
Recipiency History                              86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-1, 93-1, 96-1
Child Support Agreements                        85-6, 86-3, 86-6, 87-3, 87-6, 88-3, 88-6, 89-3, 90-3, 90-6, 91-3,
                                                92-6, 92-9, 93-3, 93-6, 93-9, 96-5, 96-11
Child Support Paid                              96-3, 96-6, 96-9, 96-12
Child Care                                      84-5, 85-6, 86-3, 86-6, 87-3, 87-6, 88-3, 88-6, 89-3, 90-3, 90-6,
                                                91-3, 92-6, 92-9, 93-3, 93-6, 93-9, 96-4, 96-10
Support for Nonhousehold Members                84-3, 84-5, 84-8, 85-4, 85-6, 86-3, 86-6, 87-3, 87-6, 88-3, 88-6,
                                                90-3, 90-6, 91-3, 92-6, 92-9, 93-3, 93-6, 93-9, 96-5
Welfare History and Child Support               84-5
Real Estate Property and Vehicles               84-7, 85-3, 85-7, 86-4, 86-7, 87-4, 90-4, 91-7, 92-4, 93-7
Real Estate, Shelter Costs, Dependent Care, and 87-7, 88-4, 90-7, 91-4, 92-7, 93-4, 93-7
Vehicles
Shelter Costs/Energy Usage                      86-6, 87-3
Property Income and Taxes                       84-6
Housing Costs, Conditions, and Energy Usage     84-4
Employment History                              86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-1, 93-1, 96-1
Work Disability History                         86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2
Work Schedule                                   87-6, 88-3, 88-6, 89-3, 90-3, 91-3, 92-6, 92-9, 93-3, 93-6, 93-9,
                                                96-4, 96-10
Work-Related Expenses                           84-5, 84-8, 85-4, 86-6, 87-3, 96-3, 96-6, 96-9, 96-12
Reasons for not Working/Reservation Wage        84-5
Time Spent Outside Work Force                   90-6
Job Offers                                      85-6, 86-3
Home-Based Self-Employment/Size of Firm         92-6, 93-3
Education and Training History                  86-2, 87-2, 88-2, 89-2, 90-2, 91-2, 92-2, 93-2, 96-2
Education and Work History                      84-3
School Enrollment and Financing                 84-9, 85-5, 85-8, 86-5, 87-5, 88-5, 90-5, 90-8, 91-5, 91-8, 92-5,
                                                92-8, 93-5, 93-8, 96-5
Education and Training                          84-6
Functional Limitations and Disability           90-3, 90-6, 91-3, 92-6, 93-3
Functional Limitations and Disability–Adults      92-9, 93-6, 96-5, 96-11
Functional Limitations and Disability–Children    92-9, 93-6, 96-5, 96-11
Disability Status of Children                     85-6, 86-3, 87-6, 88-3, 88-6, 89-3
Functional Activities                             88-6, 89-3
Medical Expenses and Work Disability              87-7, 88-4, 90-7, 91-4, 92-7, 93-4, 93-7
Utilization of Health Care Services               90-3, 90-6, 91-3, 92-6, 93-3
Utilization of Health Care Services–Adults        92-9, 93-6, 96-5, 96-12
Utilization of Health Care Services–Children      92-9, 93-6, 96-5, 96-12
Health Status and Utilization of Health Care      85-6, 86-3, 87-6, 88-3, 88-6, 89-3
Services
Long-Term Care                                    85-6, 86-3, 87-6, 88-3
Home Health Care                                  88-6, 89-3
Health and Disability                             84-3
Employer-Provided Health Benefits                 96-5
Disability Questions                              96-4
Extended Measures of Well-Being (Consumer         91-6, 92-3
Durables, Living Conditions, Basic Needs)
Adult Well-Being                                  96-8
Basic Needs                                       93-9
Welfare Reform                                    96-8
Children’s Well-Being                             92-9, 93-6, 93-9, 96-6, 96-11
[a] The number preceding the hyphen indicates the year of the panel, and the number following the hyphen indicates
the wave number. Thus, 84-8 denotes that the information was collected in the 1984 Panel, during Wave 8.
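
The panel-wave notation used in Table 5-4 is easy to decode programmatically. The following Python
sketch is illustrative only and is not part of any Census Bureau software. The function names are
hypothetical, and the code assumes that every two-digit panel year falls in the 1900s, which holds
for the 1984 through 1996 Panels listed in the table.

```python
def parse_panel_wave(code):
    """Split a code such as '84-8' into (panel year, wave number)."""
    panel_part, wave_part = code.strip().split("-")
    # Assumption: two-digit panel years all belong to the 1900s (1984-1996).
    return 1900 + int(panel_part), int(wave_part)


def parse_panel_wave_list(cell):
    """Parse a comma-separated table cell into a list of (panel, wave) pairs."""
    return [parse_panel_wave(item) for item in cell.split(",") if item.strip()]


# Example: the Family Background row of Table 5-4.
family_background = parse_panel_wave_list("86-2, 87-2, 88-2")
```

Here, `parse_panel_wave("84-8")` yields the pair (1984, 8), that is, the 1984 Panel, Wave 8.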


                      Table 5-5. Structure of Topical Module Microdata File

                                         Interview Status                                 Topical Module
SUID[a]             Person               in Interview Month          Core Vars            Vars
1                   1                    Yes
                    2                    Yes
                    3                    No                          Missing              Missing
                    4                    Yes
                    5                    No                          Missing              Missing
2                   1                    Yes
                    2                    Yes
3                   1                    Yes
4                   1                    Yes
                    2                    No                          Missing              Missing
                    3                    Yes
5                   1                    Yes
                    2                    Yes
                    3                    Yes
[a] Sample unit ID number. Chapter 4 provides more information about identification numbers in SIPP.
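
The layout in Table 5-5 can be mirrored in a small data structure. The Python sketch below is
illustrative only: it transcribes the interview status of the 14 person rows shown in the table and
separates the rows that carry data from those whose core and topical module variables are missing.
The class, field, and function names are hypothetical and are not SIPP variable names.

```python
from typing import NamedTuple


class PersonRow(NamedTuple):
    suid: int          # sample unit ID number
    person: int        # person number within the sample unit
    interviewed: bool  # interview status in the interview month


# Interview status by sample unit, transcribed from Table 5-5.
STATUS_BY_SUID = {
    1: [True, True, False, True, False],
    2: [True, True],
    3: [True],
    4: [True, False, True],
    5: [True, True, True],
}

ROWS = [
    PersonRow(suid, person, status)
    for suid, statuses in STATUS_BY_SUID.items()
    for person, status in enumerate(statuses, start=1)
]


def rows_with_data(rows):
    """Rows for persons interviewed in the interview month."""
    return [row for row in rows if row.interviewed]


def rows_with_missing_values(rows):
    """Rows whose core and topical module variables are shown as Missing."""
    return [row for row in rows if not row.interviewed]
```

An analysis restricted to `rows_with_data(ROWS)` would skip the three persons shown in the table
with missing values.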



Full and Partial Panel Files

At the conclusion of each panel, the Census Bureau creates a single full panel file containing all
data from the core wave files for every person who was a member of the SIPP sample at any
time during the life of that panel.[2] To date, the full panel files have been issued in a format that
contains one record for each person. That record contains either data or missing value codes for
most core questionnaire items for every month of the panel.[3] Chapter 3 discusses survey content,
including information about the content of the core questionnaire. At the time that this Guide was
written, full panel files had been issued for all SIPP panels prior to the 1996 Panel. Because of
the extended (4-year) duration of the 1996 Panel, the Census Bureau is modifying its procedures
for releasing information for the full panel.


Sources for Obtaining SIPP Microdata
SIPP microdata files can be obtained from several sources. All public use microdata files can be
obtained on magnetic media or CD-ROM directly from the Census Bureau. When microdata files
are obtained directly from the Census Bureau, users are provided with a full set of documentation
for those files, including all currently available applicable User Notes (discussed later in this
chapter). Users can also be placed on a distribution list to receive information from the Census
Bureau regarding any errors found in, or revisions made to, those files, by contacting the
Customer Services Branch, Administrative and Customer Services Division, at (301) 457-4100.

In addition, analysts affiliated with institutions that are members of the Inter-university
Consortium for Political and Social Research (ICPSR) can obtain all SIPP microdata from that
source. Users should contact the ICPSR representative at their institutions for more information.
Finally, SIPP data and documentation, as released by the Census Bureau, are not copyrighted.
The data files and supporting documentation can therefore be freely copied and distributed to
other users.[4]

There is another source of SIPP data that can be quite useful for simple exploratory work. SIPP
microdata are available on-line at the Census Bureau’s Web site (http://www.census.gov/) and
from the SIPP Web site (http://www.sipp.census.gov/sipp/). Those Internet sites offer two data
access tools—Surveys-on-Call, which is part of the Data Extraction System (DES), and FERRET,
which is part of the new Census Bureau Data Access and Dissemination System (DADS).

[2] Because of the volume of data collected in the 1996 Panel, that procedure may not occur for the 1996 full panel
file.
[3] In the case of items that are asked only once per interview rather than for each month of the 4-month reference
period, there is a field for each interview rather than for each month.
[4] This provision pertains only to materials authored and distributed by the Census Bureau or other federal agencies.
It does not imply any rights to copy and distribute material published by any other party.

Surveys-on-Call provides access to SIPP longitudinal files for the 1988 through 1993 Panels and
for wave and topical module files for the 1990 through 1993 Panels. Surveys-on-Call allows
users to define microdata extracts from the SIPP public use microdata files. Users can choose
data for selected years, wave files, core files, topical module files, or longitudinal files. They can
also select variables of interest and use variables as selection criteria. For example, an analyst
might want to extract recipiency information for females between the ages of 18 and 25 from
Wave 5 of the 1993 Panel. Once the extracts are defined, analysts can download them to their own
computers for analysis. Surveys-on-Call creates microdata extracts from the SIPP public use files
only. It does not include any options for performing analyses on-line. On-line help is available at
each step of the data-extraction process. Users are encouraged to explore the capabilities of this
system by creating several small extracts.
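
The idea of an extract, in which an analyst chooses variables and applies selection criteria, can
be illustrated off-line with a few lines of Python. The sketch below is not the Surveys-on-Call
interface. The field names ("sex", "age", "recipiency"), the function names, and the sample records
are all hypothetical, and the age range is assumed to include both 18 and 25.

```python
def define_extract(records, variables, criterion):
    """Return the chosen variables for every record that meets the criterion."""
    return [
        {name: record[name] for name in variables}
        for record in records
        if criterion(record)
    ]


def female_aged_18_to_25(record):
    # "sex" and "age" are hypothetical field names, not SIPP variable names.
    return record["sex"] == "female" and 18 <= record["age"] <= 25


# Made-up records for illustration only; these are not SIPP data.
sample_records = [
    {"person": 1, "sex": "female", "age": 22, "recipiency": "yes"},
    {"person": 2, "sex": "male", "age": 24, "recipiency": "no"},
    {"person": 3, "sex": "female", "age": 41, "recipiency": "no"},
    {"person": 4, "sex": "female", "age": 18, "recipiency": "no"},
]

extract = define_extract(
    sample_records, ["person", "recipiency"], female_aged_18_to_25
)
```

In this made-up sample, only persons 1 and 4 satisfy the criterion, and the extract keeps only the
two requested variables for each of them.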

SIPP data available on the Federal Electronic Research Review and Extraction Tool (FERRET)
include files from the 1996 Panel and the longitudinal files from the 1992 and 1993 Panels.
FERRET is the product of a joint project of the U.S. Census Bureau and the Bureau of Labor
Statistics. It is a system enabling users to access and manipulate large demographic and
economic data sets on-line. FERRET is designed to aid not only sophisticated researchers, but
also reporters, students, government policy makers, and amateur statisticians. SIPP is one of
several surveys available through FERRET.[5]


Other Sources of Information About SIPP
Other sources of information about SIPP include the SIPP Quality Profile, User Notes, and SIPP
working papers. The SIPP Web site includes an extensive bibliography that provides references
to SIPP-related research and documentation, data dictionaries, variable metadata documenting all
information relevant to variables that appear on the public use microdata files, and a computer-
based tutorial that introduces users to methods and concepts needed to use SIPP data.


SIPP Quality Profile

The SIPP Quality Profile documents data quality issues related to SIPP. It summarizes what is
known about the sources and magnitude of errors in estimates based on SIPP. The SIPP Quality
Profile covers both sampling and nonsampling error, with an emphasis on nonsampling error.
There have been three editions of the SIPP Quality Profile. The third edition, by Kalton,
Winglee, & Jabine (U.S. Census Bureau, 1998a), updates the two previous editions, by King,
Petroni, & Singh (U.S. Census Bureau, 1987) and Jabine, King, & Petroni (U.S. Census Bureau,
1990). The third edition of the SIPP Quality Profile is available on-line at the SIPP Web site.


[5] Among the current and future topics accessible through FERRET are employment, health care, education, race and
ethnicity, health insurance, housing, income and poverty, aging, marriage, and the family. FERRET allows users to
quickly locate current and historical information from survey sources, get tabulations for specific information they
need, make comparisons between different data sets, create simple tables, and download large amounts of data to
desktop and larger computers for custom reports.




SIPP User Notes

The SIPP User Notes, issued periodically by the Census Bureau, contain updated information for
specific microdata files. The User Notes include corrections to the data dictionaries,
announcements of errors found in the public use data files after their release, and recommended
corrections for those data errors. Analysts obtaining SIPP microdata files directly from the
Census Bureau will receive all User Notes that have been issued for those files at the time of
purchase. Users who obtained files from other sources should contact the Customer Services
Branch, Administrative and Customer Services Division, at (301) 457-4100, to request the User
Notes that have been issued for the data they plan to use. User Notes are also available at the
SIPP Web site (http://www.sipp.census.gov/sipp/).


Microdata Technical Documentation

Users purchasing SIPP microdata files directly from the Census Bureau receive, along with the
data files, a package of technical documentation. The technical documentation includes:

•   A data dictionary, containing information about the file structure and the names, locations,
    and contents of all variables. The printed version of the data dictionary also includes
    information about the structure of the machine-readable data dictionary supplied with each
    file.
•   A source and accuracy statement, containing detailed information about sample weights and
    computation of standard errors using Census Bureau generalized variance procedures. This
    information is specific to the panel, wave, and content of the data file. For example, the
    topical module file and the core wave file for Wave 7 of the 1990 Panel have different source
    and accuracy statements.
•   A copy of the questionnaire screens and program code used to collect the information
    contained in the microdata file for the computer-assisted interviews for the 1996 Panel,
    which is available from the SIPP Web site (Chapter 2).


SIPP Working Papers

The Census Bureau publishes a series of SIPP working papers. Those papers are written by
authors inside the Census Bureau and by outside analysts. The series includes research papers
based on SIPP data or related to the SIPP program. SIPP working papers can be obtained from
the SIPP Web site (http://www.sipp.census.gov/sipp/) or ordered from the Customer Services
Branch, Administrative and Customer Services Division, at (301) 457-4100.




Bibliography

A bibliography of works related to SIPP is available on-line from the SIPP Web site. This
relatively comprehensive bibliography contains references for journal articles, research papers,
and working papers that use SIPP data or that discuss the SIPP survey.


Variable Metadata

Variable metadata, available in the data dictionary, provide a complete characterization of a
variable’s content. Variable metadata include all information relevant to variables that appear in
the SIPP public use microdata files, including the variable name, a description of the variable,
the concept label, data type (binary or character), suggested weight variable when applicable,
descriptions of all possible values, and other data when applicable. A variable summary will be
included for each public use variable. The summary identifies all edits, recodes, and imputations
that affect the final edited output variable.


What’s Available from the Survey of Income and Program
Participation?

What’s Available from the Survey of Income and Program Participation?, published by the
Census Bureau, provides a complete directory of available SIPP data and publications. The
directory lists materials available in both print and electronic formats. What’s Available includes
a listing of SIPP working papers, User Notes, public use microdata files, P-70 series population
reports, and compilations of relevant papers published in the proceedings from the annual
meetings of the American Statistical Association (ASA). What’s Available from the Survey of
Income and Program Participation? is updated periodically. Users can review the most recent
edition at the Census Bureau Web site.

Table 5-6 lists telephone numbers to call for obtaining additional information about specific
aspects of SIPP.



                     Table 5-6. Telephone Numbers for Information About
                                    Specific Aspects of SIPP

         Subject Fields                                 Telephone Number
         Adult well-being                               (301) 763-2464
         Child care                                     (301) 763-2416
         Child well-being                               (301) 763-2416
         Education                                      (301) 763-2464
         Fertility                                      (301) 763-2416
         Health insurance                               (301) 763-3213
         Income                                         (301) 763-3243
         Labor force, employment, and earnings          (301) 763-3230
         Marriage and family                            (301) 763-2416
         Migration                                      (301) 763-2454
         Pensions                                       (301) 763-3230
         Poverty                                        (301) 763-3213
         Wealth (assets)                                (301) 763-3230
         Women                                          (301) 763-2378
         Methodology                                    Telephone Number
         Data collection procedures                     (301) 763-3819
         Questionnaire design                           (301) 763-3819
         Estimation and weighting                       (301) 763-6445
         Nonsampling and sampling errors                (301) 457-4192
         Survey design                                  (301) 457-4192


6. Nonsampling Errors
This chapter summarizes information about nonsampling errors in the Survey of Income and
Program Participation (SIPP) that may affect the results of certain types of analyses. All surveys
are subject to various sources of nonsampling errors, and SIPP is no exception. Nonsampling
errors in SIPP include those that are found in most surveys as well as errors that arise because of
SIPP’s panel nature. The chapter focuses on the extent of nonsampling errors in SIPP and the
impact of those errors on some survey estimates. The following topics are discussed:

•   Undercoverage;
•   Nonresponse;
•   Measurement errors; and
•   Effects of nonsampling errors on some survey estimates.


Undercoverage
One source of error in SIPP, as in other household surveys, is differential undercoverage of
demographic subgroups. Black males over 15 years of age are most affected by undercoverage.
The coverage ratio for this subgroup was about 0.82 in the 1990 and 1991 SIPP Panels.
(Coverage ratio is computed as the survey estimate of the number in the subgroup before post-
stratification, divided by a population estimate for the subgroup from population projections
based on the most recent census.) For black males in their mid to late 20s, the coverage ratio was
lower, about 0.65 in the same panels (SIPP Quality Profile, 3rd Ed. [U.S. Census Bureau, 1998a,
Chapter 3]; hereinafter in this chapter, SIPP Quality Profile, 3rd Ed.). These coverage ratios may
understate the magnitude of the coverage problems because census undercounts are not reflected
in the coverage ratios before 1992. Undercoverage in household surveys is attributed mainly to
within-household omissions; the omission of entire households is less frequent. Shapiro et al.
(1993) estimated that about 70 percent of the undercoverage for young black males consists of
within-household omissions; the corresponding percentage for the white population is about 60
percent. To compensate for undercoverage, the Census Bureau uses population controls to adjust
SIPP weights. Little is known about the effectiveness of the adjustments in reducing biases.
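
The coverage ratio defined above can be expressed as a short calculation. The Python sketch below
is illustrative only. The function name and the input counts are hypothetical; the counts were
chosen solely so that the result matches the ratio of about 0.82 cited in the text, and they are
not SIPP estimates.

```python
def coverage_ratio(survey_estimate, population_estimate):
    """Survey estimate before post-stratification divided by the population estimate."""
    if population_estimate <= 0:
        raise ValueError("population estimate must be positive")
    return survey_estimate / population_estimate


# Hypothetical counts: the survey finds 820 persons in a subgroup for which
# census-based population projections give 1,000.
ratio = coverage_ratio(820, 1000)
undercoverage = 1 - ratio  # share of the subgroup missed by the survey
```

A ratio below 1.0 indicates undercoverage of the subgroup relative to the population estimate; in
this made-up case, about 18 percent of the subgroup is missed.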


Nonresponse
Nonresponse is a major concern in SIPP because of the need to follow the same people over
time. In SIPP, nonresponse can occur at several levels: household nonresponse at the first wave
and thereafter; person nonresponse in interviewed households; and item nonresponse, including


                                               6-1
SIPP USERS’ GUIDE

complete nonresponse to topical modules. At the household level, the rate of sample loss for the
1991 Panel rose from about 8 percent at Wave 1 to more than 21 percent by Wave 8. For the
same panel, 23 percent of the original sample persons who participated in Wave 1 missed one or
more interviews for which they were eligible in later waves. At the item level, the nonresponse
rate is typically around 10 percent or less for items on income amounts but somewhat higher for
items on asset amounts. Nonresponse reduces the effective sample size (and, therefore, increases
sampling error) and introduces bias in the survey estimates. The Census Bureau uses a
combination of weighting and imputation methods to reduce the biasing effects of nonresponse
at all three levels in SIPP. The effectiveness of those procedures remains a matter of ongoing
review and research (SIPP Quality Profile, 3rd Ed., Chapters 4, 5, and 8).


Measurement Errors
Measurement errors are associated with the data collection phase of the survey. They may vary
across SIPP panels because of changes in data collection procedures over the years. Most core
survey items in SIPP are used consistently in every panel, although there have been occasional
changes to improve the clarity of some items. The data collection method, which was face-to-
face interviewing for the early panels, was changed to a maximum use of telephone interviewing
in February 1992. Telephone interviewing was used as the primary mode of data collection
between February 1992 and January 1996 for all waves except Waves 1, 2, and 6, for which
face-to-face interviewing was used. The switch to telephone interviewing has had no known
adverse effects on data quality.

Computer-assisted interviewing (CAI) was introduced with the 1996 SIPP Panel. The effects of
CAI on survey responses have yet to be determined (SIPP Quality Profile, 3rd Ed., Section
11.3). For the 1996 Panel, computer-assisted personal interviewing (CAPI) was used for Waves
1 and 2. After Wave 2, the field representatives used the CAI instrument in face-to-face
interviews with approximately one-third of the respondents; for the remaining interviews, the
field representatives used the CAI instrument but conducted telephone interviews from their
homes.

The combination of face-to-face interviews and telephone interviews used across waves is
prespecified and varies for different subgroups of the sample according to the following scheme
(Waite, 1996). Sample members are assigned to one of three interviewing mode subgroups. For
each subgroup, a pattern of interviewing modes is designated and repeated every three waves.
Thus, for Waves 3, 4, and 5, subgroup 1 is assigned the sequence face-to-face, telephone,
telephone; subgroup 2, the sequence telephone, face-to-face, telephone; and subgroup 3, the
sequence telephone, telephone, face-to-face. Under this scheme, which is applied with each
rotation group, one-third of the sample is interviewed in person each wave and each month, and
every household is interviewed in person once a year. The same sequence is repeated for Waves
6 and beyond, with a cycle of three waves (SIPP Quality Profile, 3rd Ed.).
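The rotation of interviewing modes described above can be sketched as a small lookup. This is only an illustration of the scheme as stated in the text for the 1996 Panel: the function name and the string labels are hypothetical, and the sketch assumes that Waves 1 and 2 are face-to-face for every subgroup and that the three-wave cycle starting at Wave 3 repeats without exception.

```python
def interview_mode(subgroup: int, wave: int) -> str:
    """Return the planned interview mode for a mode subgroup (1-3) and a wave.

    Assumes the pattern in the text: Waves 1 and 2 are face-to-face for
    everyone; from Wave 3 on, each subgroup is visited in person once in
    every cycle of three waves and telephoned in the other two.
    """
    if subgroup not in (1, 2, 3):
        raise ValueError("subgroup must be 1, 2, or 3")
    if wave < 1:
        raise ValueError("wave must be 1 or greater")
    if wave <= 2:
        return "face-to-face"
    # Position within the three-wave cycle: 0 for Waves 3, 6, 9, ...
    position = (wave - 3) % 3
    return "face-to-face" if position == subgroup - 1 else "telephone"


if __name__ == "__main__":
    for wave in range(1, 9):
        modes = [interview_mode(g, wave) for g in (1, 2, 3)]
        print(wave, modes)
```

Under these assumptions exactly one of the three subgroups is interviewed in person in each wave from Wave 3 onward, which matches the one-third figure given in the text.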

Response errors in SIPP include errors of recall, errors in proxy respondents’ reports, and other
errors associated with the panel nature of SIPP. SIPP uses a 4-month recall period to reduce
memory error, and respondents are encouraged to use financial records and an event calendar to
facilitate recall. Although the level of accuracy for self-response is generally believed to be
higher than for proxy response (see Moore, 1988, for a contrary view), achieving a higher
proportion of self-response would increase data collection costs and might lead to some increase
in person nonresponse rates (SIPP Quality Profile, 3rd Ed., Section 4.5.3).

A potential source of response error that arises from the panel nature of SIPP is the time-in-
sample effect (or panel conditioning). This effect occurs when the responses given at later waves
are affected by the respondents’ experiences of being interviewed in previous waves. The extent
of this error is difficult to evaluate because it is often confounded with other sources of error,
particularly attrition. Thus far, studies have found little evidence of systematic biases resulting
from time-in-sample effects (Pennell and Lepkowski, 1992; McCormick et al., 1992).

Measurement errors can also occur when respondents misinterpret questions. For example, when
asked about earnings, some respondents may have reported take-home pay instead of gross
earnings. There is also some evidence of confusion in regard to welfare programs, such as the old
Aid to Families with Dependent Children and general assistance programs.

Another response error identified through the panel nature of SIPP is the seam phenomenon.
Research has consistently indicated that respondents tend to report the same status (e.g.,
employment or program participation) and the same amounts (e.g., Social Security income) for
all 4 months within a wave, with most reported changes occurring between the last month of one
wave and the first month of the subsequent wave. This phenomenon results in an overstatement
of changes at the on-seam months (the boundary between interviews in successive waves of a
panel) and an understatement of changes at the off-seam months. The seam phenomenon affects
most variables for which monthly data are collected. As a result of the rotation group pattern, the
phenomenon has relatively small effects on cross-sectional estimates based on all four rotation
groups. That is because there is only one rotation group (or one-fourth of the sample) that is on
seam and three rotation groups off seam for any given pair of calendar months. The effects of the
seam phenomenon on longitudinal estimates are not well known (SIPP Quality Profile, 3rd Ed.,
Chapter 6).


Effects of Nonsampling Error on Survey Estimates
A considerable amount of research has been conducted to investigate the various sources of
nonsampling error in SIPP. The results of the research are summarized in the SIPP Quality
Profile, 3rd Ed. The research includes, for example, the SIPP Record Check Studies (Marquis
and Moore, 1989a,b, 1990; Marquis et al., 1990) that compared SIPP responses on program
participation with administrative records. Despite the volume of this methodological research, it
remains difficult to quantify the combined effects of nonsampling errors on SIPP estimates. The
problem is made more complex because the effects of nonsampling error of different types on
survey estimates vary, depending on the estimate under consideration. There are, however, some
findings about nonsampling error that SIPP users should bear in mind when conducting their
analyses and examining their results. Those findings include the following:

•   Some demographic subgroups are underrepresented in SIPP because of undercoverage and
    nonresponse. They include young black males, metropolitan residents, renters, people who
    changed addresses during a panel (movers), and people who were divorced, separated, or
    widowed. The Census Bureau uses weighting adjustments and imputation to correct the
    underrepresentation. Those procedures, however, may not fully correct for all potential biases
    (SIPP Quality Profile, 3rd Ed., Chapter 8).
•   The SIPP estimates of income from Social Security, Railroad Retirement, and Supplemental
    Security programs represent more than 95 percent of the amounts reported by administrative
    sources. The SIPP estimates of unemployment income, workers’ compensation income,
    veteran’s income, and public assistance income, however, are low relative to the amounts
    reported by administrative sources (Coder and Scoon-Rogers, 1996).
•   Evaluation studies typically find that SIPP estimates (as well as other survey estimates) of
    property income are generally poor. Among the different types of property income, reports of
    interest and dividend income are most prone to error. Respondents are often confused about
    those two sources of income, and both sources tend to be underreported (Coder and Scoon-
    Rogers, 1996).
•   SIPP estimates of assets, liabilities, and wealth are low relative to estimates from the Federal
    Reserve Board (Eargle, 1990).
•   For SIPP panels before 1996, the estimates of the percentages of people in poverty were
    lower than those found in the Current Population Survey (CPS) (Shea, 1995a).
•   SIPP estimates of the working population differ from those produced from CPS. The
    differences may be explained largely by substantial conceptual and operational differences in
    the collection of labor force data in the two surveys (SIPP Quality Profile, 3rd Ed., Chapter
    10).
•   The SIPP estimates of people without any health insurance coverage are much lower than the
    CPS estimates. There are reasons to believe that the SIPP estimates are more accurate
    (McNeil, 1988).
•   The SIPP estimates of the number of births compare favorably with the CPS estimates. Both
    surveys, however, provide estimates that are low relative to the records from the National
    Center for Health Statistics (NCHS). The SIPP estimates of the number of marriages are
    fairly comparable with the NCHS counts, but the SIPP estimates of the number of divorces
    are consistently lower than the NCHS estimates (SIPP Quality Profile, 3rd Ed., Chapter 10).

In spell analyses, Kalton et al. (1992) found that spell durations of multiples of 4 months (e.g., 4
months, 8 months, 12 months) were particularly common, a feature that can be explained by the
seam phenomenon.


7. Sampling Error
This chapter discusses methods for obtaining the sampling error estimates derived from the
Survey of Income and Program Participation (SIPP) panels. The sample selected for each SIPP
panel is a stratified multistage probability sample. This complex sample design needs to be taken
into account when estimating the variances of SIPP estimates. The SIPP data files contain
variables, related to the sample design, that are created for the purpose of variance estimation.
Several software packages are now available for computing variance estimates for a wide range
of statistics based on complex sample designs. Using the variables that specify the design, these
programs can calculate appropriate variances of survey estimates. The Census Bureau also
provides generalized variance functions (GVFs) that can be used to obtain approximate estimates
of sampling variance for SIPP estimates.

A common mistake in the estimation of sampling error for survey estimates is to ignore the
complex survey design and treat the sample as a simple random sample (SRS) of the population.
That mistake occurs because most standard software packages for data analyses assume simple
random sampling for variance estimation. When applied to SIPP estimates, SRS formulas for
variances typically underestimate the true variances. This chapter describes how appropriate
variance estimates, which take into account the complex sample design, can be obtained for SIPP
estimates.

The topics discussed in this chapter are:

•   Direct variance estimation;
•   Approximate variance estimates obtained from GVFs; and
•   Variance estimation when some data are imputed.


Direct Variance Estimation
The primary sampling unit (PSU) plays a key role in variance estimation with a multistage
sample design. SIPP PSUs are mostly counties, groups of counties, or independent cities (SIPP
Quality Profile, 3rd Ed. [U.S. Census Bureau, 1998a, Chapter 3]), which are sampled with
probability proportional to size within strata. The PSUs are sampled without replacement so that
no PSU is selected more than once for the sample. Some PSUs are so large that they are included
in the sample with certainty. Because no sampling is involved, those PSUs are, in fact, not PSUs
but strata. The actual PSUs for those certainty selections are the enumeration districts and other
units selected within them.



Although the SIPP PSUs are selected without replacement (as is the case with most multistage
designs), for the purpose of variance estimation they are treated as if they were sampled with
replacement. The with-replacement assumption greatly facilitates variance estimation since it
means that variance estimates can be computed by taking into account only the PSUs and strata,
without the need to consider the complexities of the subsequent stages of sample selection. This
widely used simplifying assumption leads to an overestimation of variances, but the
overestimation is not great.

Several software packages are available for computing variances of a wide range of survey
estimates (e.g., means and proportions for the total sample and for subclasses, for differences in
means and proportions between subclasses, and for regression and logistic regression
coefficients) from complex sample designs. Many of these packages are listed on the Web:
http://www.fas.harvard.edu/~stats/survey-soft/survey-soft.html. Lepkowski and Bowles (1996)
examined eight of the packages.

These packages use a variety of methods for variance estimation. Some use an approach based
on a Taylor-series approximation, or linearization, method. Others use a replication method, such
as jackknife repeated replications or balanced repeated replications. Although some methods
have advantages in some situations, there is generally little to recommend one method over
another. The variance estimates they produce are not identical, but the differences are usually
small. See Wolter (1985) and Rust (1985) for discussions of these methods.


Variance Units and Variance Strata, 1990–1993 Panels

For the 1990–1993 SIPP Panels, the sample member record contains information concerning the
PSU and stratum within which the member was sampled. This information is needed as input for
all of the specialized software packages. The original PSU and strata codes are not included in
the SIPP public use data files, however, to avoid potential identification of small geographic
areas and sampled individuals. Instead, sets of PSUs are combined across strata to produce
variance units and variance strata, with two variance units in each variance stratum. Variance
units and variance strata may be treated as PSUs and strata for variance estimation purposes.
Their use does not give rise to any bias in the variance estimates. The variance estimates are
somewhat less precise, however, than those obtained from the use of the PSUs and strata that
have not been combined.

Under the complex sample design, the number of degrees of freedom for variance estimation
depends on the number of variance strata. The 1984 SIPP Panel consists of 142 variance units in
71 variance strata; the panels between 1985 and 1991 have 144 variance units and 72 variance
strata; and the 1992–1993 Panels have 198 variance units and 99 variance strata. As a rough
approximation, the number of degrees of freedom for a variance estimate is the number of
variance strata. Thus, for national estimates, the variance estimates have about 71 degrees of
freedom for the 1984 Panel, 72 degrees of freedom for the 1985–1991 Panels, and 99 degrees of
freedom for the 1992–1993 Panels. Regional estimates will have fewer degrees of freedom
because such estimates include only some of the variance strata.



Table 7-1 displays the variable names for the variance stratum and variance unit codes in the
SIPP core wave files and the SIPP full panel files. These codes can be employed as stratum and
PSU codes in any of the software packages for variance estimation with complex sample
designs.

   Table 7-1. Variance Stratum Code and Variance Unit Code in SIPP Files, 1990–1993

  Variable for Variance Estimation:            SIPP Core Wave File           SIPP Full Panel File
  Variance stratum code                        HSTRAT                        VARSTRAT
  Variance unit (or half-sample) code          HHSC                          HALFSAMP
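
As a rough illustration of how the variance stratum and variance unit codes enter a calculation, the sketch below applies the usual with-replacement estimator to a weighted total, treating the variance units as PSUs within variance strata. The function name and the record layout (stratum code, unit code, weight, value) are hypothetical and are not taken from any SIPP file layout or software package; for real analyses one of the specialized packages is the safer choice.

```python
from collections import defaultdict


def variance_of_weighted_total(records):
    """Estimate the variance of a weighted total from (stratum, unit, weight, value) records.

    Uses the with-replacement formula
        sum over strata of  n/(n-1) * sum over units of (t_unit - mean_t)^2,
    where t_unit is the weighted total for a variance unit and n is the
    number of units in the stratum. For subgroup estimates, pass every
    sample record and set value to 0 outside the subgroup so that no
    variance unit disappears from its stratum.
    """
    unit_totals = defaultdict(lambda: defaultdict(float))
    for stratum, unit, weight, value in records:
        unit_totals[stratum][unit] += weight * value

    variance = 0.0
    for stratum, totals in unit_totals.items():
        n = len(totals)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two variance units")
        mean_total = sum(totals.values()) / n
        variance += n / (n - 1) * sum((t - mean_total) ** 2 for t in totals.values())
    return variance


if __name__ == "__main__":
    # Made-up records: two variance strata, two variance units in each.
    sample = [
        (1, 1, 10.0, 1), (1, 1, 10.0, 0),
        (1, 2, 10.0, 1), (1, 2, 10.0, 1),
        (2, 1, 5.0, 1), (2, 2, 5.0, 0),
    ]
    print(variance_of_weighted_total(sample))
```

With two variance units per stratum, as in the SIPP public use files, each stratum's contribution reduces to the squared difference between the two unit totals.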


Replication Weights for the 1996 Panel

Analysts should use Fay’s method for estimating variances for the 1996 SIPP Panel. Fay’s
method is a modified balanced repeated replication (BRR) method of variance estimation. The
difference between the basic BRR method and Fay’s method is that the BRR method uses
replicate factors of 0 and 2, whereas Fay’s method uses one factor, k, which is in the range (0, 1),
with the other factor equal to 2 – k. In Fay’s method, the introduction of the perturbation factor
(1 – k) allows the use of both halves of the sample. Thus, Fay’s method has the advantage that no
subset of the sample units in a particular classification will be totally excluded. The variance
formula for Fay’s method is
$$\mathrm{Var}(\theta_0) = \frac{1}{G(1-k)^2} \sum_{i=1}^{G} (\theta_i - \theta_0)^2, \tag{7-1}$$
where
          G = number of replicates;
        1 – k = perturbation factor;
           i = replicate i, i = 1 to G;
          θi = ith estimate of the parameter θ based on the observations included in the ith
               replicate;
          θ0 = survey estimate of the parameter θ based on the full sample.

The 1996 SIPP Panel uses 108 replicate weights, which are calculated on the basis of a
perturbation factor of 0.5 (k = 0.5). Inserting those values into Equation (7-1) results in the 1996
SIPP Panel variance formula of
$$\mathrm{Var}(\theta_0) = \frac{1}{108 \times 0.5^2} \sum_{i=1}^{108} (\theta_i - \theta_0)^2.$$


The Census Bureau used VPLX software to compute the replicate weights that are available
through FERRET.
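
The calculation in Equation (7-1) can be sketched in a few lines once the statistic of interest has been computed with the full-sample weight and again with each replicate weight. The function name below is hypothetical, and the sketch assumes the caller has already produced the replicate estimates; it does not read SIPP files or build replicate weights.

```python
def fay_variance(full_estimate, replicate_estimates, k=0.5):
    """Variance estimate from Fay's modified BRR, following Equation (7-1).

    full_estimate       -- estimate computed with the full-sample weight
    replicate_estimates -- one estimate per replicate weight (G values)
    k                   -- Fay factor; the text reports k = 0.5 for the 1996 Panel
    """
    if not 0 <= k < 1:
        raise ValueError("k must satisfy 0 <= k < 1")
    g = len(replicate_estimates)
    if g == 0:
        raise ValueError("at least one replicate estimate is required")
    sum_sq = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return sum_sq / (g * (1 - k) ** 2)


if __name__ == "__main__":
    # Made-up numbers, only to show the call; real use would pass 108 replicates.
    print(fay_variance(10.0, [11.0, 9.0, 10.0, 10.0], k=0.5))
```

Setting k = 0 in this sketch gives the basic BRR formula, since the divisor then reduces to the number of replicates.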




Using GVFs to Approximate Variance Estimates
The Census Bureau provides two forms for approximate variance estimation: GVFs and tables of
standard errors (the square root of the variance) for different estimated numbers and percentages.
The generalized estimates provide indications of the magnitude of the sampling error in the
survey estimates. They serve as convenient ways to summarize the sampling errors for a broad
variety of estimates.

The GVFs for SIPP were derived by modeling the standard error behavior of groups of estimates
with similar standard errors. The mathematical form of the function adopted is

$$s = (ax^2 + bx)^{1/2}, \tag{7-2}$$

where s represents the standard error and x the value of an estimate. The parameters a and b are
derived on the basis of a selected group of estimates. They are updated annually and are included
in the source and accuracy statement that accompanies each SIPP data file for a panel. It is
essential to use the parameter estimates for a specific panel and to follow the instructions to
apply necessary adjustments to obtain the correct estimates for subgroups. Besides GVFs, the
Census Bureau provides summary tables of general standard errors. Those estimates are also
available in the source and accuracy statements. The following examples show how to use GVFs
to estimate the standard errors of estimated numbers and of sample means. The use of GVFs and
tables of standard errors is described in the source and accuracy statements for each panel.

Before looking at the examples, the user should note that the generalized variance estimates for
estimating the standard errors of other statistics may not be accurate for small subgroups. Using
the 1984 SIPP Panel, Bye and Gallicchio (1989) developed variance functions for participants of
Old-Age, Survivors, and Disability Insurance (OASDI) and Supplemental Security Income (SSI)
programs. They found that for estimates of less than 10 million, the generalized standard error
estimates provided by the Census Bureau were 1.20 to 1.75 times larger than those obtained from
the variance functions developed specifically for that subgroup.


Using GVFs for Standard Errors of Estimated Numbers

The approximate standard error, s, of an estimated number of persons (or households or
families) can be obtained by the formula

$$s = (ax^2 + bx)^{1/2}, \tag{7-3}$$

where a and b are the parameters associated with the estimate for the particular reference period,
and x is the weighted estimate. This equation is appropriate for the standard errors of estimated
numbers and should not be applied to estimates of dollar values.



Suppose that the number of households with monthly household income above $6,000 is
estimated from Wave 1 of the 1991 Panel to be 472,000. The approximate values of a and b from
Table 6 of the source and accuracy statement of the 1991 Panel are a = -0.0001005 and b =
9,286. Then, the standard error, s, of this estimated number is given by

$$s = [(-0.0001005 \times 472{,}000^2) + (9{,}286 \times 472{,}000)]^{1/2} \approx 66{,}000.$$

The approximate 90 percent confidence interval for the estimated number can be computed as x
± 1.64 s, which ranges from 364,000 to 580,000. Therefore, a conclusion that the average
estimate derived from all possible samples lies within an interval computed in this way would be
correct for roughly 90 percent of all samples.
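
The worked example above can be reproduced with a short script. The function names are hypothetical; the a and b values are the ones quoted in the text for Wave 1 of the 1991 Panel, and any other panel or subgroup would need the parameters from its own source and accuracy statement.

```python
import math


def gvf_standard_error(a, b, x):
    """Approximate standard error of an estimated number, s = (a*x^2 + b*x)^(1/2)."""
    radicand = a * x * x + b * x
    if radicand < 0:
        raise ValueError("a*x^2 + b*x is negative; check the GVF parameters")
    return math.sqrt(radicand)


def confidence_interval_90(x, standard_error):
    """Approximate 90 percent confidence interval, x plus or minus 1.64 standard errors."""
    half_width = 1.64 * standard_error
    return x - half_width, x + half_width


if __name__ == "__main__":
    estimate = 472_000
    se = gvf_standard_error(-0.0001005, 9_286, estimate)
    # The text rounds the standard error to the nearest thousand before forming the interval.
    low, high = confidence_interval_90(estimate, round(se, -3))
    print(round(se, -3), round(low, -3), round(high, -3))
```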


Using GVFs for the Standard Error of a Mean

A mean is defined here to be the average quantity of some characteristic (other than the number
of persons or households) per person or household. For example, a mean could be the average
monthly household income of females 25 to 54 years of age. The formula used to estimate the
standard error of a mean, x̄, is

$$s_{\bar{x}} = \sqrt{\frac{b}{y}\, s^2}, \tag{7-4}$$

where y is the size of the base on which the estimate is computed, s² is the estimated population
variance of the characteristic, and b is the parameter associated with the particular type of characteristic.
Because of the approximations used in developing this formula, an estimate of the standard error
of the mean obtained from this formula will generally underestimate the true standard error.

The estimated population mean is computed with the formula

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \tag{7-5}$$


and the estimated population variance can be computed as


$$s^2 = \frac{\sum w_i (x_i - \bar{x})^2}{\sum w_i} \quad \text{or} \quad \frac{\sum w_i (x_i - \bar{x})^2}{\sum w_i - 1} \tag{7-6}$$

with the use of standard software for weighted data. Suppose that, based on Wave 1 data of the
1991 Panel, the mean monthly cash household income for females aged 25 to 54 is $2,530, the
weighted number of females in this age range is y = 39,851,000, and the population variance is
estimated to be s² = 3,159,887. When the appropriate b parameter of 7,514 from Table 6 of the
source and accuracy statement for the 1991 Panel is used, the estimated standard error of this mean
is

$$s_{\bar{x}} = [(7{,}514 \times 3{,}159{,}887)/39{,}851{,}000]^{1/2} \approx \$24.$$

Thus, the 90 percent confidence interval, computed as

$$\bar{x} \pm 1.64\, s_{\bar{x}},$$

ranges from $2,491 to $2,569. Therefore, a conclusion that the average estimate derived from all
possible samples lies within an interval computed in this way would be correct for roughly 90
percent of all samples.
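
Equations (7-4) through (7-6) can be combined in a short script. The function names are hypothetical, the variance uses the first form of Equation (7-6) (divisor equal to the sum of the weights), and the b, y, and s² values in the example are the ones quoted in the text for Wave 1 of the 1991 Panel.

```python
import math


def weighted_mean(values, weights):
    """Estimated population mean, Equation (7-5)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    """Estimated population variance, first form of Equation (7-6)."""
    mean = weighted_mean(values, weights)
    return sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / sum(weights)


def gvf_se_of_mean(b, y, s2):
    """Approximate standard error of a mean, Equation (7-4)."""
    return math.sqrt(b / y * s2)


if __name__ == "__main__":
    se = gvf_se_of_mean(7_514, 39_851_000, 3_159_887)
    mean = 2_530
    # The text rounds the standard error to whole dollars before forming the interval.
    half_width = 1.64 * round(se)
    print(round(se), round(mean - half_width), round(mean + half_width))
```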


Variance Estimation with Imputed Data
Imputation methods are used to fill in several types of missing data in SIPP. They are used to
complete some item nonresponse, person-level nonresponse within households (Type Z
nonresponse), and some wave nonresponse (intermittent responses bounded by two responding
waves). Imputation fills in gaps in the data set and makes data analyses easier. It also allows
more people to be retained as panel members for longitudinal analyses. The concern, however, is
that imputation fabricates data to some degree. Treating the imputed values as actual values in
estimating the variance of survey estimates leads to an overstatement of the precision of the
estimates (Brick and Kalton, 1996). It is important to recognize this fact when sizable
proportions of values are imputed.


8. Using Sampling Weights on SIPP Files
This chapter describes the use of sampling weights in analyzing data from the Survey of Income
and Program Participation (SIPP). Each SIPP file contains a number of alternative sets of
weights for use in data analysis. The several different sets of weights are needed to cater to the
different possible units of analysis (persons, households, families, and subfamilies) and different
time periods for which survey estimates may be required.

A common mistake in the analysis of a survey like SIPP is to ignore the weights entirely, that is,
to perform an unweighted analysis. This chapter explains why an unweighted analysis is likely to
produce biased estimates. It is important to understand the different sets of weights on the files
and to use the set that is appropriate for a particular analysis. Topics covered in this chapter
include:

•   What weights are and why they should be used;
•   What weights are available in SIPP files;
•   Which weights to use for a particular analysis;
•   How weights are constructed;
•   Using weights in the core wave files;
•   Using weights in the topical module files;
•   Using weights in the full panel files; and
•   Using weights in combined panel files.

For the 1996 Panel, most variable names changed from those used in previous panels. To aid
users working with files from panels prior to 1996, this chapter presents both the old and the new
variable names whenever a variable is mentioned. In both the main body of the text and in tables,
the old names are presented in parentheses following the new names. For example, the sample
unit ID variable name, which is SSUID in the 1996 Panel, was SUID in previous panels; it is
written in this chapter as SSUID (SUID).


What Weights Are and Why They Should Be Used

The weight for a responding unit in a survey data set is an estimate of the number of units in the
target population that the responding unit represents. In general, since population units may be
sampled with different selection probabilities and since response rates and coverage rates may
vary across subpopulations, different responding units represent different numbers of units in the
population. The use of weights in survey analysis compensates for this differential
representation, thus producing estimates that relate to the target population.

Most SIPP panels have not sampled different subpopulations at different rates (the exceptions are
the 1990 and 1996 Panels). However, there are some minor variations in sampling rates in all
SIPP panels and, more important, there are appreciable variations in response and coverage rates
across subpopulations. As a result, there is nontrivial variation in SIPP weights (see SIPP Quality
Profile, 3rd Ed. [U.S. Census Bureau, 1998a, Table 8.1]). For example, in Wave 1 of the 1993
Panel, the lower quartile of the final person weight is 4,400 and the upper quartile is 5,245 (the
maximum weight is 28,695). A respondent with a final person weight of 4,400 represents 4,400
people in the U.S. population for the reference month, whereas a respondent with a weight of
5,245 represents 5,245 people. Because weights in SIPP vary over a sufficiently large range of
values, performing unweighted analyses may produce appreciably biased estimates for the U.S.
population.
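
To see how unequal weights can move an estimate, the sketch below compares an unweighted and a weighted percentage. The four records are made up for illustration (the weights simply reuse the two quartile values mentioned above), and the function names are hypothetical.

```python
def unweighted_percentage(flags):
    """Percentage of sample records with the characteristic (flag == 1)."""
    return 100.0 * sum(flags) / len(flags)


def weighted_percentage(flags, weights):
    """Weighted percentage: sum of weights with the characteristic over the sum of all weights."""
    return 100.0 * sum(w for f, w in zip(flags, weights) if f) / sum(weights)


if __name__ == "__main__":
    # Hypothetical records: the two with the characteristic carry smaller weights.
    flags = [1, 1, 0, 0]
    weights = [4400, 4400, 5245, 5245]
    print(unweighted_percentage(flags))
    print(round(weighted_percentage(flags, weights), 1))
```

Because the records with the characteristic represent fewer people each, the weighted percentage falls below the unweighted one; the direction is the same as for the oversampled groups in the 1990 Panel.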

Table 8-1 illustrates the effects of weighting on a selection of estimates obtained from Wave 1 of
the 1990 Panel. The 1990 Panel included an oversample of households headed by blacks,
Hispanics, and females with no spouse present and living with relatives. Since those groups are
overrepresented in this sample, failure to use the weights would lead to overrepresentation of the
groups in the population estimates based on that sample. At the household level, the unweighted
percentage of households headed by females with no spouse present is 14.3 percent, whereas the
weighted estimate is 11.7 percent. At the person level, the magnitude of the differences between
weighted and unweighted estimates is less, but still appreciable.

        Table 8-1. Weighted and Unweighted Point-in-Time Estimates of Percentages
              Based on Core Wave 1 of the 1990 SIPP Panel for January 1990

                                                                                   Percentage
Characteristics                                                             Weighted(a)   Unweighted
Household-Level
Female-headed households with no spouse present, living with relatives      11.7          14.3
Person-Level
Female                                                                      51.3          52.2
Race/Ethnicity
White                                                                       84.2          82.1
Black                                                                       12.4          14.4
American Indian, Eskimo, or Aleut                                           0.6           0.6
Asian or Pacific Islanders                                                  2.9           2.9
Age over 65 years                                                           10.4          10.6
Receiving Food Stamps [RCUTYP27 (FOODSTMP)]                                 6.7           7.7
RCUTYP20 (AFDC)                                                             3.8           4.6

(a) Weighted by WPFINWGT (FNLWGT), the final weight for persons, and WHFNWGT (HWGT), the final
    weight for households.




Weights Available in SIPP Files

Table 8-2 lists the weight variables in SIPP data files for the 1996 and 1990–1993 Panels. For
earlier panels, the user should refer to the data dictionary for the particular file.

         Table 8-2. Weight Variables in SIPP Files for the 1996 and 1990–1993 Panels

Variable Name                        Description
Core Wave Files
WPFINWGT (FNLWGT)                    Reference month, final weight of person
WHFNWGT (HWGT)                       Reference month, final weight of household
WFFINWGT (FWGT)                      Reference month, final weight of family
WSFINWGT (SWGT)                      Reference month, final weight of related subfamily
WPFINWGT (P5WGT)(a)                  Interview (5th) month, final weight of person
WHFNWGT (H5WGT)(a)                   Interview (5th) month, final weight of household
Topical Module Files
WPFINWGT (FINALWGT)                  Prior to 1996: interview month, final weight of person.
                                     1996+: 4th reference month, final weight of person
Full Panel Files(b)
WPFINWGT (FNLWGT)_x                  Calendar year x, final weight of people in the calendar year cohort
PNLWGT (Not kept for 1996 panel)     Final weight for people in full panel cohort

(a) Beginning with the 1996 Panel, SIPP files no longer include the interview month weights.
(b) The number of calendar year weights in the full panel file depends on the panel’s duration. The
    1990 full panel file contains two calendar year weights: WPFINWGT90 (FNLWGT90) and
    WPFINWGT91 (FNLWGT91). The 1992 full panel file has three calendar year weights:
    WPFINWGT92 (FNLWGT92), WPFINWGT93 (FNLWGT93), and WPFINWGT94 (FNLWGT94). The 1996
    full panel file will have four calendar year weights when it is complete.


Choosing a Weight

The choice of weight for a given analysis depends on the population of interest for that
analysis. A useful guide is to ask to which population the results are intended to apply.

The weights in the SIPP files are constructed for sample cohorts defined by:

•   Month (e.g., the reference month weights in the core wave files and interview month weights
    in the topical module files);
•   Year (e.g., the calendar year weights in the full panel file); and
•   Panel (e.g., the full panel weight in the full panel file).

Users can choose to base their analyses on:

•   A cross-sectional sample at a given month;
•   A longitudinal sample that provides continuous monthly data over a year;
•   A longitudinal sample that provides monthly data over the life of a panel (about 32 months,
    or 48 months with the 1996 Panel); or
•   A subset of the sample and/or the period in any of the above.

Monthly (cross-sectional) weights allow the use of all available data for a given month. For this
type of analysis, users can choose among the following units of analysis:

•   Person (e.g., WPFINWGT (FNLWGT));
•   Household (e.g., WHFNWGT (HWGT));
•   Family (e.g., WFFINWGT (FWGT)); and
•   Related subfamily (e.g., WSFINWGT (SWGT)).

Analysts can use longitudinal samples to follow the same people over time and hence study such
issues as the dynamics of program participation, lengths of poverty spells, and changes in other
circumstances (e.g., household composition). The longitudinal weights allow the inclusion of all
people for whom data were collected for every month of the period involved (calendar year or
full panel period), including those who left the target population through death or because they
moved to an ineligible address (institution, foreign living quarters, military barracks), as well as
those for whom data were imputed for missing months. The Census Bureau makes nonresponse
adjustments to the longitudinal weights to compensate for panel attrition and poststratification
adjustments to make the weighted sample totals conform to population totals for key variables.


How Weights Are Constructed

This section describes how the weights are constructed. The basic components for all the
different sets of weights are the same, namely:

•   A base weight that reflects the probability of selection for a sample unit;
•   An adjustment for subsampling within clusters;
•   An adjustment for movers (in Waves 2 and beyond);
•   A nonresponse adjustment to compensate for sample nonresponse; and
•   A poststratification (second-stage calibration) adjustment to correct for departures from
    known population totals.


Weights

Reference month final weights are provided on the SIPP core wave files for persons, households,
families, and subfamilies; interview month final weights are provided for persons and
households. The special weights for persons are constructed first. The household, family, and
related subfamily final weights are derived from the final person weights. This section
summarizes the steps involved in constructing the various sets of weights, starting with the final
person weights for a reference or interview month. Appendix C provides the technical details and
reasons for some of the adjustments.

The reference and interview month weights 1 for people on the core wave files are computed (i.e.,
are nonzero) for all responding sample members who are “in scope” (i.e., a part of the survey’s
universe—the resident, noninstitutional population of the United States) in the specified month. 2
A number of factors lead to fluctuations in sample size from month to month. They include
births, deaths, immigration, and emigration from the population (and therefore from the sample).
In addition to those population dynamics, people move into and out of the sample as a result of
the changing household composition of sample members. (Chapter 2 describes the SIPP
“following rules.”)

In Wave 1, the weight for each sample person per month is a product of four components:

1. Wave 1 base weight. This weight is the inverse of the probability of a sample person’s
   address being selected.
2. Duplication-control factor. This factor adjusts for the occasional subsampling of clusters.
   Clusters are occasionally subsampled in the field when they turn out to be much larger than
   expected. 3
3. Wave 1 nonresponse adjustment. This adjustment compensates for different rates of
   household noninterview within adjustment classes. More than 500 nonresponse adjustment
   classes are defined based on a cross-classification of characteristics. Those characteristics
   include Census Region; MSA/Place Status (MSA-central city, MSA-non-central city, other
   place); race of reference person (black, nonblack); household tenure (owner, renter); and
   household size (1, 2, 3, 4+ people). In addition, the within-primary-sampling-unit poverty
   stratum (high poverty, low poverty) was added for the 1996 Panel.
4. Wave 1 second-stage calibration. This adjustment brings the sample estimates into
   agreement with independent monthly estimates of population totals. The characteristics used
   for calibration include age, race, sex, Hispanic origin, family relationship, and household
   type. A raking procedure is used to ensure that the weights agree with all the control totals
   included for calibration. The adjustment is done by rotation group, with each group assigned
   one-fourth of the population total for the month.
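
The raking step in item 4 can be illustrated with a minimal sketch. It is not the Census Bureau's production procedure: the two margins (sex and age group), the control totals, the starting weights, and the function name rake are all hypothetical, chosen only to show how weights are ratio-adjusted to each set of control totals in turn until every margin agrees.

```python
# Minimal raking (iterative proportional fitting) sketch.
# All names, categories, weights, and control totals are hypothetical.

def rake(records, controls, max_iter=100, tol=1e-8):
    """Scale each record's weight until weighted totals match every margin.

    records  : list of dicts with a "weight" key and one key per margin
    controls : {margin_name: {category: population_total}}
    """
    weights = [r["weight"] for r in records]
    for _ in range(max_iter):
        max_change = 0.0
        for margin, totals in controls.items():
            # Current weighted total in each category of this margin.
            current = {cat: 0.0 for cat in totals}
            for r, w in zip(records, weights):
                current[r[margin]] += w
            # Ratio-adjust every weight in a category to hit its control total.
            for i, r in enumerate(records):
                factor = totals[r[margin]] / current[r[margin]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


sample = [
    {"sex": "F", "age": "under65", "weight": 5000.0},
    {"sex": "F", "age": "65plus", "weight": 4800.0},
    {"sex": "M", "age": "under65", "weight": 5200.0},
    {"sex": "M", "age": "65plus", "weight": 3000.0},
]
controls = {
    "sex": {"F": 10500.0, "M": 9500.0},
    "age": {"under65": 12000.0, "65plus": 8000.0},
}
raked = rake(sample, controls)
```

After the loop, the weighted totals by sex and by age group both match their control totals to within the tolerance, which is the property the calibration is meant to guarantee.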
In subsequent waves, each person receives an initial weight that is carried over from the
preceding wave. This weight is adjusted to compensate for changes in the sample between waves
resulting from movers and nonresponse, and then it is realigned to match the population totals for
the reference or interview month:


1 Interview month weights were not computed for the 1996 Panel.
2 Persons subjected to Type Z imputation receive weights, although they are not respondents.
3 This adjustment has been used since Wave 5 of the 1984 Panel.



•     Wave 2+ initial weight. This is the weight from the previous wave before the second-stage
      calibration for each original sample person who is a reference person or is in group quarters
      for the current wave.
•     Wave 2+ mover’s adjustment. This adjustment is made to compensate for including people
      who were not in the original sample but were in the SIPP universe in Wave 1 and who moved
      into a sample household after Wave 1. For people in housing units that contain adult
      members who were not part of the original sample but were in the SIPP universe at Wave 1,
      the weights are decreased. For example, if a third adult moves into a household occupied by
      two original sample persons, all three adults would receive the initial weight of the original
      sample persons multiplied by a factor of two-thirds.
•     Wave 2+ nonresponse adjustment. The nonresponse adjustment for Waves 2 and beyond is
      used to compensate for household nonresponse after the first interview. The nonresponse
      adjustment classes are defined on the basis of sample unit characteristics and personal
      demographic characteristics 4 from the most recent wave. The information used consists of
      household characteristics. Reference person characteristics are used to define some of the
      household characteristics. Tenure (owner/renter occupied), household type (female
      householder, no spouse present; 65+; other), race and Hispanic origin, and education level
      are defined at the household level by using reference person data. Other household
      characteristics include size, poverty status, type of income, type of financial assets, census
      division, and number of imputed items. Poverty threshold, census division, and number of
      imputed items are new to the 1996 Panel. Some adjustment classes are combined to ensure
      that the adjustment for each class does not exceed a factor of 2, and each class contains at
      least 30 unweighted sample households.
•     Wave 2+ second-stage calibration. This adjustment is derived by the same procedure as in
      Wave 1, using the appropriate population control totals by reference month.
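
The two-thirds example in the mover's adjustment can be written as a short calculation. The sketch below assumes that the factor generalizes to the number of original sample adults divided by the total number of adults in the housing unit, and that the original sample persons share one initial weight; the text gives only the two-plus-one case, so both points are assumptions, and the function name is hypothetical.

```python
# Sketch of the mover's adjustment, generalized from the single example in the text.
# Assumption: factor = original sample adults / all adults in the housing unit.

def mover_adjusted_weight(initial_weight, n_original_adults, n_added_adults):
    """Return the weight assigned to every adult in the unit after the adjustment."""
    if n_original_adults < 1:
        raise ValueError("at least one original sample adult is required")
    total_adults = n_original_adults + n_added_adults
    return initial_weight * n_original_adults / total_adults


# Two original sample persons with an initial weight of 6,000 are joined by
# a third adult who was in the SIPP universe at Wave 1.
adjusted = mover_adjusted_weight(6000.0, n_original_adults=2, n_added_adults=1)
```

With no added adults the factor is 1 and the initial weight is unchanged.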
The reference month final weights for households, families, and subfamilies are derived from the
person weights:

•     The household weight is the person weight of the household reference person (renter/owner
      of housing unit).
•     The family weight is the person weight of the family reference person.
•     The subfamily weight for a related subfamily is the person weight of the related subfamily
      reference person (Chapter 10 explains how to identify households, families, and subfamilies).
•     The interview month final household weight is the person weight of the household reference
      person in the interview month. (This weight does not apply to the 1996 Panel.)


4 Known as the control card information before the 1996 Panel, when computer-assisted interviewing (CAI) began.




Final Full Panel and Calendar Year Weights

Final full panel and final calendar year weights are provided on the full panel files for eligible
sample members. There is one set of final panel weights and generally more than one set of
calendar year weights, one for each calendar year covered by the panel. The 1992 Panel file has
three sets of calendar year weights because that panel covered 3 calendar years. The 1996 Panel
file will have four sets of calendar year weights.

Final panel weights are computed only for people who are in the sample at Wave 1 of the panel
and for whom data are obtained (either reported or imputed) for every month of the panel for
which they were in scope for the survey. Other people in the panel file are assigned weights of
zero. Most people with nonzero final panel weights have provided data for all months of the
panel. However, people who missed a wave and whose missing wave data were imputed and
people who provided data up to the point that they left the survey (through death or because they
moved to an ineligible address) are also assigned nonzero final panel weights. (In core panels, it
also includes those missing up to two consecutive waves, if the waves are bounded.)
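
The basic eligibility rule for a nonzero final panel weight can be sketched as a simple check. The field names (in_scope, data_obtained) and the function name are hypothetical, and the sketch deliberately ignores the special handling of bounded missing waves mentioned above.

```python
# Sketch of the eligibility rule for a nonzero final panel weight.
# Field and function names are hypothetical; bounded-wave rules are not modeled.

def eligible_for_panel_weight(in_wave1_sample, months):
    """Return True if the person qualifies for a nonzero final panel weight.

    months: one dict per panel month with
      in_scope      - person was in the survey universe that month
      data_obtained - data were reported or imputed for that month
    """
    if not in_wave1_sample:
        return False
    # Months out of scope (e.g., after death or a move to an ineligible
    # address) do not count against the person.
    return all(m["data_obtained"] for m in months if m["in_scope"])


responded = {"in_scope": True, "data_obtained": True}
missing = {"in_scope": True, "data_obtained": False}
out_of_scope = {"in_scope": False, "data_obtained": False}

full_responder = [responded] * 4
left_universe = [responded] * 2 + [out_of_scope] * 2
attriter = [responded] * 2 + [missing] * 2
```

Under this rule the full responder and the person who left the universe qualify, while the attriter and anyone who entered after Wave 1 do not.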

Final calendar year weights are computed only for people who had an interview covering the
control date 5 and for whom data are obtained (either reported or imputed) for every month of the
calendar year for which they were in scope for the survey. Other people are assigned final
calendar year weights of zero. Some people who joined the household of an original sample
person after the start of the panel are assigned nonzero calendar year weights for the second
calendar year, if data are obtained for that period.

The full panel weighting scheme does not assign weights to people who enter the sample
universe after Wave 1. Similarly, the calendar year weighting scheme does not assign weights to
people who do not have an interview covering the control date. This group consists of (a) people
who enter the sample universe after the first wave of interviewing for the calendar year and (b)
people who were in the sample universe in the first wave of interviewing in the calendar year but
did not have an interview covering the control date. For example, newborn infants and people
leaving institutions who are entering the sample universe after Wave 1 are assigned full panel
and calendar year 1 weights of zero. Note that the same people will receive positive calendar
year 2 (CY2) weights if they are in the sample universe in the first wave of interviewing for CY2
and have an interview covering the control date for CY2.

The final panel and calendar year weights are constructed from the following three components:

1. Initial weight. This weight is constructed from the components of the cross-sectional
   weights at the start of the panel and calendar year weighting periods before the second-stage
   calibration adjustment.


5 The calendar year control dates are January 1 for the given calendar year. The exception is calendar year 1996 for
the 1996 Panel. Its control date is currently March 1, 1996. This would change to January 1 should there be
imputation for January and February data.



2. Nonresponse adjustment factors. These factors account for noninterviewed eligible sample
   persons not already accounted for in the noninterview adjustment component of the initial
   weight. The adjustment classes are similar to those used in the Wave 2+ nonresponse
   adjustment factors.
3. Second-stage calibration factors. These factors are determined by a process similar to that
   used for reference and interview month weighting. The control totals used for the calendar
   year weights are the population estimates for the control date of the relevant year. Those for
   the full panel weight are the population estimates for a designated date in the first wave of
   the panel (March 1 for most recent panels).


Using Weights in the Core Wave Files

Each core wave file contains reference month weights for persons, households, families, and
subfamilies and, prior to the 1996 Panel, interview month weights for persons and households
(interview month weights are not computed for families and related subfamilies).

In the 1989 and earlier panels, each person’s record in a core wave file contained 18 weight
variables, comprising weights for the four analysis units (persons, households, families, and
subfamilies) for each of the four reference months and the person and household weights for the
interview month. For the 1990 and later panels, the file structure was changed to a person-month
format, as described in Chapter 10. With that format, each person-month record has only six
weights, four for the four analysis units for that month and two for the two analysis units
(person and household) for the interview month.

This section describes those weights and indicates how they should be used for different types of
analysis.


Reference Month and Interview Month Weights

To understand the format of the reference month and interview month weights, analysts may find
it useful to recall the SIPP survey design and the file structure for the core wave file. The full
SIPP sample consists of four rotation groups; for each wave, interviewing is spread over 4
months. One rotation group is interviewed per month, with the reference months for each
rotation group being the 4 months preceding the interview month. As successive rotation groups
are interviewed, the 4-month reference periods advance by 1 month. Therefore, each wave has 4
interview months, and each rotation group has 4 reference months per wave.

There are four final person reference month weights per sample person, one for each month in
the reference period. Beginning with the 1990 Panel, the reference month weights are provided
as one variable (that is, WPFINWGT (FNLWGT) for persons) in four separate person-month
records per person. The reference month weight on each record refers to the specific month to
which the data relate. The core wave files for earlier panels used one record per person. On those
files, the four reference month weights were shown as four separate variables.

The interview month weight for a particular rotation group represents one-quarter of the U.S.
population at the month of interview. The sum of the interview month weights for the four
rotation groups is an estimate of the total U.S. population across the 4 months of interviewing per
wave. The interview month weight can be used to form person or household estimates that
specifically refer to characteristics as of the interview month. For example, an analyst might
want to estimate the number of unmarried adults living with an aged parent as of the latest
observation. The interview month weight can also be used for estimating a few of the
demographic characteristics, such as race and sex, and other information that appears on the file
for the 4-month reference period as a whole, but not for each month.

Analysts should not use interview month weights to form estimates referring to the reference
period plus the interview month. That is because characteristics at the time of the interview date
are not necessarily representative of the rest of the reference period (i.e., people could move,
marry, or leave the country). Beginning with the 1996 Panel, the core wave file no longer
provides the interview month weight, since the focus of the data is the 4 calendar months prior to
that month.


Person Reference Month and Interview Month Weights

For person-level analyses, the weights available in the core wave file are WPFINWGT
(FNLWGT) (the reference month weight) and WPFINWGT (P5WGT) (the interview month
weight—not applicable to the 1996 Panel). WPFINWGT (FNLWGT) is the estimated number of
people in the population that the sample person represents in a specific reference month. The
reference month is given by the variables RHCALMN (MONTH) and RHCALYR (YEAR),
which are derived based on SROTATON (ROT) (rotation group) and SREFMON (REFMTH)
(reference month). The interview month weight WPFINWGT (P5WGT) is also called the fifth-
month weight. This weight shows the number of people in the population that the sample person
represents at the interview month.

Table 8-3 shows the reference month and interview month weights for two hypothetical sample
persons in Wave 1 of the 1991 Panel, based on the person-month format. The persons can be
identified by the variables SSUID (SUID), EENTAID (ENTRY), and EPPPNUM (PNUM)
(Chapter 10 describes how to identify a person). There are four records per person, one for each
reference month. The first four records are for the first person, who is from rotation group 2:
SROTATON = 2 (ROT = 2). Reference month 1, SREFMON = 1 (REFMTH = 1), corresponds
to October 1990 (MONTH and YEAR). WPFINWGT (FNLWGT) for SREFMON (REFMTH) =
1 is 5,000, meaning that this person represents 5,000 people in the population in October 1990.
The values of WPFINWGT (FNLWGT) in subsequent months are slightly different because of
adjustments to the weight resulting from fluctuations in the population and in the sample. The
second person is from rotation group 3. Since the month of interview for this person is different
from that of the first person, the reference months for this person are also different. The variables
RHCALMN (MONTH) and RHCALYR (YEAR) can be used to select records with data for a
particular month.

  Table 8-3. Final Person Weights for Four Reference Months and One Interview Month
                              in Wave 1 of the 1991 Panel

SSUID       EENTAID   EPPPNUM   SROTATON   SREFMON    RHCALMN   RHCALYR   WPFINWGT   WPFINWGT
(SUID)      (ENTRY)   (PNUM)    (ROT)      (REFMTH)   (MONTH)   (YEAR)    (FNLWGT)   (P5WGT)
123456789   11        101       2          1          10        90        5,000      5,025
123456789   11        101       2          2          11        90        5,005      5,025
123456789   11        101       2          3          12        90        5,010      5,025
123456789   11        101       2          4          01        91        5,020      5,025
321456789   11        101       3          1          11        90        6,500      6,525
321456789   11        101       3          2          12        90        6,510      6,525
321456789   11        101       3          3          01        91        6,520      6,525
321456789   11        101       3          4          02        91        6,530      6,525
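
As an illustration of selecting records by calendar month, the sketch below transcribes the Table 8-3 records and sums the person reference month weight for one month. The record layout (a list of dictionaries) and the function name are assumptions made for the example; with only two hypothetical sample persons the total has no substantive meaning.

```python
# Person-month records transcribed from Table 8-3 (1996-style variable names).
# The list-of-dicts layout and the function name are illustrative assumptions.
RECORDS = [
    {"SSUID": "123456789", "SROTATON": 2, "SREFMON": 1, "RHCALMN": 10, "RHCALYR": 90, "WPFINWGT": 5000},
    {"SSUID": "123456789", "SROTATON": 2, "SREFMON": 2, "RHCALMN": 11, "RHCALYR": 90, "WPFINWGT": 5005},
    {"SSUID": "123456789", "SROTATON": 2, "SREFMON": 3, "RHCALMN": 12, "RHCALYR": 90, "WPFINWGT": 5010},
    {"SSUID": "123456789", "SROTATON": 2, "SREFMON": 4, "RHCALMN": 1, "RHCALYR": 91, "WPFINWGT": 5020},
    {"SSUID": "321456789", "SROTATON": 3, "SREFMON": 1, "RHCALMN": 11, "RHCALYR": 90, "WPFINWGT": 6500},
    {"SSUID": "321456789", "SROTATON": 3, "SREFMON": 2, "RHCALMN": 12, "RHCALYR": 90, "WPFINWGT": 6510},
    {"SSUID": "321456789", "SROTATON": 3, "SREFMON": 3, "RHCALMN": 1, "RHCALYR": 91, "WPFINWGT": 6520},
    {"SSUID": "321456789", "SROTATON": 3, "SREFMON": 4, "RHCALMN": 2, "RHCALYR": 91, "WPFINWGT": 6530},
]


def weighted_person_total(records, month, year):
    """Sum the person reference month weight over records for one calendar month."""
    return sum(
        r["WPFINWGT"]
        for r in records
        if r["RHCALMN"] == month and r["RHCALYR"] == year
    )


# January 1991 is reference month 4 for the first person and 3 for the second.
jan_1991 = weighted_person_total(RECORDS, month=1, year=91)
oct_1990 = weighted_person_total(RECORDS, month=10, year=90)
```

Filtering on RHCALMN and RHCALYR rather than on SREFMON is what lines up records from different rotation groups on the same calendar month.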


Household Reference Month and Interview Month Weights

A household in the core wave file is a group of people who occupy a housing unit in a
specific calendar month. For each household, the household weight WHFNWGT (HWGT) is the
person weight of the household reference person (the renter/owner of the housing unit).
WHFNWGT (HWGT) shows the number of households in the population that the sample
household represents in that reference month. The household interview month weight
WHFNWGT (H5WGT) is the number of households in the population that the sample household
represents at the month of interview (which varies within a wave over a 4-month period).
WHFNWGT (HWGT) is assigned to all household members. Note that the household reference
person can change from one month to the next, resulting in a change of WHFNWGT (HWGT).

Table 8-4 shows WHFNWGT (HWGT) and WHFNWGT (H5WGT) for five members of a
household and their person weights. The variables SSUID (SUID) and SHHADID (ADDID)
identify the household (Chapter 10 describes how to identify households). The WHFNWGTs
(HWGTs) and WHFNWGTs (H5WGTs) for all members of a household are equal to the
WPFINWGTs (FNLWGTs) and WPFINWGTs (P5WGTs) of the reference person in the
household, respectively. In this case, the household reference person is the father. The user
should note that weights for husbands and wives are equalized in the weighting process. Therefore,
couples (e.g., father and mother, daughter and son-in-law) have the same person weights.



      Table 8-4. Household, Reference Month, and Interview Month Weights for Members
                 of a Household for a Given Month in Wave 1 of the 1990 Panel

Household    SSUID       SHHADID   EENTAID   EPPPNUM   WHFNWGT   WHFNWGT   WPFINWGT   WPFINWGT
Member       (SUID)      (ADDID)   (ENTRY)   (PNUM)    (HWGT)    (H5WGT)   (FNLWGT)   (P5WGT)
Father a     101111103   11        11        101       5,000     5,050     5,000      5,050
Mother       101111103   11        11        102       5,000     5,050     5,000      5,050
Daughter     101111103   11        11        103       5,000     5,050     4,800      4,865
Son-in-law   101111103   11        11        104       5,000     5,050     4,800      4,865
Grandchild   101111103   11        11        105       5,000     5,050     3,000      3,035
Note: Month = 01; Year = 1990.
a Reference person of household.
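
The relationship shown in Table 8-4 can be sketched in code. The example below copies the reference person's weight to every household member and then counts the household once when estimating the number of households. The is_reference_person flag and both function names are hypothetical; on the actual files the household weight is already present on each record.

```python
# Person records for one month, transcribed from Table 8-4.
# The "is_reference_person" flag and the function names are hypothetical.
PERSONS = [
    {"member": "Father", "SSUID": "101111103", "SHHADID": "11", "WPFINWGT": 5000, "is_reference_person": True},
    {"member": "Mother", "SSUID": "101111103", "SHHADID": "11", "WPFINWGT": 5000, "is_reference_person": False},
    {"member": "Daughter", "SSUID": "101111103", "SHHADID": "11", "WPFINWGT": 4800, "is_reference_person": False},
    {"member": "Son-in-law", "SSUID": "101111103", "SHHADID": "11", "WPFINWGT": 4800, "is_reference_person": False},
    {"member": "Grandchild", "SSUID": "101111103", "SHHADID": "11", "WPFINWGT": 3000, "is_reference_person": False},
]


def assign_household_weight(persons):
    """Give every member the person weight of the household reference person."""
    ref_weight = {}
    for p in persons:
        if p["is_reference_person"]:
            ref_weight[(p["SSUID"], p["SHHADID"])] = p["WPFINWGT"]
    return [dict(p, WHFNWGT=ref_weight[(p["SSUID"], p["SHHADID"])]) for p in persons]


def estimated_households(persons_with_hh_weight):
    """Sum the household weight once per household, not once per member."""
    one_per_household = {}
    for p in persons_with_hh_weight:
        one_per_household[(p["SSUID"], p["SHHADID"])] = p["WHFNWGT"]
    return sum(one_per_household.values())


weighted = assign_household_weight(PERSONS)
n_households = estimated_households(weighted)
```

Because the household weight is repeated on every member's record, summing it over all person records would count this household five times; keeping one record per household avoids that.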


Family and Related Subfamily Reference Month Weights

All sample persons in a core wave file are assigned a family type, EFTYPE (FTYPE), consisting
of the following categories: primary families, unrelated subfamilies, primary individuals, and
secondary individuals. A family is defined as a group of two or more persons related by birth,
marriage, or adoption who reside together. A primary family is a family containing the
household reference person and all of his or her relatives. An unrelated subfamily is a family in a
household that is not related to the household reference person. A primary individual is a
household reference person who lives alone or lives with only nonrelatives. A secondary
individual is not a household reference person and is not related to any other people in the
household.

Related subfamily units within primary families are identified by ESFTYPE (STYPE) (0 = not in
a subfamily; 1 = in a related subfamily; 2 = in an unrelated subfamily). Related subfamilies are
families that are related to, but do not include, the household reference person. For example, the
daughter, son-in-law, and grandchild in Table 8-4 constitute a related subfamily within a primary
family. They are members of the father and mother’s primary family unit, as well as members of
their own subfamily.

The SIPP core wave files provide reference month weights for families and related subfamilies.
The family reference month weight WFFINWGT (FWGT) is equal to the person weight of the
family reference person in that month; it is assigned to all family members. The subfamily
reference month weight WSFINWGT (SWGT) is equal to the person weight of the related
subfamily reference person; it is assigned to all subfamily members and is set equal to zero for
people not in related subfamilies.

A primary individual is both a household reference person and a family reference person. For a
primary individual, WFFINWGT (FWGT) = WPFINWGT (FNLWGT) = WHFNWGT
(HWGT). Secondary individuals are classified as family reference persons who are not
household reference persons. Therefore, for secondary individuals, WFFINWGT (FWGT) =
WPFINWGT (FNLWGT) ≠ WHFNWGT (HWGT). The only exception is for people
in group quarters, RHTYPE = 6 (HTYPE = 6). The first secondary person in group quarters is
labeled the household reference person; for that person, WFFINWGT (FWGT) = WPFINWGT
(FNLWGT) = WHFNWGT (HWGT).

Table 8-5 shows the weights for the different analysis units by type of household, RHTYPE
(HTYPE), and by type of family, EFTYPE (FTYPE). Three households are shown. The first
household is a married couple family household, RHTYPE = 1 (HTYPE = 1), consisting of a
primary family and a related subfamily, ESFTYPE = 1 (STYPE = 1). The WHFNWGT (HWGT)
for each member of this household is equal to the person weight of the household reference
person (i.e., the father in this case). Members of this household belong to one primary family.
Therefore, the WFFINWGT (FWGT) for each member is equal to the person weight of the
family reference person (who is also the father). Some members of this primary family belong to
a related subfamily unit (i.e., daughter, son-in-law, and grandchild). The subfamily weight
WSFINWGT (SWGT) for each member of the subfamily is equal to the person weight of the
subfamily reference person (e.g., the daughter). WSFINWGT (SWGT) is zero for the father and
mother, who are not part of the subfamily.

The second household is a male-householder nonfamily household, RHTYPE = 4 (HTYPE = 4),
with three unrelated individuals. The household reference person is the primary individual,
EFTYPE = 4 (FTYPE = 4), and the others are secondary individuals, EFTYPE = 5
(FTYPE = 5). The WHFNWGT (HWGT) for this household is the person weight of the
household reference person, and the weight is the same for all individuals. The WFFINWGT
(FWGT) is different for each individual because each one is treated as his or her own family
reference person.

The third household is a group-quarters household, RHTYPE = 6 (HTYPE = 6). Because there is
no household reference person based on the typical definition of renter or owner, both
individuals are classified as secondary individuals, EFTYPE = 5 (FTYPE = 5). The first
secondary individual in group quarters is labeled as the household reference person, and the
WHFNWGT (HWGT) for each person in group quarters is the weight of that individual. The
WFFINWGT (FWGT) for each individual is different because each forms an individual family.
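
The relationships among the person, household, family, and subfamily weights described above lend themselves to simple consistency checks. The sketch below applies three of them to the first two households of Table 8-5. The tuple layout and the function name are assumptions, and the EFTYPE codes 4 (primary individual) and 5 (secondary individual) are taken from that table.

```python
# Rows from the first two households of Table 8-5:
# (member, WPFINWGT, WHFNWGT, WFFINWGT, WSFINWGT, EFTYPE, ESFTYPE)
ROWS = [
    ("Father", 5000, 5000, 5000, 0, 1, 0),
    ("Mother", 5000, 5000, 5000, 0, 1, 0),
    ("Daughter", 4800, 5000, 5000, 4800, 1, 1),
    ("Son-in-law", 4800, 5000, 5000, 4800, 1, 1),
    ("Grandchild", 3000, 5000, 5000, 4800, 1, 1),
    ("Male 1", 6000, 6000, 6000, 0, 4, 0),
    ("Person 2", 4500, 6000, 4500, 0, 5, 0),
    ("Person 3", 5500, 6000, 5500, 0, 5, 0),
]


def weight_problems(row):
    """Return a list of rule violations for one person record (empty if none)."""
    member, wp, wh, wf, ws, eftype, esftype = row
    problems = []
    # Primary and secondary individuals are their own family reference persons.
    if eftype in (4, 5) and wf != wp:
        problems.append(f"{member}: family weight differs from person weight")
    # A primary individual is also the household reference person.
    if eftype == 4 and wf != wh:
        problems.append(f"{member}: family weight differs from household weight")
    # The subfamily weight is zero for people not in a related subfamily.
    if esftype != 1 and ws != 0:
        problems.append(f"{member}: nonzero subfamily weight outside a related subfamily")
    return problems


all_problems = [msg for row in ROWS for msg in weight_problems(row)]
```

The rows transcribed from the table pass all three checks, so the list of problems is empty.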


Calendar Month Estimation: Using a Single Core Wave File

Each core wave file consists of data from 7 calendar months covered by the reference month
periods for the four rotation groups. There is only 1 calendar month with complete data from all
four rotation groups. As an illustration, Table 8-6 shows the calendar months within the
reference periods for Wave 1 of the 1991 Panel and the number of rotation groups available per
month. The table shows that data from all four rotation groups are available for January 1991
only. Data are available from three rotation groups for December 1990 and February 1991, for
two rotation groups for November 1990 and March 1991, and for one rotation group for October
1990 and April 1991.
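
The month counts above follow directly from the staggered reference periods, as the sketch below shows. The starting months for rotation groups 2 and 3 come from the Table 8-3 discussion; the starting months for groups 4 and 1 are assumptions, inferred from the seven calendar months listed in this section.

```python
from collections import Counter

# First reference month (year, month) of each rotation group in Wave 1 of the
# 1991 Panel. Groups 2 and 3 follow the Table 8-3 discussion; groups 4 and 1
# are assumed to start one and two months later, respectively.
FIRST_REF_MONTH = {2: (1990, 10), 3: (1990, 11), 4: (1990, 12), 1: (1991, 1)}


def reference_months(rotation):
    """Return the four (year, month) pairs in a rotation group's reference period."""
    year, month = FIRST_REF_MONTH[rotation]
    months = []
    for offset in range(4):
        m = month - 1 + offset          # zero-based month index, may exceed 11
        months.append((year + m // 12, m % 12 + 1))
    return months


# Number of rotation groups contributing data to each calendar month.
coverage = Counter(ym for rot in FIRST_REF_MONTH for ym in reference_months(rot))

# Calendar months covered by all four rotation groups.
full_months = [ym for ym, n in coverage.items() if n == 4]
```

Only January 1991 is covered by all four rotation groups, so it is the only month for which a single core wave file supports an estimate based on the full sample.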


Table 8-5. Family and Subfamily Reference Month Weights, by RHTYPE (HTYPE), EFTYPE (FTYPE), and
ESFTYPE (STYPE) in Wave 1 of the 1990 Panel

Household         SSUID      SHHADID  RFID   RFID2   RSID   EENTAID  EPPPNUM  WPFINWGT  WHFNWGT  WFFINWGT  WSFINWGT  EFTYPE   ESFTYPE
Member            (SUID)     (ADDID)  (FID)  (FID2)  (SID)  (ENTRY)  (PNUM)   (FNLWGT)  (HWGT)   (FWGT)    (SWGT)    (FTYPE)  (STYPE)

RHTYPE = 1 (HTYPE = 1): Married-couple family household
Father (a,b)      101111103  11       1      1       0      11       101      5,000     5,000    5,000     0         1        0
Mother            101111103  11       1      1       0      11       102      5,000     5,000    5,000     0         1        0
Daughter (c)      101111103  11       1      0       1      11       103      4,800     5,000    5,000     4,800     1        1
Son-in-law        101111103  11       1      0       1      11       104      4,800     5,000    5,000     4,800     1        1
Grandchild        101111103  11       1      0       1      11       105      3,000     5,000    5,000     4,800     1        1

RHTYPE = 4 (HTYPE = 4): Male-householder nonfamily
Male 1 (a,b)      122210000  11       1      1       0      11       101      6,000     6,000    6,000     0         4        0
Person 2 (b)      122210000  11       1      1       0      11       102      4,500     6,000    4,500     0         5        0
Person 3          122210000  11       1      1       0      11       103      5,500     6,000    5,500     0         5        0

RHTYPE = 6 (HTYPE = 6): Group quarters
Individual 1 (a)  222210000  11       1      1       0      11       101      4,500     4,500    4,500     0         5        0
Individual 2      222210000  11       1      1       0      11       102      3,500     4,500    3,500     0         5        0

Notes: Month = 01; Year = 1990. RHTYPE (HTYPE), type of household: 1 = married couple family household, 2 = male householder family household,
3 = female householder family household, 4 = male householder nonfamily household, 5 = female householder nonfamily household, 6 = group quarters;
EFTYPE (FTYPE), type of family: 1 = primary family, 3 = unrelated subfamily, 4 = primary individual, 5 = secondary individual.
(a) Household reference person; see text.
(b) Family reference person.
(c) Related subfamily reference person.

            Table 8-6. Calendar Month Estimation: Using a Single Core Wave File
                            in Wave 1 of the 1991 and 1996 Panels

                                                Reference Months—Wave 1, 1991 Panel
Rotation    Interview     1990        1990        1990       1991     1991         1991                 1991
Group       Month         Oct.        Nov.        Dec.       Jan.     Feb.         Mar.                 Apr.
2           Feb. 1991     1           2           3          4
3           Mar. 1991                 1           2          3        4
4           Apr. 1991                             1          2        3            4
1           May 1991                                         1        2            3                    4
Rotation Group
Adjustment                4           2           4/3        1        4/3          2                    4
                                                Reference Months—Wave 1, 1996 Panel
Rotation    Interview     1995        1996        1996       1996     1996         1996                 1996
Group       Month         Dec.        Jan.        Feb.       Mar.     Apr.         May                  June
1           Apr. 1996     1           2           3          4
2           May 1996                  1           2          3        4
3           June 1996                             1          2        3            4
4           July 1996                                        1        2            3                    4
Rotation Group
Adjustment                4           2            4/3          1           4/3          2              4


The reference month and interview month weights for each rotation group are designed to
represent a quarter of the population at the month of reference or interview, respectively. The
weights for each rotation group can be inflated to represent the full population. For every month,
the inflation adjustment equals four divided by the number of rotation groups available. For
example, the adjustment for October 1990 is 4/1 because there is only one rotation group in this
month. For January 1991, the adjustment factor is 1 because all four rotation groups are available
for this month.
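
The adjustment rule above can be sketched in a few lines of Python. This is only an illustration: the month labels and the dictionary layout are assumptions made for the example, and the coverage pattern is transcribed from the 1991 Panel portion of Table 8-6.

```python
from fractions import Fraction

# Reference months covered by each rotation group in Wave 1 of the 1991 Panel
# (transcribed from Table 8-6; the "YYYY-MM" labels are an illustrative choice).
WAVE1_1991 = {
    2: ["1990-10", "1990-11", "1990-12", "1991-01"],
    3: ["1990-11", "1990-12", "1991-01", "1991-02"],
    4: ["1990-12", "1991-01", "1991-02", "1991-03"],
    1: ["1991-01", "1991-02", "1991-03", "1991-04"],
}


def rotation_adjustments(coverage):
    """Return {month: 4 / number of rotation groups with data for that month}."""
    counts = {}
    for months in coverage.values():
        for month in months:
            counts[month] = counts.get(month, 0) + 1
    return {month: Fraction(4, n) for month, n in sorted(counts.items())}


adjustments = rotation_adjustments(WAVE1_1991)
for month, factor in adjustments.items():
    print(month, factor)
```

Multiplying each record's reference month weight by the factor for its month inflates the available rotation groups to the full population, as described above.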

Users are strongly encouraged to use the full sample of all four rotation groups whenever
possible. The core wave files are designed to support analysis using the full sample of all four
rotation groups (discussed below). While the weights can be modified to compensate for a
smaller sample, estimates based on a subset of rotation groups will be less reliable than those
based on the full sample.


Calendar Month and Quarterly Estimation:
Using Two or More Core Wave Files

Combining data from two or more core wave files can increase the data available for making
estimates for calendar months or combinations of calendar months such as quarters of the year.
As an example, Table 8-7 shows the effects of cumulating calendar month data across two
waves: Waves 1 and 2 of the 1991 Panel. By combining Waves 1 and 2, there are now four
rotation groups for calendar month estimations from January through April 1991. To calculate



           Table 8-7. Calendar Month Estimation: Using Two Core Wave Files from
                          Waves 1 and 2 of the 1991 and 1996 Panels

                                                            Reference Months
Rotation     Interview         1990        1990      1990         1991       1991            1991           1991
 Group        Month            Oct.        Nov.       Dec.         Jan.      Feb.            Mar.           Apr.
                                               Wave 1, 1991 Panel
2            February      1           2           3            4
3            March                     1           2            3         4
4            April                                 1            2         3              4
1            May                                                1         2              3              4
                                               Wave 2, 1991 Panel (a)
2           June                                                          1              2              3
3           July                                                                         1              2
4           August                                                                                      1
1           September
Rotation Group
Adjustment                 4           2             4/3        1         1              1              1
                                                            Reference Months
Rotation     Interview         1995        1996        1996       1996       1996            1996           1996
 Group        Month            Dec.        Jan.        Feb.       Mar.       Apr.            May            June
                                               Wave 1, 1996 Panel
1            Apr. 1996     1           2           3           4
2            May                       1           2           3            4
3            June                                  1           2            3            4
4            July                                              1            2            3              4
                                               Wave 2, 1996 Panel (a)
1             August                                                        1            2              3
2             September                                                                  1              2
3             October                                                                                   1
4             November
Rotation Group
Adjustment                 4           2             4/3        1           1            1              1
(a) Not all data from Wave 2 are shown in the table.


calendar month estimates for each of those months, the user can simply select the person-month
records for the month of interest from a file that pools records from Waves 1 and 2 and apply the
WPFINWGT (FNLWGT) associated with each record to obtain the full sample estimate.
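
A minimal Python sketch of that selection step follows. Only WPFINWGT is a SIPP variable name; the record layout (dictionaries with year, month, and food_stamps keys) and every data value are hypothetical and chosen purely for illustration.

```python
# Hypothetical person-month records; only WPFINWGT is a SIPP variable name.
wave1 = [
    {"year": 1991, "month": 2, "WPFINWGT": 5000.0, "food_stamps": True},
    {"year": 1991, "month": 2, "WPFINWGT": 4800.0, "food_stamps": False},
    {"year": 1990, "month": 12, "WPFINWGT": 5100.0, "food_stamps": True},
]
wave2 = [
    {"year": 1991, "month": 2, "WPFINWGT": 5050.0, "food_stamps": True},
    {"year": 1991, "month": 3, "WPFINWGT": 4700.0, "food_stamps": False},
]


def calendar_month_total(records, year, month, flag=None):
    """Sum WPFINWGT over the person-month records for one calendar month.

    If flag is given, only records where that field is true are counted.
    """
    return sum(
        r["WPFINWGT"]
        for r in records
        if r["year"] == year
        and r["month"] == month
        and (flag is None or r[flag])
    )


pooled = wave1 + wave2  # pool the records from the two core wave files
population_feb = calendar_month_total(pooled, 1991, 2)
recipients_feb = calendar_month_total(pooled, 1991, 2, flag="food_stamps")
```

No rotation group adjustment is applied here because the sketch assumes the chosen month is one for which the pooled file holds all four rotation groups.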

Quarterly estimates in the form of average month estimates also can be computed based on a
combined file. For example, to calculate the percentage of people receiving food stamps in the
first quarter of 1991, users can obtain the weighted number of people receiving food stamps and
the weighted number of the total population in each month of the quarter. Then the percentage of
people receiving food stamps is the sum across months of the weighted number of people
receiving food stamps divided by the sum of the weighted number of total population in the
quarter. In deriving quarterly estimates, or estimates for any time interval, from data in the core
wave files, users need to include all four rotation groups in each month of the estimation.
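
The average-month calculation just described can be sketched as follows. The record layout and the weights are hypothetical illustration values, and the sketch assumes the input already contains all four rotation groups for each month of the quarter.

```python
def quarterly_percentage(records, year, months, flag):
    """Average-month percentage for a quarter.

    Numerator: weighted recipients summed across the months.
    Denominator: weighted total population summed across the same months.
    """
    recipients = 0.0
    population = 0.0
    for r in records:
        if r["year"] == year and r["month"] in months:
            population += r["WPFINWGT"]
            if r[flag]:
                recipients += r["WPFINWGT"]
    return 100.0 * recipients / population


# Hypothetical person-month records (illustrative weights only).
records = [
    {"year": 1991, "month": 1, "WPFINWGT": 1000.0, "food_stamps": True},
    {"year": 1991, "month": 1, "WPFINWGT": 3000.0, "food_stamps": False},
    {"year": 1991, "month": 2, "WPFINWGT": 2000.0, "food_stamps": True},
    {"year": 1991, "month": 2, "WPFINWGT": 2000.0, "food_stamps": False},
    {"year": 1991, "month": 3, "WPFINWGT": 4000.0, "food_stamps": False},
    {"year": 1991, "month": 4, "WPFINWGT": 4000.0, "food_stamps": True},  # outside Q1
]

pct_q1 = quarterly_percentage(records, 1991, {1, 2, 3}, "food_stamps")
```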



The quarterly estimates derived by this method are cross-sectional estimates, based on the
samples in each month of the quarter. When working with panels prior to 1996, users interested
in extracting longitudinal characteristics (e.g., the percentage of people receiving food stamps for
all 3 months, or in any of the 3 months, of the quarter) are encouraged to use the full panel file.
Prior to the 1996 Panel, the editing and imputation procedures used for the core wave files could
introduce artificially high rates of month-to-month transitions. With the introduction of CAI in
the 1996 Panel, the use of core wave files for that kind of estimation problem is expected to be
much less problematic because CAI should provide more complete and accurate data.


Using Weights in the Topical Module Files

The topical module files contain one weight variable—WPFINWGT (FINALWGT). For the
1996 Panel, this weight is the person cross-sectional weight for the fourth reference month. Prior
to 1996, this weight was the person interview month weight for people who provided data for a
topical module. It shows the number of people in the population represented by the sample
person in the interview month.

The sample weights on the topical module files are defined in the same manner as the sample
weights on the core wave files. The WPFINWGT (FINALWGT) for each rotation group is
defined to represent a quarter of the population at the interview month. When all four rotation
groups are used, the interview month weight for the full sample represents the population
estimate averaged over the 4 months of interviewing per wave.


Using Weights in the Full Panel File

The weight variables in the full panel file are the calendar year weights, WPFINWGT
(FNLWGT), and the full panel weight (PNLWGT). The number of calendar year weights on the
file depends on the duration of the panel. Most panels before the 1996 Panel have two calendar
year weights. The exceptions are the 1989 Panel, which has one calendar year weight—
WPFINWGT89 (FNLWGT89)—and the 1992 Panel, which has three calendar year weights—
WPFINWGT92 (FNLWGT92), WPFINWGT93 (FNLWGT93), and WPFINWGT94
(FNLWGT94). When the 1996 full panel file is complete, it will have four calendar year
weights.

The weight variables are defined for sample persons who are in the sample for different periods
of time. The calendar year weights apply to sample persons who had interviews covering the
control date of the corresponding calendar year and who have complete data (either reported or
imputed) for every month of the year (excluding months of ineligibility). The panel weight
applies to sample persons who are in the sample in Wave 1 of the panel and who have complete
data (either reported or imputed) for every month of a panel (excluding months of ineligibility).
People are assigned calendar year weights equal to zero when they do not have interviews
covering the control date, have missing data for one or more months of the year, or both.



Similarly, people are assigned panel weights equal to zero if they were not in sample in Wave 1,
have missing data for one or more months of the panel, or both.

The population of inference for each of these weights is the population of survivors of the
January (or Wave 1, depending on the weight) population. Infants born after the beginning of the
panel are assigned a PNLWGT of zero. Similarly, infants born after the control date are assigned
a calendar year weight of zero for that year. This weighting can have important implications for
those studying young children when infants are a sizable fraction of the population. For example,
the WIC program serves children under 5 years of age. Infants in their first year constitute 20
percent of that population.

The SIPP full panel file contains records for every person who was ever part of a responding
SIPP household. There is one record for each such person, excluding people who may have been
in the sample for only 1 month. The first number in PP-EENTAID (PP-ENTRY) and in
PP-EPPPNUM (PP-PNUM) indicates the wave in which the person entered the sample. Each record
contains month-by-month data collected at every wave. However, records with incomplete data
for a given period (year or full period of the panel) are assigned weights of zero. As discussed in
Chapter 4, beginning with the 1991 Panel, a new imputation procedure was put into place to
allow more people to have positive weights in the full panel files. All people with one or more
missing waves, each of which was bounded on both sides by interviewed waves, have their data
imputed for the bounded missing waves. With this procedure, a significant portion of the panel
nonrespondent records became usable records for longitudinal analysis. Beginning with the 1996
Panel, people with two consecutive missing waves can have their data imputed for those waves if
they are bounded by interviewed waves.

The variables PPID (PP-ID), PP-EENTAID (PP-ENTRY), and PP-EPPPNUM (PP-PNUM)
identify people in the full panel files (Chapter 12). Table 8-8 provides examples of the weights in
the 1990 full panel file. The 1990 Panel provides three weights: WPFINWGT90 (FNLWGT90),
WPFINWGT91 (FNLWGT91), and PNLWGT. The person on the first row is a complete panel
member, with all three weights greater than zero. The second person has positive calendar year
weights but zero PNLWGT, which probably indicates that this person provided data for the first
2 calendar years but left before Wave 8. The third person had complete (reported or imputed)
data for the first calendar year, but probably left before the end of the second calendar year. The
fourth person entered the panel at Wave 4 and probably remained in sample until the end of the
panel. This person was eligible for only the second calendar year weight. The last person entered at Wave 7 and
was assigned a weight of zero for all three weights on the panel file (however, this person would
have had reference month and interview month weights on the Wave 7 and 8 core files).

                     Table 8-8. Calendar Year and Panel Weights, 1990-1993

                   PP-EENTAID         PP-EPPPNUM         WPFINWGT90         WPFINWGT91
PP-ID              (PP-ENTRY)         (PP-PNUM)          (FNLWGT90)         (FNLWGT91)          PNLWGT
123456789          11                 101                5,500              6,000               6,500
123456789          11                 102                5,500              6,000               0
123456789          11                 101                7,200              0                   0
221456789          41                 401                0                  6,500               0
567891211          71                 701                0                  0                   0




Calendar Year Estimation: Using the Full Panel File

Although the SIPP collects most core content with monthly resolution, users may need to
construct calendar year estimates of quantities such as total annual income. One way to construct
such estimates is to work with the full panel files, extracting those records with positive calendar
year weights. For example, to estimate average annual wages in 1991 for people over age 25 on
January 1, 1991, one could identify records from the 1990 Panel with positive values on the
calendar year weight FNLWGT91. The annual income amount for each sample person is the sum
of the amounts received during each month of the calendar year. The aggregate income estimates
for the population can be derived by multiplying each person’s annual income by FNLWGT91
and summing the products across all people. An estimate of average income is this weighted total
income divided by the sum of the weights (summed across the same subsample of the
population). 6
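
The calculation above can be sketched in Python as follows. FNLWGT91 is the weight named in the text; the per-person layout (a dictionary holding a list of monthly wage amounts) and all values are hypothetical, and the sketch assumes the records have already been restricted to the age group of interest.

```python
def weighted_annual_estimates(persons, weight_key="FNLWGT91"):
    """Return (aggregate annual income, average annual income).

    Only records with a positive calendar year weight contribute.
    """
    weighted_total = 0.0
    weight_sum = 0.0
    for person in persons:
        weight = person[weight_key]
        if weight <= 0:
            continue  # zero weight: not usable for calendar year estimates
        annual = sum(person["monthly_wages"])  # sum over months of the year
        weighted_total += weight * annual
        weight_sum += weight
    return weighted_total, weighted_total / weight_sum


# Hypothetical person records (illustrative values only).
persons = [
    {"FNLWGT91": 5000.0, "monthly_wages": [2000.0] * 12},
    {"FNLWGT91": 3000.0, "monthly_wages": [1000.0] * 12},
    {"FNLWGT91": 0.0, "monthly_wages": [1500.0] * 7},  # incomplete year
]

aggregate, average = weighted_annual_estimates(persons)
```

The denominator uses the same subsample as the numerator, which is the condition stated in the text for the average to be meaningful.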

Annual estimates computed with this method are based on monthly data from the same person
collected at three or four points in time (depending on the rotation group of the respondent). The
shorter recall period used by SIPP is generally believed to provide estimates of annual measures
with less nonsampling error than other surveys that collect annual income measured only once
during a year. Chapter 6 and the SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a),
provide a more detailed discussion of nonsampling error in SIPP.


Spell Estimation: Using the Full Panel File

Analysis of SIPP data that takes full advantage of the longitudinal nature of the survey can take a
number of forms. In studies of the dynamics of household composition, labor force activity, and
welfare recipiency, analysts have applied a set of methods that fall under the general headings of
survival analysis (see Kalbfleisch and Prentice, 1980) and event-history analysis (see Tuma and
Hannan, 1984). Among many other topics, researchers have studied the length of time that a
woman remains single, a person remains unemployed, or a person receives food stamps before
marrying, getting a job, or moving off the Food Stamp program. A spell of being single,
unemployed, or receiving food stamps is a period of time during which a person’s status did not
change, and it is the duration of those spells that is often of interest.

In these studies, the unit of analysis is the spell. A file of spells is built from the person records in
the full panel file, scanning across months to find a transition into and out of the state of interest.
An example of the approach is provided by Shea (1995b). She constructed spells from the
records of people with positive full panel weights (PNLWGT greater than zero), restricting her

6
  For purposes of exposition, this discussion has neglected the complication that not all persons with positive
calendar year weights will have 12 months of data. For example, any person who was in the population January 1
but who spent at least 1 month during that year in an institution would have fewer than 12 months of data. If that
person had complete data for the months when he or she was not in the institution, the person would have a positive
value for FNLWGT91. This issue is particularly pertinent for studies of the elderly, since a nonnegligible portion of
that population spends some time in a nursing home or some other type of extended care facility.



analysis to spells starting after the beginning of the panel, as is commonly done. Methods have
been proposed that allow for the use of spells in progress at the start of the panel when the
beginning dates of those spells are known (see Guo, 1993).
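
As a rough illustration of building a file of spells, the Python sketch below scans one person's monthly status indicators and keeps only spells whose start is observed, in line with the restriction described above. It is not the procedure used in any of the cited studies. The input format (a complete list of true/false values, one per panel month) is an assumption, and missing months are not handled.

```python
def find_spells(status):
    """Return (start_month, length, right_censored) for each observed spell.

    status is a list of booleans, one per month of the panel. A spell is
    kept only if the transition into the state is observed, so a spell
    already in progress in month 0 is dropped. A spell still in progress
    in the last month is flagged as right-censored.
    """
    spells = []
    start = None
    for m in range(1, len(status)):
        if status[m] and not status[m - 1]:
            start = m  # transition into the state
        elif not status[m] and status[m - 1] and start is not None:
            spells.append((start, m - start, False))  # transition out
            start = None
    if start is not None:
        spells.append((start, len(status) - start, True))
    return spells


# Hypothetical 10-month history of program receipt for one person.
history = [True, True, False, True, True, True, False, False, True, True]
spells = find_spells(history)
print(spells)
```

In this example the spell under way in month 0 is discarded, the spell beginning in month 3 is complete, and the spell beginning in month 8 is right-censored by the end of the observation window.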

An alternative approach is to use all people in the full panel file. Spells can be constructed
whenever a transition into the state of interest is observed (e.g., the birth of a child to a single
woman). There are three possible outcomes that might be of interest: (1) a transition out of
“single parenthood” is observed when the woman marries; (2) the spell is right-censored because
the woman is lost through attrition from the sample before the end of the panel and before she
marries; and (3) the spell is right-censored because the panel ends before she marries. If modeled
in that way, the appropriate weight would be the woman’s calendar month weight associated
with the month that the spell of single parenthood began. Calendar month weights are not on the
full panel file, but can be merged into that file from the appropriate core wave files.

During the course of a SIPP panel, some panel members can experience multiple spells (e.g., of
participation in a given program). There are two approaches to handling this situation: (1) select
only the first spells that started during the life of the panel (Ruggles and Williams, 1989), or (2)
use all spells starting during the life of the panel (Kalton et al., 1992).

The length of spells that can be fully observed depends on the duration of a panel. SIPP panels
before 1991 were designed to last 32 months. However, several panels were shorter because of
budget constraints. The 1992 Panel lasted 36 months. The 1996 Panel has 48 months of data.

Users of spell analysis should note that in SIPP, as in other panel surveys, people tend to report
a change in recipiency more often between waves than within waves (the seam effect). As a
result, it may not be possible to pinpoint changes to a specific month. More detailed
discussions of the seam effect are provided in Chapter 6 and in the SIPP Quality Profile, 3rd Ed.
(U.S. Census Bureau, 1998a).


Pooling Data from Two or Three Panels

Prior to the 1996 Panel, the SIPP design employed overlapping panels so that two or three panels
could be in progress at a given time. Thus, users can pool data from two or three panels in order
to produce larger samples, and hence more precise estimates, for a given time. Table 8-9
illustrates the wave overlap for the 1984 through 1993 Panels. One can see that Wave 7 of the
1984 Panel and Wave 3 of the 1985 Panel both cover the same period. Some overlapping waves
do not cover exactly the same period. For example, Wave 6 of the 1984 Panel covers one more
month than does Wave 2 of the 1985 Panel, a short wave.

Users are discouraged from pooling data from Wave 1 with data from any other wave. Differences
in interviewing procedures, question wording, and interviewer experience between Wave 1 and
other waves call into question the comparability of Wave 1 responses relative to responses at
other waves. In general, when pooling data from multiple panels, users should be sensitive to the



potential impact of differences in questionnaire items, time-in-sample effects, and other
nonsampling errors.

Analysts can obtain combined panel estimates using one of two methods:

•     Combine data from two or more panels and then produce estimates.

•     Combine estimates derived separately from each panel.

When combining data from successive panels, users need to adjust the weights; otherwise, the
weights may sum to twice the U.S. population total. One simple procedure is to reduce the
weights in each sample in proportion to the number of interviews.

To combine data from two successive panels, i and i+1, multiply the weights in panel i by the
factor

                            W_i = \frac{I_i}{I_i + I_{i+1}}                      (8-1)

where I_i and I_{i+1} are the numbers of interviews in panels i and i+1. Likewise, multiply the
weights in panel i+1 by

                            W_{i+1} = 1 - W_i                                    (8-2)

If either panel contributes data from fewer than four rotation groups, the analyst must also
multiply the weights in that panel by a factor equal to four divided by the number of rotation
groups contributing data.

Use formulas 8-1 and 8-2 for any two overlapping panels, including the scenario in which three
panels overlap but the interest is in only two panels. For three overlapping panels, Wi, Wi+1, and
Wi+2 can be computed in much the same way:

                            W_i = \frac{I_i}{I_i + I_{i+1} + I_{i+2}}            (8-3)

                            W_{i+1} = \frac{I_{i+1}}{I_i + I_{i+1} + I_{i+2}}    (8-4)
and
                            W_{i+2} = 1 - W_i - W_{i+1}                          (8-5)
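
Formulas 8-3 through 8-5 can be sketched in Python as shown below. The interview counts are hypothetical round numbers chosen for illustration and do not come from any SIPP panel.

```python
def three_panel_factors(interviews_i, interviews_i1, interviews_i2):
    """Weighting factors for three overlapping panels (formulas 8-3 to 8-5)."""
    total = interviews_i + interviews_i1 + interviews_i2
    w_i = interviews_i / total
    w_i1 = interviews_i1 / total
    w_i2 = 1 - w_i - w_i1  # the three factors sum to one
    return w_i, w_i1, w_i2


# Hypothetical interview counts for panels i, i+1, and i+2.
w_i, w_i1, w_i2 = three_panel_factors(12000, 9000, 9000)
print(round(w_i, 2), round(w_i1, 2), round(w_i2, 2))
```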


The same weighting factors can also be used to combine separate estimates from overlapping panels:


                            X = W_i X_i + W_{i+1} X_{i+1}                        (8-6)




where X = joint estimate (total, mean, proportion, etc.), Xi = estimate from earlier panel, and
Xi+1 = estimate from later panel.
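
Formula 8-6 amounts to a weighted average of the panel-level estimates. The short Python sketch below illustrates it with hypothetical factors and estimates. The check that the factors sum to one is an added safeguard for the example, not a step described in this guide.

```python
def combine_estimates(estimates, factors):
    """Combine panel-level estimates with weighting factors (formula 8-6)."""
    if abs(sum(factors) - 1.0) > 1e-9:
        raise ValueError("weighting factors should sum to one")
    return sum(w * x for w, x in zip(factors, estimates))


# Hypothetical values: the earlier panel estimates 10.0 percent with
# factor 0.6, and the later panel estimates 12.0 percent with factor 0.4.
combined = combine_estimates([10.0, 12.0], [0.6, 0.4])
print(round(combined, 2))
```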

For example, there were 15,061 interviews in Wave 6 of the 1984 Panel and 9,928 interviews in
Wave 2 of the 1985 Panel. Thus, the weighting factor for records in Wave 6 of the 1984 Panel is

                            W_i = \frac{15,061}{15,061 + 9,928} = 0.6027

and the weighting factor for Wave 2 of the 1985 Panel is


                            W_{i+1} = 1 - 0.6027 = 0.3973

Wave 6 of the 1984 Panel contributes four rotations to the pooled data, so the weight adjustment for
records in Wave 6 is Wi. Wave 2 of the 1985 Panel, however, contributes only three rotations to
the pooled data. Thus, the weight adjustment for records in Wave 2 is

                            \frac{4}{3} W_{i+1} = \frac{4}{3} \times 0.3973 = 0.5297
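
The worked example can be reproduced with a short Python sketch that applies formulas 8-1 and 8-2 and then the rotation adjustment. The function and argument names are illustrative; the interview counts and rotation counts are the ones quoted in the example.

```python
def two_panel_adjustments(interviews_early, interviews_late,
                          rotations_early=4, rotations_late=4):
    """Weight adjustments for two overlapping panels.

    Applies formulas 8-1 and 8-2, then multiplies each factor by four
    divided by the number of rotations that panel contributes.
    """
    w_early = interviews_early / (interviews_early + interviews_late)
    w_late = 1 - w_early
    return (w_early * 4 / rotations_early, w_late * 4 / rotations_late)


# Wave 6 of the 1984 Panel (four rotations) pooled with
# Wave 2 of the 1985 Panel (three rotations).
adj_1984_w6, adj_1985_w2 = two_panel_adjustments(15061, 9928, 4, 3)
print(round(adj_1984_w6, 4), round(adj_1985_w2, 4))
```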

Analysts interested in monthly estimates can pool data from multiple waves in each panel to
avoid missing rotations.

We computed the weighting factors in Table 8-9 using the formulas given in (8-1), (8-3), and
(8-4). These weighting factors are most appropriate for combining topical module data from
successive panels. Weighting factors for combined panel monthly and quarterly estimates may
differ, particularly when short waves are involved.



Table 8-9. Weighting Parameter Adjustment Factors for Both the
Two -Panel and Three-Panel Combinations *

Panel                                                                        Weighting factors   Weighting factors
                                                                             for combining       for combining
                                                                             waves from two      waves from three
                                                                             panels.             panels.
                                                                             Wi                  Wi , Wi+1
1984    1985   1986   1987   1988    1989   1990    1991       1992   1993

1
2a
3
4
5b      1
6b      2a                                                                   0.60c
7       3                                                                    0.53
8ab     4b     1                                                             0.49c
9b      5b     2                                                             0.58, 0.49          0.41, 0.29
        6b     3a                                                            0.56
        7      4b     1                                                      0.50
        8      5b     2                                                      0.50, 0.49          0.33, 0.33
                b
               6      3                                                      0.49
               7b     4      1                                               0.49
                      5      2                                               0.49
                      6      3                                               0.49
                      7      4       1                                       0.49
                             5       2                                       0.49
                             6       3                                       0.49
                                            1
                                            2
                                            3
                                            4       1
                                            5       2                        0.60
                                            6       3                        0.60
                                            7       4          1             0.60
                                            8       5          2             0.60, 0.42          0.39, 0.25
                                                    6          3             0.41
                                                    7          4      1      0.42
                                                    8          5      2      0.42, 0.49          0.26, 0.36
                                                               6      3      0.49
                                                               7      4      0.49
                                                               8      5      0.49
                                                               9      6      0.49
                                                               10ab   7      0.43c
                                                                      8



                                                                  9
a  Short wave. Approximately 3/4 of sample households interviewed over 3 months.
b  Wave does not cover exactly the same period as wave from later panel.
c  Weighting factor involves short wave.
*  Weighting factors for combining Wave 1 with other waves are not provided.


Section II
9. The SIPP Public Use Files
Section I of the Users’ Guide is written primarily for researchers who need information to guide
their use of data from the Survey of Income and Program Participation (SIPP). It describes the
design and content of SIPP and the processing of SIPP data by the Census Bureau. It also
discusses weighting, sampling error, and nonsampling error.

Section II addresses the mechanics of using the SIPP public use files. The chapters in this section
are written for the analyst needing guidance on how to accomplish a variety of common tasks.
This section contains minimal discussion of underlying concepts (such as the relationship
between waves, rotation groups, and reference months), which are examined in Section I.

There are five chapters in Section II. This chapter provides a general introduction to the public
use files; one chapter is devoted to each of the three types of SIPP data files; and a final chapter
discusses merging multiple SIPP data files. After reading the current chapter, the user working
with just one type of SIPP data file may wish to turn to the chapter on that type of file. For the
1996 Panel, most variable names changed from those of previous panels. To aid users working
with files from panels prior to 1996, the chapters in Section II present both the 1996 Panel and
pre-1996 Panel variable names when the text applies to both 1996 and pre-1996 panel files and
the 1996 Panel names are available. In the main body of the text, the pre-1996 Panel names are
presented in parentheses following those from the 1996 Panel. For example, the sample unit ID
variable in the core wave files, which is named SSUID in the 1996 Panel, was named SUID in
previous panels. The variable name is written in this chapter as SSUID (SUID). In tables, a
variety of methods are used to present both sets of names.

The balance of this chapter provides an overview of the chapters that follow. Those chapters
offer more detailed discussions, complete with specific examples and samples of programming
code. This introduction highlights points that are common to all SIPP data files. It also highlights
important differences.


Types of SIPP Data Files
There are three types of public use files containing SIPP data: core wave files, topical module
files, and full panel longitudinal research files (referred to as either longitudinal files or full panel
files):

•   Core wave files are currently issued in person-month format. These files contain up to four
    records for each primary sample member and each person who lived with a primary sample
    member at any time during the 4-month reference period covered by the wave. Each of the
    records contains data from one of the four reference months covered by the wave.1
•   Topical module files for the 1996 Panel contain one record for each person who was a
    sample responding (or Type Z nonresponding) member of a SIPP household during the
    fourth month of the reference period for the wave. Topical module files from earlier panels
    contain one record for each primary sample member and each person who lived with a
    primary sample member at the time of the interview for the wave in which the topical module
    was administered.
•   Full panel longitudinal research files contain one record for each primary sample member
    and for each person who ever lived with a primary sample member at any time during the
    SIPP panel, a period of up to 4 years.

When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses
following 1996 variable names.


Understanding the ID Variables in SIPP
Because different files contain different information, the capacity to identify people across those
files is important. SIPP is a longitudinal survey designed to allow researchers to track people
over time; other critical functions include identifying individuals over time and identifying when
a person is present in the sample. Finally, because the relationships among people change over
time, identification of those relationships at any specific time is important. The key to these tasks
lies in understanding how SIPP ID variables are used to identify persons, families, and
households.2

The most basic ID variables in SIPP have different variable names in the different types of public
use files issued by the Census Bureau. Table 9-1 displays those variables and shows the names
they are given in the different files.


Sample Unit IDs

When initial Wave 1 interviews are conducted, each physical dwelling unit is assigned a unique
(random) sample unit ID.3 The sample unit ID assigned to a person never changes: in all
subsequent interviews, the Wave 1 primary sample persons carry their sample unit IDs with
them. This means that if they move to different addresses, they keep the same sample unit IDs. If
new people join those original sample members at their original addresses, they become
secondary sample members by virtue of their association with the primary sample person with
whom they are living. Secondary sample persons are all assigned the sample unit ID of the
primary sample member with whom they are living. At the conclusion of the panel, all people
who have ever lived with a member of a given original sample unit share the same sample unit
ID. That sample unit ID is their common link to the original sample unit.

1  Prior to the 1990 Panel, core wave files were issued with a single record for each person. Each record contained
data for all four of the reference months covered by the wave. The structure of the file was similar to the
longitudinal files issued by the Census Bureau. Earlier editions of this Users’ Guide provide details.
2  Other variables are used to identify people who are members of related subfamilies, unrelated subfamilies (also
known as secondary families), and transfer program units such as food stamp units.
3  The sample unit ID is a random recode of three other variables in the Census Bureau internal files: the
respondent’s sampling area, the cluster of housing units within that area (called a segment), and a sequentially
assigned serial number. Because the variables in the Census Bureau’s internal files contain detailed information
about the location of the dwelling unit, those variables are suppressed in the public use files to protect the
confidentiality of survey respondents.

                            Table 9-1. SIPP Variable Names, by File Type

File Type                          Sample Unit ID   Current Address ID   Entry Address ID        Person Number

Panels Prior to the 1996 Panel
Core Wave Person-Month Files       SUID             ADDID                ENTRY                   PNUM
Topical Module Files               ID               ADDID                ENTRY                   PNUM
Full Panel (and Partial-Panel)     PP-ID            HH-ADDID             PP-ENTRY                PP-PNUM
Longitudinal Research Files

1996 Panel
Core Wave Person-Month Files       SSUID            SHHADID              EENTAID                 EPPPNUM
                                                                         (No longer needed
                                                                         to identify persons)
Topical Module Files               SSUID            SHHADID              EENTAID                 EPPPNUM
                                                                         (No longer needed
                                                                         to identify persons)
Full Panel (and Partial-Panel)     File not yet available. Current plans call for using the same ID
Longitudinal Research Files        variable names in all files from the 1996 Panel.


Current Address IDs

The current address ID identifies each housing unit occupied by one or more original sample
members in any given month.4 Current address IDs are assigned within sample units (they are
unique only when combined with the sample unit ID variable), and they have two parts. The first
part (one digit for all but the 1992 and 1996 Panels, two digits for the 1992 and 1996 Panels)
identifies the wave in which one or more original sample members were first scheduled to be
interviewed at the address. The second part of the ID is one digit, and it is used to sequentially
number addresses for households that split into two or more households as a result of a move to a
different location by original sample persons. All Wave 1 households have a current address ID
of 11. Any new addresses that are occupied in Wave 2 are numbered 21, 22, and so on; new
addresses occupied during the Wave 3 reference period are numbered 31, 32, 33, and so on. The
current address ID is a monthly variable, the value of which changes in the month in which an
individual moves to a new address.

4  A house, an apartment or other group of rooms, or a single room is regarded as a housing unit if it is occupied or
intended for occupancy as separate living quarters.


Entry Address IDs

The entry address ID is the current address ID that a sample member occupied when he or she
first entered the SIPP sample. It is used in conjunction with the person number to uniquely
identify persons within the sample unit and does not change even if the person moves.
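
The two-part structure of the current and entry address IDs can be unpacked with simple integer
arithmetic. The following is a minimal sketch, not Census Bureau code: the function name
split_address_id is hypothetical, and the sketch assumes the ID has been read as an integer or as
a string of digits. Because the second part is always one digit, the same arithmetic applies
whether the wave part occupies one digit or two.

```python
def split_address_id(address_id):
    """Split a current (or entry) address ID into (wave, sequence).

    Hypothetical helper. The last digit is the sequence number of the
    address; the remaining leading digit(s) give the wave in which the
    address was first scheduled to be interviewed.
    """
    value = int(address_id)  # accepts 11, "11", or "011"
    wave, sequence = divmod(value, 10)
    return wave, sequence


# Illustrative values only (not taken from any SIPP file).
print(split_address_id(11))     # a Wave 1 address
print(split_address_id("32"))   # second new address first occupied in Wave 3
```

For example, an entry address ID of 22 would indicate that the person entered the sample at the
second new address first occupied in Wave 2.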


Person Numbers

All primary and secondary sample members are assigned a person number when they first enter
the SIPP panel. Those numbers are assigned sequentially, within each wave and within each
household (current address). The first part of the person number (two digits for the 1992 and
1996 Panels, one digit for all others) indicates the wave in which the person originally entered
the sample. Thus, primary sample persons have person numbers in the 100 series, beginning with
101; secondary sample members have person numbers beginning with 201 if they enter the
sample in Wave 2, 301 if they enter the sample in Wave 3, 401 if they enter the sample in Wave
4, and so on.


Identifying Persons and Their Relationships
Each person in SIPP can be uniquely identified by the combination of a sample unit ID, an entry
address ID,5 and a person number. These ID variables are useful when linking the records for a
single person across multiple SIPP data files. They also contain substantive information that may
be useful in some situations.
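
As an illustration of how the three ID variables work together, the sketch below builds a
composite person key and uses it to pair records from two files. It is only a sketch under stated
assumptions: records are represented as Python dictionaries keyed by the 1996 Panel variable
names from Table 9-1, the helper names (PersonKey, person_key, entry_wave, link_records) are
hypothetical, and the sample values are invented. The entry_wave helper assumes the last two
digits of the person number hold the sequence number, as the 101, 201, 301 examples above
suggest.

```python
from collections import namedtuple

# Hypothetical container for the three-part person identifier.
PersonKey = namedtuple(
    "PersonKey", ["sample_unit_id", "entry_address_id", "person_number"]
)


def person_key(record):
    """Build the composite key from one record (1996 Panel names assumed)."""
    return PersonKey(
        str(record["SSUID"]), int(record["EENTAID"]), int(record["EPPPNUM"])
    )


def entry_wave(person_number):
    """Wave in which the person entered the sample (leading digits)."""
    return int(person_number) // 100


def link_records(core_records, topical_records):
    """Pair each topical module record with the core records for that person."""
    by_person = {}
    for rec in core_records:
        by_person.setdefault(person_key(rec), []).append(rec)
    return [(tm, by_person.get(person_key(tm), [])) for tm in topical_records]


# Invented example data.
core = [
    {"SSUID": "000123", "EENTAID": 11, "EPPPNUM": 101, "month": 1},
    {"SSUID": "000123", "EENTAID": 11, "EPPPNUM": 101, "month": 2},
    {"SSUID": "000123", "EENTAID": 11, "EPPPNUM": 201, "month": 2},
]
topical = [{"SSUID": "000123", "EENTAID": 11, "EPPPNUM": 201}]

for tm_record, core_matches in link_records(core, topical):
    print(person_key(tm_record), len(core_matches))
```

For pre-1996 files the same approach applies with the corresponding names from Table 9-1.
Chapter 13 discusses the procedures for merging files.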


Using the Monthly Interview Status Variable

The monthly interview status variable helps determine whether the data for a person in a given
month should be used. This variable is labeled PP-MIS in the pre-1996 longitudinal files, in the
(older) person-record-format core wave files, and in older topical module files. It is labeled
PPMIS in newer pre-1996 topical module files.6 This variable has three possible values: 0, 1,
and 2. When using the older person-record-format core wave files, the topical module files for
panels prior to 1996, and the longitudinal files, analysts need to understand that the monthly
interview status is the only reliable guide as to whether the data for a given person should be
used in a given month. Analysts should use data for only those months in which a person’s
interview status is equal to 1. Any data present for months when a person’s interview status is
coded either 0 or 2 should be ignored. A code of 0 indicates that the person was not in the sample
for that month, and a code of 2 indicates a noninterview for that month.7

5  For the 1996 Panel, the entry address is not necessary to uniquely identify individuals in SIPP. Its continued use
will not create any problems; it just provides additional information.

When working with other data sources, analysts often identify which cases will be used in an
analysis by examining either the weight variable or the variables used in the analysis itself. In the
first case, the rule is generally to use all cases with positive weights and ignore the rest. In the
second case, the rule is generally to use all cases with nonmissing data. Each of those rules can
lead the SIPP user astray, as illustrated below.

The presence of a zero weight is not a reliable guide to whether a person should be excluded
from the planned analysis. Although those people will not enter into any weighted tabulations,
they may provide important contextual information about people who do enter into those
(weighted) tabulations. For example, a person with a calendar year weight of zero who is a
member of the same household as a positive-weight person for only 3 months provides
information about the positive-weighted person’s household (including, for example, household
size, composition, income, and program participation) for the 3-month period that he or she was
a household member. It is for this reason that records for zero-weighted persons are retained in
the SIPP data files.8

The presence of data in analysis fields for any given month is also not a reliable guide to whether
the person should be included in the planned analyses. Data are collected for all months of the
reference period for a given wave, even if the interviewed person was in the sample for only part
of the reference period. For example, on the topical module and longitudinal files for panels prior
to 1996, 4 months’ worth of data will generally be present for a person who was a member of a
SIPP household for only the last 2 months of the wave. However, only those last 2 months of
data should be used.9
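
The rule described above (use a month's data only when the interview status for that month
equals 1, regardless of weights or of whether data fields are filled) can be sketched as follows.
This is an illustrative sketch only: it assumes the monthly status codes and monthly data values
for one person have already been read into parallel Python lists, and the function names are
hypothetical. On the actual files the monthly status is stored in separate monthly fields.

```python
IN_SAMPLE = 1  # 0 = not in sample that month, 2 = noninterview that month


def usable_months(monthly_status):
    """Return the 1-based positions of months whose interview status is 1."""
    return [
        month
        for month, status in enumerate(monthly_status, start=1)
        if status == IN_SAMPLE
    ]


def usable_values(monthly_status, monthly_values):
    """Keep a value only for in-sample months; blank out all other months."""
    return [
        value if status == IN_SAMPLE else None
        for status, value in zip(monthly_status, monthly_values)
    ]


# Invented example: a person who joined a SIPP household in month 3 of the
# wave. Data are present for all 4 months, but only months 3 and 4 count.
status = [0, 0, 1, 1]
income = [500, 500, 520, 530]
print(usable_months(status))
print(usable_values(status, income))
```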

6  The person-month-format core wave files contain records only for those months that a person has an interview
status code of 1. The monthly interview status variable is not included in those files because it is not needed. The
topical module files for the 1996 Panel contain records only for those with an interview status code of 1 in the fourth
month of the wave’s core reference period. Although the interview status variable is included on the topical module
files from the 1996 Panel, it need not be used with them.
7  For those months when a noninterviewed person was both in scope for the survey and had data imputed (this
includes the Type Z imputations and the missing wave imputations), the variable is set to 1. In those cases, the data
can be used in the same manner as any of the other imputed data in the SIPP public use files.
8  Other important situations also arise. For example, infants are assigned a calendar year weight of zero for the year
of their birth even though they have an interview status of 1 from their birth month forward. Also, a person who dies
during the year will have a positive calendar year weight even though, past the month of death, he or she will have
an interview status of 0 or 2. In neither case does the weight variable reflect the presence or absence of the person,
or data associated with the person.
9  The person-month-format core wave files will have only two records for that person. The topical module files for
the 1996 Panel will have information only about month 4 of the wave’s core reference period.



Determining Monthly Household Composition

A household, as the term is used in Census Bureau publications, consists of all people who
occupy a housing unit, regardless of their relationships to each other.10 For many purposes, a
household can be thought of as people living at a common address. A person’s current address
ID in any given month, together with his or her sample unit ID, identifies the household of which
that person is a member for that month. Members of the same household in a given month
always have an interview status of 1 and share the same sample unit ID and current address ID.
Figure 2-1 (pp. 2-10–2-14) provides an illustration of changes in household composition.
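
The grouping rule described above can be sketched in a few lines. The example is a sketch under
stated assumptions, not a prescribed procedure: person-month records are represented as Python
dictionaries, and the field names (month, sample_unit_id, address_id, person, status) and the
sample values are hypothetical stand-ins for the file-specific variables listed in Table 9-1.

```python
from collections import defaultdict


def households_by_month(person_months):
    """Group in-sample persons by (month, sample unit ID, current address ID)."""
    households = defaultdict(set)
    for rec in person_months:
        if rec["status"] != 1:  # skip months the person was not in the sample
            continue
        key = (rec["month"], rec["sample_unit_id"], rec["address_id"])
        households[key].add(rec["person"])
    return dict(households)


# Invented example: person 102 moves to a new address (21) in month 2.
records = [
    {"month": 1, "sample_unit_id": "A", "address_id": 11, "person": 101, "status": 1},
    {"month": 1, "sample_unit_id": "A", "address_id": 11, "person": 102, "status": 1},
    {"month": 2, "sample_unit_id": "A", "address_id": 11, "person": 101, "status": 1},
    {"month": 2, "sample_unit_id": "A", "address_id": 21, "person": 102, "status": 1},
]
for key, members in sorted(households_by_month(records).items()):
    print(key, sorted(members))
```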


Determining Monthly Family Composition

The term family, as used in Census Bureau publications, refers to a group of two or more people
related by birth, marriage, or adoption who reside together; all such people are considered
members of one family. For example, if the son of the person who maintains the household and
the son’s wife are members of the household, they are treated as members of the parent’s family.
Every family must include a reference person. Two or more people living in the same household
who are related to each other but not to the household reference person form an unrelated
subfamily (also referred to as a secondary family).

The labels primary individual and secondary individual as used by the Census Bureau refer to
people in households who are not related to any other household members. For many purposes,
they can be thought of as one-person families, and the Census Bureau sometimes refers to them
as pseudo-families.

Methods for identifying the interrelationships among the household members that define these
groups vary, depending on the data file being used. The topical module files do not contain any
of the information needed to directly identify the different types of families.11 When it is
necessary to identify family membership in an analysis that uses information from a topical
module, it is also necessary to merge data from the topical module file with either a core wave
file or a longitudinal file. Procedures for merging files are discussed in Chapter 13.

10  The one exception to this definition is people living in group quarters.
11  The one exception is the Wave 2 topical module, which collects detailed information about all of the relationships
among all of the people who are household members at the time of the Wave 2 interview.

Identifying family membership is easiest when working with the person-month-format core wave
files. The Census Bureau has two principal methods for distinguishing families.

•    The first method defines a family as all persons who are related and living together. The
     family ID variable RFID is used with this definition. RFID groups the household reference
     person with all related household members by assigning them the same ID number. This
     family group corresponds to the Census Bureau’s definition of a primary family. RFID
     groups members of each unrelated subfamily (and primary and secondary individuals)
     separately.
•    The second method is similar to the first in defining a family, but the family excludes
     members of related subfamilies. The family ID variable RFID2 is used with this definition.
     RFID2 equals zero for members of related subfamilies. RFID2 groups members of each
     unrelated subfamily (and primary and secondary individuals) in the same way as RFID:
     each group has a unique number.

Analysts who want to analyze multigenerational families would use RFID2 (FID2) and the
variable RSID (SID). RSID (SID) treats related subfamilies as distinct family units by assigning
members of related subfamilies nonzero values. Analysts can easily distinguish unrelated
subfamilies from other family units when they use these variables and numbering schemes.
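
One way to combine RFID2 (FID2) and RSID (SID) is sketched below. It is a hedged illustration
rather than the Census Bureau's procedure: the function name family_unit_key is hypothetical,
records are assumed to be dictionaries of person-month data, the field name month is invented,
and the sketch assumes that family ID values are interpreted within a household and month, so
the household identifiers are included in every key. Chapter 10 remains the authoritative
discussion of these variables.

```python
def family_unit_key(rec):
    """Return a key for the family unit of one person-month record.

    Members of related subfamilies have RFID2 equal to zero, so they are
    keyed on RSID instead; everyone else is keyed on RFID2.
    """
    household = (rec["month"], rec["SSUID"], rec["SHHADID"])
    if rec["RFID2"] != 0:
        return household + ("family", rec["RFID2"])
    return household + ("related_subfamily", rec["RSID"])


# Invented example: a householder, plus a son and the son's wife who form a
# related subfamily within the same household.
people = [
    {"month": 1, "SSUID": "A", "SHHADID": 11, "RFID2": 1, "RSID": 0, "EPPPNUM": 101},
    {"month": 1, "SSUID": "A", "SHHADID": 11, "RFID2": 0, "RSID": 2, "EPPPNUM": 102},
    {"month": 1, "SSUID": "A", "SHHADID": 11, "RFID2": 0, "RSID": 2, "EPPPNUM": 103},
]
for person in people:
    print(person["EPPPNUM"], family_unit_key(person))
```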

Chapter 10 discusses the use of these variables in greater detail. More work is involved when
using the longitudinal files or the (older) person-record-format core wave files. When working
with those files, analysts must create a unique family ID from several components. A number of
different strategies can be used, one of which is described in Chapter 12. Other approaches are
described in earlier editions of this Guide.


Determining Monthly Transfer Program Unit Composition

Some analyses involve summarizing data for units other than households or families. The SIPP
core data contain sufficient information to identify program units for participants in a range of
transfer programs, including Medicare; Medicaid; Aid to Families with Dependent Children
(AFDC); Temporary Assistance for Needy Families (TANF);12 General Assistance (GA);
Railroad Retirement; Social Security; Veterans Compensation and Pensions; Food Stamps; and
the Women, Infants, and Children nutrition program (WIC).

The SIPP data contain fields for each adult and child, indicating whether the individual received
benefits (either directly or by virtue of his or her relationship to another person designated as the
principal recipient) from each of these programs in each month. The SIPP data also contain
information that permits identification of program units within households. One person in each
program unit is identified as a principal recipient, and variables identifying that principal
recipient are included on the records of the people who are part of the program unit. People who
are members of a common program unit in a given month can then be identified as those who are
in the sample in that month (interview status = 1) with common values of:

•    The sample unit ID,
•    The current address ID, and
•    The primary recipient ID.

12  In August 1996, the Personal Responsibility and Work Opportunity Reconciliation Act was signed into law. This
legislation replaced the old welfare system, Aid to Families with Dependent Children (AFDC), with a new program,
Temporary Assistance for Needy Families (TANF). In the 1996 Panel, the questions for income type 20 referred to
the AFDC program prior to Wave 4 and to the TANF program beginning in Wave 4. In Wave 9, the questions were
expanded somewhat to capture the larger array of program types that could exist under TANF.


Constructing Household, Family, and Program Unit Level
Variables

The public use files contain selected characteristics of monthly households and families that can
be used directly in planned analyses. Data needs may require analysts to construct characteristics
of households, families, or program units that do not already exist on the public use files created
by the Census Bureau. Analysts can use the monthly ID variables described in the preceding
section to construct monthly characteristics from the public use files.
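
As a sketch of how such characteristics might be constructed, the example below counts members
and sums a person-level amount within each unit and month. The field names (month,
sample_unit_id, address_id, status, income), the function names, and the data are hypothetical;
the unit definition is passed in as a function so that the same code could serve households,
families, or program units by changing the key.

```python
from collections import defaultdict


def household_key(rec):
    """Unit key for a monthly household: month, sample unit ID, address ID."""
    return (rec["month"], rec["sample_unit_id"], rec["address_id"])


def unit_totals(person_months, unit_key, value_field):
    """Return {unit key: {"size": n, "total": sum}} over in-sample records."""
    totals = defaultdict(lambda: {"size": 0, "total": 0})
    for rec in person_months:
        if rec["status"] != 1:  # only months with interview status 1 count
            continue
        unit = totals[unit_key(rec)]
        unit["size"] += 1
        unit["total"] += rec[value_field]
    return dict(totals)


# Invented example data for one household in one month.
records = [
    {"month": 1, "sample_unit_id": "A", "address_id": 11, "status": 1, "income": 1500},
    {"month": 1, "sample_unit_id": "A", "address_id": 11, "status": 1, "income": 800},
    {"month": 1, "sample_unit_id": "A", "address_id": 11, "status": 0, "income": 999},
]
print(unit_totals(records, household_key, "income"))
```

Note that the selection is made on interview status, not on weights, so a zero-weighted person
who is in the sample still contributes to the household's size and income.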


Choosing Appropriate Weight(s)

Because SIPP uses a sample design in which different households (and people) are sampled at
different rates, weights generally must be used when the user desires (approximately) unbiased
estimates of population characteristics. In general, the appropriate weight to use for an analysis
can be identified by answering two questions:

1. Which (sub)sample of SIPP is the estimate based on?
2. What population does the sample represent?

Weights for each of the calendar months covered by a panel can be found on the core wave files.
A single weight appears on the topical module files. Before 1996, the interview month was a
frequent reference period for topical module questions, and the weight on the pre-1996 topical
module files is the person interview month weight for people who provided data for a topical
module. But, as noted earlier, starting with the 1996 Panel the interview month is no longer used
as a reference month; the weight on the topical module file for the 1996 Panel is the person
cross-sectional weight for the fourth reference month. Weights for estimates that refer to a
calendar year—or, more accurately, the January population as it appears through the balance of
the calendar year—are on the longitudinal files.13

Chapter 8 provides detailed information about SIPP weights and how to use them.
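
Once the appropriate weight has been chosen, it enters estimates in the usual way. The sketch
below shows the generic weighted total and weighted mean; it is a standard textbook calculation,
not a SIPP-specific routine, and the function names and data are hypothetical. It does not address
which weight to use or how to estimate sampling error, both of which are covered in Section I.

```python
def weighted_total(values, weights):
    """Estimate a population total: the sum of value times weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimate a population mean (or a proportion, for 0/1 values)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return weighted_total(values, weights) / total_weight


# Invented example: 1 = program participant, 0 = nonparticipant.
# The third person has a zero weight and does not affect either estimate.
participates = [1, 0, 1]
weights = [1000, 3000, 0]
print(weighted_total(participates, weights))  # estimated number of participants
print(weighted_mean(participates, weights))   # estimated participation rate
```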

13  The calendar year weights are based on all sample members who are present in January and interviewed (or
imputed) for every month of the year that they were “in scope” for the survey. In other words, the weights include
people who died during the year if they were interviewed until they died, but they do not include people who left the
sample during the year. Because they are not members of the population on January 1, infants receive a calendar
weight of zero for the year in which they are born.



Working with Multiple Files
There are a number of reasons that SIPP users commonly use data from more than one file:

1. The overlapping-wave/rotation-group structure of the survey creates many situations in
   which data for a single calendar reference month are contained on two different core wave
   files.
2. The overlapping-panel structure of the pre-1996 SIPP created many situations in which data
   covering a single calendar year could be found on data files from two or sometimes three
   different panels.14
3. There are many research problems in which reference to a specific calendar date is not
   crucial and a desire for increased sample size can lead to the use of data from multiple panels
   (or waves) that do not overlap.
4. Many analyses of data collected in the SIPP topical modules entail merging topical module
   data with files containing core data (the core wave files or the longitudinal research files).
5. Since the release of a longitudinal file cannot occur until after the final interview of the final
   wave of a panel, researchers requiring longitudinal data from more than one wave prior to the
   release of the longitudinal file must create their own linked data files from the available core
   wave files. As of this writing, longitudinal files are available for all but the 1996 SIPP Panel,
   so this procedure pertains primarily to users of data from the 1996 Panel.

Chapter 13 discusses each of these situations and describes procedures for using data from
multiple files to construct estimates.
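
The first situation in the list above, in which records for one calendar month are spread across
two core wave files, can be sketched as a simple selection step. The example is illustrative only:
each wave file is assumed to have been read into a list of person-month dictionaries, and the
field names cal_year and cal_month, like the function name, are hypothetical placeholders for the
calendar date variables on the actual files. Chapter 13 describes the full procedures.

```python
def records_for_calendar_month(wave_files, year, month):
    """Collect person-month records for one calendar month from several files."""
    selected = []
    for records in wave_files:
        selected.extend(
            rec
            for rec in records
            if rec["cal_year"] == year and rec["cal_month"] == month
        )
    return selected


# Invented example: March 1996 appears on both the Wave 1 and Wave 2 files
# because different rotation groups have different reference periods.
wave1 = [
    {"person": 101, "cal_year": 1996, "cal_month": 2},
    {"person": 101, "cal_year": 1996, "cal_month": 3},
]
wave2 = [
    {"person": 501, "cal_year": 1996, "cal_month": 3},
    {"person": 501, "cal_year": 1996, "cal_month": 4},
]
march = records_for_calendar_month([wave1, wave2], 1996, 3)
print([rec["person"] for rec in march])
```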


The Balance of Section II
The balance of Section II is organized as follows:

•      Chapter 10 describes how to use the core wave files.
•      Chapter 11 describes how to use the topical module files.
•      Chapter 12 describes how to use the full panel longitudinal research files.
•      Chapter 13 describes how to link the different file types.

Because many users work with only a single type of file, Chapters 10, 11, and 12 are written so
that they stand alone: each chapter can be used independently, without reference to the other two
chapters. Differences across the three file types in their structure and in names for common
variables make this a natural way to organize the material presented here. The advantage of this
organization is that an analyst working with only a single type of file will find a complete
discussion of that file type in a single chapter.

14  Chapter 2 discusses the overlapping wave and panel structure of SIPP.

However, there is substantial overlap in the types of things that analysts will be called upon to do
with each of the file types. Thus, many ideas are repeated across the three chapters. Crucial
differences do exist among the chapters, however. Those differences are found in the variable
names used to accomplish certain common tasks and in the ways of working with data files built
around different organizational principles. While the text of a chapter may seem familiar, there
are often important differences in the details.

Table 9-2 summarizes some of the more important differences among the three file types. Table
9-2 is intended primarily for users who have already worked with at least one type of SIPP data
file. Analysts new to SIPP should skip the table and proceed to the chapter that discusses the type
of data file with which they are working. When working with a different type of SIPP file,
experienced analysts can use Table 9-2 in conjunction with the chapter that discusses that new
file type; the table will help to highlight differences that might otherwise be overlooked in the
general discussion.




Table 9-2. Differences Among Core Wave, Topical Module, and Longitudinal Files (1990–1996 Panels)

                       1996 Panel                 Pre-1996                    1996 Panel Topical         Pre-1996 Topical            Pre-1996 Longitudinal
Topic                  Core Wave Files            Core Wave Files             Module Files               Module Files                Files

File Structure         Person-month records       Person-month records        Person records             Person records              Person records
                       Table 10-1                 Table 10-1                  Table 11-1                 Table 11-1                  Table 12-2

Data Dictionary        Size and begin position    Size and begin position     Size and begin position    Size and begin position     1992–1993 Panels: Size,
                       Figure 10-1                Figure 10-1                 Figure 11-1                Figure 11-1                 begin, field length, and
                                                                                                                                     number of fields
                                                                                                                                     1990–1991 Panels: Size,
                                                                                                                                     begin, index, and length
                                                                                                                                     Figure 12-1

Importance of          Not needed on the          On the person-month files:  Not needed. Topical        PP-MIS                      PP-MIS
Monthly Interview      person-month files: they   not needed. Person-month    module files contain       Very important              Very important
Status Variables       contain records only for   files contain records only  records only for people    Table 11-2                  Table 12-2
                       months in which the        for months in which the     for whom EPPMIS4 = 1.
                       respondent is present      respondent’s interview
                       and in scope.              status equals 1.
                                                  On the older person-record
                                                  format files: very
                                                  important. See earlier
                                                  editions of this Users’
                                                  Guide for details.

How to Identify a      SSUID, EPPPNUM             SUID, ENTRY, PNUM           SSUID, EPPPNUM             ID, ENTRY, PNUM             PP-ID, PP-ENTRY, PP-PNUM
Person                                            Table 10-3                  Table 11-6                 Table 11-7                  Table 12-6

How to Identify a      SSUID, SHHADID             SUID, ADDID                 SSUID, SHHADID             ID, ADDID                   PP-ID, HH-ADDID
Household                                         Table 10-5                  Table 11-8                 Table 11-9                  Table 12-8

Identification of      Merged households cannot   PWSUID, PWENTRY, or         Merged households cannot   PNUM is between x80 and     PP-PNUM is between x80 and
“Merged Households”    be identified in files     PWPNUM > 0                  be identified in files     x99, inclusive, and x       x99, inclusive, and x
                       from the 1996 Panel.                                   from the 1996 Panel.       varies from 1 to 10.        varies from 1 to 10.
                                                                                                         Can identify the person     Can identify the person
                                                                                                         only after the move; need   only after the move; need
                                                                                                         to go to the core wave      to go to the core wave
                                                                                                         file to identify the        file to identify the
                                                                                                         person before the move.     person before the move.

                                                                                                                                     (table continues)
Table 9-2. Differences Among Core Wave, Topical Module, and Longitudinal Files (1990–1996 Panels) (continued)

                       1996 Panel                 Pre-1996                    1996 Panel Topical         Pre-1996 Topical            Pre-1996 Longitudinal
Topic                  Core Wave Files            Core Wave Files             Module Files               Module Files                Files

Handling of            Not applicable             If the move took place      Not applicable             No matter when the move     No matter when the move
“Merged Households”                               after the first reference                              takes place, there will be  takes place, there will be
                                                  month, there will be two                               one record for each person  two records for each person
                                                  records for each person                                whose ID information        whose ID information
                                                  whose ID information                                   changed. That record        changed. One record
                                                  changed. One record                                    reflects what happened      reflects what happened
                                                  reflects what happened                                 after the move and          before the move and
                                                  before the move and                                    contains the new ID         contains the original ID
                                                  contains the original ID                               information.                information. The other
                                                  information. The other                                                             record reflects what
                                                  record reflects what                                                               happened after the move
                                                  happened after the move                                                            and contains the new ID
                                                  and contains the new ID                                                            information.
                                                  information.
                                                  If the move took place in
                                                  the first reference month,
                                                  there will be only one
                                                  record for each person
                                                  whose ID information
                                                  changed. That record
                                                  reflects what happened
                                                  after the move and
                                                  contains the new ID
                                                  information.

How to Identify a      SSUID, SHHADID and         (SUID and ADDID) and        Not in the file            Not in the file             Create the family ID
Family                 [RFID or RFID2 or RSID or  [FID or FID2 or SID or                                                             variables using PP-ID,
                       (RFID2 and RSID)]          (FID2 and SID)]                                                                    HH-ADDID, and FAMTYP
                                                  Table 10-7                                                                         Table 12-10

Working with           Variables for the primary  Variables for the primary   Not applicable             Not applicable              Variables for the primary
Family-Level Income    family include the         family include the                                                                 family include the
Variables              related subfamily.         related subfamily.                                                                 related subfamily.
                       Separate variables for     Separate variables for                                                             No separate variables for
                       the related subfamily.     the related subfamily.                                                             the related subfamily.
                       Table 10-9                 Table 10-10                                                                        Table 12-12

Table 9-2. Differences Among Core Wave, Topical Module, and Longitudinal Files (1990–1996 Panels) (continued)

                       1996 Panel        Pre-1996          1996 Panel Topical   Pre-1996 Topical   Pre-1996 Longitudinal
Topic                  Core Wave Files   Core Wave Files   Module Files         Module Files       Files

Variables Describing   RHNF              HNF
Household and          RHNFAM            HNFAM
Family Composition     RHNSF             HNSF
                       EHREFPER          HREFPER
                       EHHNUMPP          HNP
                       RHTYPE            HTYPE
                       EFREFPER          FREFPER
                       EFTYPE            FTYPE
                       EFKIND            FKIND
                       ESFT
                       ESFRFPER
                                         FAMTYP                                                    FAMTYP
                                         FAMREL                                                    FAMREL
                       ERRP              RRP               ERRP                 RRP                RRP
                                         RRPU
                                                                                                   ENTID-PNSP
                       EPNSPOUS          PNSP              EPNSPOUS             PNSP               PNSP
                                                                                                   ENTID-PNPT
                                         PNPT                                   PNPT               PNPT
                       EPNMOM                              EPNMOM
                       EPNDAD                              EPNDAD
                       EPNGUARD          PNGDU             EPNGUARD
                       Table 10-8        Table 10-8        Table 11-12          Table 11-12        Table 12-11

                                                                                                   (table continues)
                                                                                                                           Table 9-2. Differences Among Core Wave, Topical Module, and Longitudinal Files (1990–1996 Panels) (continued)

                                                                                                                                 1996 Panel Core Wave Files               Pre-1996 Core Wave Files        1996 Panel Pre-1996        Pre-1996 Full Panel Files
                                                                                                                                                                                                           Topical Topical
                                                                                                                                         Authorized   Person-Level               Authorized   Person-Level Module Module                  Authorized Person-Level
                                                                                                              Topic         Coverage     Recipient    Amount       Coverage      Recipient    Amount         Files     Files  Coverage    Recipient      Amount
                                                                                                              Identifying
                                                                                                              Program Units
                                                                                                              Social
                                                                                                              Security     RCUTYP01 RCUOWN01 T01AMTA              SOCSEC         SSPNUM       S01AMTA                         SOC-SEC    SS-PIDX        Sources are
                                                                                                                                             T01AMTK                                          S01AMTK                                                   identified in
                                                                                                                                                                                                                                                        G1SRC1 –
                                                                                                              Railroad         NA                     T02AMT      RAILRD         RRPNUM       S02AMTA     Not in    Not in    RAILROAD RR-PIDX          G1SRC10.
                                                                                                                                                                                              S02AMTK     topical   topical
                                                                                                                                                                                              S03AMT      module    module                              Amounts are
                                                                                                              Fed SSI      RCUTYP03 RCUOWN03 T03AMTA              SSICOVRG                                files     files                               located in the
                                                                                                                                             T03AMTK                                                                                                    monthly
                                                                                                              Veteran’s                                                                                                                                 arrays
                                                                                                              Admin.       RCUTYP08 RCUOWN08 T08AMT               VETS           VETNUM       S08AMT                          VETS       VA-PIDX        G1AMT1 –
                                                                                                                                                                                                                                                        G1AMT10
                                                                                                              AFDC/TANF RCUTYP20 RCUOWN20 T20AMT                  AFDC           AFDCPNUM S20AMT                              AFDC       AFDCPIDX
                                                                                                              General
                                                                                                              Assistance   RCUTYP21 RCUOWN21 T21AMT               GENASST        GAPNUM       S21AMT                          GEN-ASST   GA-PIDX
                                                                                                              Foster
                                                                                                              Child Care   RCUTYP23 RCUOWN23 T23AMT               FOSTKID        FKPNUM       S23AMT                          FOST-KID   FOSTPIDX
                                                                                                              Other
                                                                                                              Welfare      RCUTYP24 RCUOWN24 T24AMT               OTHWELF        OWPNUM       S24AMT                          OTH-WELF   OTH-PIDX
                                                                                                              WIC          RCUTYP25 RCUOWN25 T25AMT               WICCOV         WICPNUM      WICVAL                          WICCOV     WIC-PIDX
                                                                                                              Food Stamps RCUTYP27 RCUOWN27 T27AMT                FOODSTMP FSPNUM             S27AMT                          FOODSTMP FS-PIDX
                                                                                                              Medicare                   ECRMTH                   CARECOV        MCDPNUM                                      CARECOV
                                                                                                              Medicaid     RCUTYP57 RCUOWN57                      CAIDCOV                                                     CAIDCOV
                                                                                                              CHAMPUS                                             CHAMP          CHPNUM                                       CHAMP
                                                                                                              or
                                                                                                              CHAMPVA                    RCHAPPM
                                                                                                              Health
                                                                                                              Insurance    RCUTYP58 RCUOWN58                      HIIND          HIPNUM
                                                                                                                           Table 10-16                            Tables 10-17                                                           Tables 12-19
                                                                                                                                                                  and 10-18                                                              and 12-20
                                                                                                                              Table 9-2. Differences Among Core Wave, Topical Module, and Longitudinal Files (1990–1996 Panels) (continued)
When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses
following 1996 variable names.


                                                                                                                                               1996 Panel Core             Pre-1996 Core Wave          1996 Panel Topical           Pre-1996 Topical            Pre-1996 Longitudinal
                                                                                                                     Topic                     Wave Files                  Files                       Module Files                 Module Files                Files
                                                                                                                     Imputed Data:

                                                                                                                     The whole record is       If no prior wave data and If MIS5 = 2 and MISj = 1 If EPPMISA = 2 or                 If PP-MIS5 = 2 and          If WAVFLG > 0 or
                                                                                                                     imputed                   EPPINTVW = 3, 4           for j = 1, 2, 3, 4 or    EPPINTVW = 3, 4                   PP-MISj = 1                 INTVW = 3, 4
                                                                                                                                                                         INTVW = 3, 4                                               for j = 1, 2, 3, 4 or
                                                                                                                                                                                                                                    INTVW = 3, 4
                                                                                                                     The corresponding wave If the corresponding           If the corresponding        If the corresponding         If the corresponding       If the corresponding
                                                                                                                     of information is imputed imputation flag indicates   imputation flag indicates   imputation flag and          imputation flag and        imputation flag indicates
                                                                                                                                               imputation.                 imputation.                 calculation flags indicate   calculation flags indicate imputation.
                                                                                                                                                                                                       imputation.                  imputation.

                                                                                                                     The variable’s value is   Almost all person-level     Almost all person-level     Most person-level            Most person-level           Limited set of imputation
                                                                                                                     imputed                   variables have imputation   variables have imputation   variables have imputation    variables have imputation   flags. There are no
                                                                                                                                               flags. There are no         flags. There are no         flags. There are no          flags. There are no         imputation flags on
                                                                                                                                               imputation flags on         imputation flags on         imputation flags on          imputation flags on         household and family
                                                                                                                                               household and family        household and family        household and family         household and family        aggregates. Use the
                                                                                                                                               aggregates. Use the         aggregates. Use the         aggregates. Use the          aggregates. Use the         person-level imputation
                                                                                                                                               person-level imputation     person-level imputation     person-level imputation      person-level imputation     flags of household and
                                                                                                                                               flags of household and      flags of household and      flags of household and       flags of household and      family members to
                                                                                                                                               family members to           family members to           family members to            family members to           identify aggregate
                                                                                                                                               identify aggregate          identify aggregate          identify aggregate           identify aggregate          amounts that include
                                                                                                                                               amounts that include        amounts that include        amounts that include         amounts that include        imputed values.
                                                                                                                                               imputed values.             imputed values.             imputed values.              imputed values.
                                                                                                                     Topcoding                 Yes                         Yes                         Yes                          Yes                         Yes


                                                                                                                     How to Identify States    TFIPSST                     HSTATE                      TFIPSST                      STATE                       GEO-STE
                                                                                                                     Weight Variables

                                                                                                                     Household                 WHFNWGT                     HWGT
                                                                                                                                                                           H5WGT

                                                                                                                     Family                    WFFINWGT                    FWGT
                                                                                                                     Subfamily                 WSFINWGT                    SWGT

                                                                                                                     Person                    WPFINWGT                    FNLWGT                      WPFINWGT                     FINALWGT                    FNLWGTyy, where yy is
                                                                                                                                                                           P5WGT                                                                                the calendar year
                                                                                                                                                                                                                                                                PNLWGT
                                                                                                                     Metropolitan Areas        TMETRO                      HMETRO                      Not on the file              Not on the file             Not on the file
                                                                                                                                               TMSA
10. Using the Core Wave Files
This chapter discusses procedures for working with data from the core wave public use data files
of the Survey of Income and Program Participation (SIPP). It describes the documentation that
accompanies the core wave public use files obtained from the Census Bureau. Discussion then
turns to the data files themselves. The data file structure is described, and detailed explanations
are provided about how to use the core wave files when performing common tasks, including
(among others):

•   Identifying persons, households, families, and program units;
•   Understanding the effects of topcoding;
•   Using imputation flags; and
•   Identifying states and metropolitan areas.

Before reading this chapter, users should read Chapter 9 for an introduction to Section II.
Analysts using only one core wave file should also read about the use of sample weights
(Chapter 8) and the computation of standard errors (Chapter 7). Those planning on merging data
from multiple core wave files, from full panel files, or from topical module files should read
Chapter 11 for information about the topical module files, Chapter 12 for information about the
full panel files, and Chapter 13 for information about linking SIPP public use files.

This chapter focuses on the core wave files. It is written so that it can be used independently
from the chapters describing the topical module files and the full panel files. Although there are
many similarities across the three types of files, important differences do exist. Because those
differences are sometimes subtle, users familiar with the topical module and full panel files
should read this chapter carefully, paying close attention to information about variable names
and file structures. Table 9-2 summarizes the differences among the core wave, topical module,
and full panel longitudinal research files.

For the 1996 Panel, most variable names changed from those used in previous panels. To aid
users working with files from panels prior to 1996, this chapter presents both the old and the new
variable names when the text applies to both 1996 and pre-1996 panel files. In the main body of
the text, the old names are presented in parentheses following the new names. For example, the
sample unit ID variable name, which is SSUID in the 1996 Panel, was SUID in previous panels;
it is written in this chapter as SSUID (SUID). In tables, a variety of methods are used to present
both the old and the new names.




Using the Technical Documentation of the Core Wave Files
Each data file received from the Census Bureau has an accompanying set of technical
documentation and a data dictionary. The technical documentation includes:

•   The item booklet (for the 1996 Panel);
•   The paper survey instrument (for panels prior to the 1996 Panel);
•   A glossary of selected terms;
•   A cross-walk, mapping reference months into calendar months for each rotation group;
•   A source and accuracy statement describing the sample weights and the computation of
    standard errors; and
•   User Notes.

The survey instrument is vital to understanding what questions were asked, how they were asked,
the order in which they were asked, to whom they were asked, and the way in which the answers
were recorded. Some questions employ skip patterns (Chapter 3), so users should pay particular
attention to which questions were skipped for which respondents. The skip patterns are best
understood by consulting the survey instruments. With the introduction of computer-assisted
interviewing (CAI) in the 1996 Panel, documentation of instrument screens and program code is
now available from the SIPP Web site (http://www.sipp.census.gov/sipp/).

The source and accuracy statements provide information about the weights on the files, when
and how to make adjustments to the weights, and one approach to computing standard errors for
some common types of estimates. More extensive discussions of those topics are provided in
Chapters 7 and 8 of this Guide.

The data dictionary provides a detailed description of each variable on the file. It describes four
aspects of each variable:

1. The definition;
2. The sample universe of the corresponding survey question;
3. The ranges for all legal values; and
4. The location (and size) in the file.
A machine-readable version of the data dictionary accompanies each data file. It can also be
downloaded from the Internet (http://www.sipp.census.gov/sipp/).

The data dictionary is formatted to facilitate processing by user-written computer programs. As
shown in Figure 10-1, a “D” in the first column signifies that the next few lines define the
variable: (1) the variable name; (2) the size (i.e., how many digits it contains); and (3) the
starting position. A “U” in the first column signifies that the next words describe the universe.1 A
“V” in the first column indicates that the next number and phrase describe one of the values of
the variable. An asterisk in the first column denotes a comment. A period (.) before a word
denotes the start of the value label. In the dictionaries for files from the 1996 Panel, lines
beginning with a “T” contain short variable descriptions that can be used by many software
packages as variable labels.
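
As an illustration of how a user-written program might process this layout, the following Python sketch collects the D, T, U, and V lines into one record per variable. It is a minimal sketch, not Census Bureau software: the names Variable and parse_dictionary are hypothetical, and it assumes that the record-type letter sits in column 1 followed by a space, that continuation lines of a value label have an empty value field, and that free-text description lines and comment lines can be skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One data dictionary entry (hypothetical container for illustration)."""
    name: str
    size: int            # field width in characters
    start: int           # 1-based starting column in the data record
    label: str = ""      # short description from a "T" line (1996 Panel only)
    universe: str = ""   # text of the "U" line
    values: list = field(default_factory=list)  # [value_or_range, label] pairs


def parse_dictionary(lines):
    """Parse dictionary lines into a list of Variable objects.

    Assumes the record-type letter is in column 1 followed by a space.
    Description lines and "*" comment lines are ignored.
    """
    variables = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        tag, rest = line[:2], line[2:].strip()
        if tag == "D ":
            # "D name size start" begins a new variable definition.
            name, size, start = rest.split()[:3]
            current = Variable(name, int(size), int(start))
            variables.append(current)
        elif current is None:
            continue
        elif tag == "T ":
            current.label = rest
        elif tag == "U ":
            current.universe = rest
        elif tag == "V ":
            # The period marks the start of the value label.
            code, _, text = rest.partition(".")
            code, text = code.strip(), text.strip()
            if code:
                current.values.append([code, text])
            elif current.values:
                # No value before the period: continuation of the prior label.
                current.values[-1][1] += " " + text
    return variables
```

The resulting list gives each variable's starting column and width, which is the information needed to build an input statement in whatever package is used to read the file.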

              Figure 10-1. Excerpt from a Data Dictionary for the Core Wave Files

                                            Wave 1 of the 1996 Panel
      D EENTAID     3    506
      T PE: Address ID of hhld where person entered
        Sample
           Address ID of the household that this
           person belonged to at the time this
           person first became part of the sample
      U All persons
      V     11:129 .Entry address ID

      D EPPPNUM     4     509
      T PE: Person number
           Person number. This field differentiates
           persons within the sample unit. Person
           number is unique within the sample.
      U All persons
      V   101:1299 .Person number

      D EPPINTVW     2    513
      T PE: Person’s interview status
      U All persons
      V          1 .Interview (self)
      V          2 .Interview (proxy)
      V          3 .Noninterview – Type Z
      V          4 .Nonintrvw = pseudo Type Z.
      V             .Left sample during the
      V             .reference period
      V          5 .Children under 15 during
      V             .reference period
                                                                                             (figure continues)


1
 The universe definitions included in the data dictionaries prior to the 1996 Panel were not always accurate. Users
of pre-1996 SIPP Panels should check the skip patterns in the actual survey questionnaire to determine which subset
of respondents was asked each question.


      Figure 10-1. Excerpt from a Data Dictionary for the Core Wave Files (continued)

                                           Wave 9 of the 1992 Panel
      D ENTRY       2    457
           Edited entry address ID
           Address ID of the household that this
           person belonged to at the time this
           person first became part of the sample
           Range=(11:99)
      U All persons, including children

      D PNUM        3    459
           Edited person number
           Range=(101:998)
      U All persons, including children

      D INTVW       1    462
           Person’s interview status
           Range=(0:5)
      U All persons, including children
      V          0 .Not applicable (children
      V            .under 15)
      V          1 .Interview (self)
      V          2 .Interview (proxy)
      V          3 .Noninterview – Type Z refusal
      V          4 .Noninterview – Type Z other
      V          5 .Noninterview – left before
      V            .interview month


Figure 10-2 shows sample SAS and FORTRAN syntax for reading the data described by the
codebook fragment in Figure 10-1. Additional SAS program code could be used to associate
value labels (SAS “formats”) with the variables.
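
For users working outside SAS or FORTRAN, the same column positions can be applied in other languages. The Python sketch below is one possible approach, not part of the SIPP documentation: the names LAYOUT_1996 and read_record are hypothetical, the positions and widths are copied from the 1996 Panel excerpt in Figure 10-1, and the fields are kept as text so that leading zeros in identifiers are preserved.

```python
# (variable name, 1-based starting column, width), from the Figure 10-1 excerpt
LAYOUT_1996 = [
    ("EENTAID", 506, 3),
    ("EPPPNUM", 509, 4),
    ("EPPINTVW", 513, 2),
]


def read_record(line, layout):
    """Slice one fixed-width record into a dict of variable name -> text value."""
    record = {}
    for name, start, width in layout:
        # Dictionary positions are 1-based; Python slices are 0-based.
        record[name] = line[start - 1:start - 1 + width].strip()
    return record
```

Whether to convert the extracted text to numbers is left to the analyst; identifiers are often easier to match across files when they are kept as text.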


Relationship of the Core Wave Data Files to the SIPP Survey Instrument
Because the core wave data dictionary does not replicate the survey instrument, analysts should
keep a few things in mind when using the data:

•   The variables on the data files do not correspond one-to-one with the questionnaire items:
    the variables are listed in a different order, some variables are not included in the core wave
    files at all, and some variables are created from a combination of other variables;



           Figure 10-2. Corresponding SAS and FORTRAN Syntax to Read the Data
                from the Core Wave Files (See Figure 10-1 for Data Dictionary)

                                                 Wave 1 of the 1996 Panel
                                                           SAS
             INPUT
                @506       EENTAID          3.
                           EPPPNUM          4.
                           EPPINTVW         2.
                           ;

             LABEL EENTAID  = "Adrs ID where person entered sample"
                   EPPPNUM  = "Person number"
                   EPPINTVW = "Person's interview status"
                   ;

                                                       FORTRAN
                        INTEGER EENTAID, EPPPNUM, EPPINTVW
                        READ(infile,1000) EENTAID, EPPPNUM, EPPINTVW
             1000       FORMAT(T506,I3,I4,I2)
                                                 Wave 9 of the 1992 Panel
                                                           SAS
             INPUT
                @457       ENTRY       2.
                           PNUM        3.
                           INTVW       1.
                           ;

                  LABEL ENTRY = "Edited Entry Address ID"
                        PNUM  = "Edited Person Number"
                        INTVW = "Person's Interview Status"
                        ;
                                         FORTRAN
                        INTEGER ENTRY, PNUM, INTVW
                        READ(infile,1000) ENTRY, PNUM, INTVW
             1000       FORMAT(T457,I2,I3,I1)


•   The range of possible values of the variables on the data files does not always correspond
    one-to-one with the response categories shown on the survey instrument or in the data
    dictionary;2


2
  For example, in the 1996 Panel the response categories on the instrument for CLWRK are (1) a government
organization, (2) a private, for-profit company, (3) a nonprofit organization ..., (4) a family business or farm. The
response categories for the corresponding edited variable ECLWRK in the data dictionary are 1 = private for-profit
employee, 2 = private not-for-profit employee, 3 = local government worker, 4 = state government worker, 5 =
federal government worker, 6 = family worker without pay.


•   The variable name in the data dictionary may not readily indicate the variable’s content;3 and
•   The complexity of the skip patterns will not be apparent by simply looking at the data
    dictionary.4

To avoid potential problems and confusion, analysts should become familiar with the survey
instrument before using the data. When working with the data, analysts should refer to both the
survey instrument and the data dictionary.


Structure of the Core Wave Files
Beginning with the 1990 Panel, the core wave files have been issued in person-month format,
with one record per person for each month of the 4-month reference period the person is in the
sample.5 A person who was in the sample for all 4 months of the wave has four records. A
person who was in the sample for 1 month has only one record. Records for persons interviewed
by proxy are included in the files, as are records for persons for whom the data are imputed. The
files also contain records for all children residing with original panel members.

As Table 10-1 illustrates, person number 0101 (101) was in the sample all 4 months, person
number 0102 (102) was also in the sample all 4 months, person number 0201 (201) was in the
sample for 2 months, and person number 0202 (202) was in the sample for 1 month. Users may
find it helpful to review Figure 2-1 (pp. 2-10 to 2-14), which illustrates movement into and out of
the sample.
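
To make the person-month layout concrete, the short Python sketch below tallies the reference months present for each person. It is illustrative only: the RECORDS list simply retypes the 1996 Panel rows of Table 10-1 as (SSUID, EPPPNUM, SREFMON) tuples, and the function name months_in_sample is hypothetical.

```python
from collections import defaultdict

# (SSUID, EPPPNUM, SREFMON) rows retyped from the 1996 Panel half of Table 10-1
RECORDS = [
    ("123451000123", "0101", 1), ("123451000123", "0101", 2),
    ("123451000123", "0101", 3), ("123451000123", "0101", 4),
    ("123451000123", "0102", 1), ("123451000123", "0102", 2),
    ("123451000123", "0102", 3), ("123451000123", "0102", 4),
    ("123451000123", "0201", 1), ("123451000123", "0201", 2),
    ("123451000123", "0202", 4),
]


def months_in_sample(records):
    """Map each (sample unit ID, person number) to its sorted reference months."""
    months = defaultdict(list)
    for ssuid, pnum, refmon in records:
        months[(ssuid, pnum)].append(refmon)
    return {person: sorted(found) for person, found in months.items()}
```

A person with fewer than four entries was in the sample for only part of the wave, so analyses that need one record per person (rather than one per person-month) must decide which month or months to use.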


Identifying Persons
There are many occasions when a user may need to identify which records belong to which
individual in the SIPP data files. This need arises, for example, when:

•   Merging data from topical module or full panel files to core wave files;
•   Combining data from two or more core wave files;


3
  Although an attempt was made in the 1996 Panel to give all variables meaningful names, the eight-character
limitation imposed by many software packages places severe constraints on the degree to which this can be done.
Prior to the 1996 Panel, the situation was more pronounced since numeric sequencing was used to name variables
(e.g., in the paper survey, SE22318 is the variable that indicates the total number of employees working for the
second business; in CAI, that variable is TEMPB2). In the 1996 Panel, variable names beginning with a “T” have
been topcoded to protect respondent confidentiality.
4
  The universe definitions included in the data dictionaries prior to the 1996 Panel were not always accurate. Users
of pre-1996 SIPP Panels should check the skip patterns in the actual survey questionnaire to determine which subset
of respondents was asked each question.
5
  Prior to the 1990 Panel, core wave files had one record per person. Each record contained four occurrences of each
monthly variable. For more information, see earlier editions of the SIPP Users’ Guide.


                Table 10-1. Person-Month File Structure for the Core Wave Files

                                                   1996 Panel
Sample              Current            Person            Rotation              Reference         Calendar
Unit ID             Address ID         Number            Group                 Month             Month
(SSUID)             (SHHADID)          (EPPPNUM)         (SROTATON)            (SREFMON)         (RHCALMN)
123451000123        011                0101              2                     1                 2
123451000123        011                0101              2                     2                 3
123451000123        011                0101              2                     3                 4
123451000123        011                0101              2                     4                 5
123451000123        011                0102              2                     1                 2
123451000123        011                0102              2                     2                 3
123451000123        011                0102              2                     3                 4
123451000123        011                0102              2                     4                 5
123451000123        021                0201              2                     1                 2
123451000123        021                0201              2                     2                 3
123451000123        022                0202              2                     4                 5
                                            Prior to the 1996 Panel
Sample              Current            Person             Rotation             Reference         Calendar
Unit ID             Address ID         Number             Group                Month             Month
(SUID)              (ADDID)            (PNUM)             (ROT)                (REFMTH)          (MONTH)
123451000           11                 101                2                    1                 2
123451000           11                 101                2                    2                 3
123451000           11                 101                2                    3                 4
123451000           11                 101                2                    4                 5
123451000           11                 102                2                    1                 2
123451000           11                 102                2                    2                 3
123451000           11                 102                2                    3                 4
123451000           11                 102                2                    4                 5
123451000           21                 201                2                    1                 2
123451000           21                 201                2                    2                 3
123451000           22                 202                2                    4                 5

•   Linking husbands and wives;
•   Linking parents and children; and
•   Identifying which person received government transfer income on behalf of the family.

To uniquely identify a person in the core wave files, analysts should employ the three variables
shown in Table 10-2. Users should note that in the 1996 Panel, the entry address ID is no longer
needed for unique identification. Its continued use will not create any problems; it is simply
redundant information. That is a change from earlier panels in which the entry address ID was
key to uniquely identifying persons.


When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses
following 1996 variable names.

       Table 10-2. Variables Used to Uniquely Identify a Person in the Core Wave Files

       Variable Name                    Description
       SSUID (SUID)                     Sample unit ID
       EENTAID (ENTRY)                  Entry address ID (Not required for identification in the 1996 Panel)
       EPPPNUM (PNUM)                   Person number


The variables in Table 10-2 have the following characteristics:

•   SSUID (SUID) uniquely identifies each initially sampled dwelling unit.6 Every person in a
    core wave file was either a member of one of those units (an original sample member) or
    lives with someone who was a member of an initially sampled dwelling unit. A person’s
    connection to that unit is an attribute of that person and does not change over time.7 This
    means that as people move from address to address, their SSUID (SUID) stays the same. As
    new people join the homes of original sample members, they receive the SSUID (SUID) of
    the original sample members.
•   EENTAID (ENTRY) identifies the address where the person lived at the time she or he was
    first interviewed. It does not change even if the person moves.8 Prior to the 1996 Panel, it
    was used in conjunction with the person number and sample unit ID to uniquely identify
    persons within the sampling unit. It is not needed to uniquely identify persons in the 1996
    Panel. Values for this variable are unique only within sample units. The entry address ID has
    two components. The first part of the ID number (two digits in the 1992 and 1996 Panels,
    and one digit in all others) identifies the wave in which SIPP interviews were first conducted
    at the address. The second part of the number (one digit in all panels) sequentially numbers
    addresses within a sample unit [SSUID (SUID)] that enter the sample in the same wave. See
    Chapter 9 for a more complete discussion.
•   Prior to the 1996 Panel, PNUM uniquely identified a person within the sample unit and entry
    address ID. In the 1996 Panel, EPPPNUM uniquely identifies a person within the sample
    unit. EPPPNUM (PNUM) does not change even if the person moves.9 The first part of
    EPPPNUM (PNUM) (two digits in the 1992 and 1996 Panels, one digit in all others)
    indicates the wave in which the person was first interviewed.10 The remaining two digits are
    sequentially assigned within the household. Thus, original sample members are assigned
    person numbers ranging from 100 to 199. Individuals who enter the SIPP sample in Wave 2
    are assigned a person number ranging from 200 to 299. Those who enter in Wave 10 are
    assigned person numbers ranging from 1001 to 1099.


6
   The SSUID (SUID) is a random recode of three other variables in the Census Bureau’s internal (not public use)
files: the respondent’s sampling area (PSU), the cluster of housing units within that area (called the “segment”), and
a sequentially assigned serial number. Those variables are omitted from the public use files to protect the
confidentiality of the respondents.
7
   There is one rare exception to this rule for Panels prior to 1996, which is described in the section entitled
“Identifying Movers” later in this chapter.
8
  See footnote 7.
9
  See footnote 7.
10
    For Wave 10 of the 1992 Panel and for the 1996 Panel, the first two digits of PNUM, instead of the first digit,
identify the wave in which the person entered the sample.

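
The numbering scheme described for the person number and the entry address ID can be decoded mechanically. The following Python sketch is illustrative only: the function names are hypothetical and are not part of any Census Bureau software. It assumes that the last two digits of EPPPNUM (PNUM) and the last digit of EENTAID (ENTRY) are the sequence numbers, and that the leading digit or digits give the wave.

```python
def wave_of_entry(person_number):
    """Return (wave, sequence) decoded from EPPPNUM (PNUM).

    Assumes the last two digits are the sequence number within the
    household and the leading digit(s) give the wave of first interview.
    """
    wave, sequence = divmod(int(person_number), 100)
    return wave, sequence


def address_entry_wave(entry_address_id):
    """Return (wave, sequence) decoded from EENTAID (ENTRY).

    Assumes the last digit numbers the address within the sample unit and
    the leading digit(s) give the wave in which the address entered.
    """
    wave, sequence = divmod(int(entry_address_id), 10)
    return wave, sequence


print(wave_of_entry("0301"))      # 1996 Panel format: (3, 1)
print(wave_of_entry(1001))        # Wave 10 entrant: (10, 1)
print(address_entry_wave("022"))  # 1996 Panel format: (2, 2)
print(address_entry_wave(21))     # pre-1996 format: (2, 1)
```

Converting to an integer before splitting lets the same functions handle both the zero-padded 1996 values and the shorter pre-1996 values.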
Table 10-3 illustrates how the combination of SSUID (SUID), EENTAID (ENTRY), and
EPPPNUM (PNUM) uniquely identifies people and provides information about when they first
entered the SIPP sample. In this example, there are eight individuals: five are original sample
members, one person joined the SIPP sample in Wave 3, one joined in Wave 4, and another
joined in Wave 7. Note that the person who joined the sample in Wave 3 (pre-1996 Panel) was
assigned a person number of 301, but an entry address ID of 21 (not 31). That is because the first
part of the entry address ID indicates the wave in which that address was first occupied by any
SIPP sample member, which is not necessarily the wave in which a given member entered the
sample.

             Table 10-3. How to Uniquely Identify a Person in the Core Wave Files

                                                   1996 Panel
                             Entry
       Sample                Address ID      Person Number
       Unit ID (SSUID)       (EENTAID)       (EPPPNUM)                    Notes
       123456789123          011             0101                         Original sample member
       123456789123          011             0102                         Original sample member
       123456789123          022             0301                         Enters SIPP sample in Wave 3
       123456789123          011             0401                         Enters SIPP sample in Wave 4
       123456789123          071             0701                         Enters SIPP sample in Wave 7
       321456789123          011             0101                         Original sample member
       321456789123          011             0102                         Original sample member
       321456789123          011             0103                         Original sample member
                                            Prior to the 1996 Panel
                             Entry
       Sample                Address ID       Person Number
       Unit ID (SUID)        (ENTRY)          (PNUM)                      Notes
       123456789             11               101                         Original sample member
       123456789             11               102                         Original sample member
       123456789             21               301                         Enters SIPP sample in Wave 3
       123456789             11               401                         Enters SIPP sample in Wave 4
       123456789             71               701                         Enters SIPP sample in Wave 7
       321456789             11               101                         Original sample member
       321456789             11               102                         Original sample member
       321456789             11               103                         Original sample member
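
As an illustration of the identification rule, the sketch below builds a person key from the 1996 Panel rows of Table 10-3 and checks that no key is repeated. It is a minimal example, not Census Bureau code; the function name person_key and its pre_1996 flag are hypothetical, and the IDs are assumed to be held as character strings.

```python
from collections import Counter

# Rows from the 1996 Panel portion of Table 10-3: (SSUID, EENTAID, EPPPNUM).
rows = [
    ("123456789123", "011", "0101"),
    ("123456789123", "011", "0102"),
    ("123456789123", "022", "0301"),
    ("123456789123", "011", "0401"),
    ("123456789123", "071", "0701"),
    ("321456789123", "011", "0101"),
    ("321456789123", "011", "0102"),
    ("321456789123", "011", "0103"),
]


def person_key(ssuid, eentaid, epppnum, pre_1996=False):
    """Build a person identifier; the entry address ID is kept only for pre-1996 panels."""
    return (ssuid, eentaid, epppnum) if pre_1996 else (ssuid, epppnum)


keys = [person_key(*row) for row in rows]
duplicates = [key for key, count in Counter(keys).items() if count > 1]
print(len(set(keys)), duplicates)  # 8 distinct keys, empty duplicate list

# The person number alone is not unique across sample units.
print(Counter(epppnum for _, _, epppnum in rows)["0101"])  # 2
```

For pre-1996 files, calling person_key with pre_1996=True keeps the entry address ID in the key, as the text requires.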


Identifying Households
The term household, as used in Census Bureau publications, refers to a group of persons who
occupy a housing unit. A house, an apartment or other group of rooms, or a single room is
regarded as a housing unit if it is occupied or intended for occupancy as separate living quarters.
That is, the occupants do not live and eat with any other persons in the structure and there is
direct access from the outside or through a common hall. A group of friends sharing an
apartment constitutes a household. Noninstitutional group quarters, such as rooming and
boarding houses, college dormitories, convents, and monasteries, are classified as group quarters
rather than households.

To uniquely identify a household or group quarters in the core wave files, analysts should use the
two variables shown in Table 10-4.

                 Table 10-4. Variables Used to Uniquely Identify a Household or
                             Group Quarters in the Core Wave Files

                   Variable Name                      Description
                   SSUID (SUID)                       Sample unit ID
                   SHHADID (ADDID)                    Current address ID


People with the same SSUID (SUID) and SHHADID (ADDID) values live in the same
household (or group quarters). The six individuals in Table 10-5 make up three households. The
first household contains the first four individuals. The second household contains one person.
The third household contains one person.

           Table 10-5. How to Uniquely Identify a Household in the Core Wave Files

                                               1996 Panel
                              Current               Person
   Sample Unit ID             Address ID            Number
   (SSUID)                    (SHHADID)             (EPPPNUM)              Notes
   123456789123               071                   0101                   Four persons in this household
   123456789123               071                   0102
   123456789123               071                   0401
   123456789123               071                   0701
   321456789123               031                   0101                   One person in this household
   321456789123               032                   0102                   One person in this household
                                         Prior to the 1996 Panel
                              Current               Person
   Sample Unit ID             Address ID            Number
   (SUID)                     (ADDID)               (PNUM)                 Notes
   123456789                  71                    101                    Four persons in this household
   123456789                  71                    102
   123456789                  71                    401
   123456789                  71                    701
   321456789                  31                    101                    One person in this household
   321456789                  32                    102                    One person in this household
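
The grouping rule can be sketched in a few lines of Python. This is an illustrative example using the 1996 Panel rows of Table 10-5, with the IDs assumed to be character strings; it is not taken from Census Bureau software. Because current address IDs can differ from month to month, an analysis of a particular month would presumably apply the same grouping within that reference month; that step is omitted here.

```python
from collections import defaultdict

# Rows from the 1996 Panel portion of Table 10-5: (SSUID, SHHADID, EPPPNUM).
rows = [
    ("123456789123", "071", "0101"),
    ("123456789123", "071", "0102"),
    ("123456789123", "071", "0401"),
    ("123456789123", "071", "0701"),
    ("321456789123", "031", "0101"),
    ("321456789123", "032", "0102"),
]

# Persons sharing both the sample unit ID and the current address ID
# belong to the same household (or group quarters).
households = defaultdict(list)
for ssuid, shhadid, epppnum in rows:
    households[(ssuid, shhadid)].append(epppnum)

for key, members in households.items():
    print(key, len(members))
```

The loop reports three households, containing four, one, and one persons, which matches the notes in Table 10-5.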


Each household contains one reference person. The household reference person is the person in
whose name the home is owned or rented. If the house is owned or rented jointly by more than
one person (such as a married couple or some roommate situations), any of those people may be
listed as the “reference person.” Users may find it helpful to refer to Figure 2-1 (pp. 2-10 to 2-14),
which illustrates the concepts of household and changes in household composition.


Identifying Families
The term family, as used in Census Bureau publications, refers to a group of two or more people
related by birth, marriage, or adoption who reside together; all such individuals are considered
members of one family.

There are several types of families that the Census Bureau distinguishes:

•   A primary family is a family containing the household reference person and all of his or her
    relatives. This means that a household composed of a husband and wife, their son, and their
    son’s wife (i.e., the daughter-in-law) is classified as a primary family containing four people.
•   A related subfamily is a nuclear family that is related to but does not include the household
    reference person. For example, the son and his wife (i.e., the daughter-in-law) in the
    preceding example are a related subfamily.
•   An unrelated subfamily (sometimes called a secondary family) is a nuclear family that is not
    related to the household reference person. Thus, a husband and wife who live in a friend’s
    house are classified as an unrelated subfamily. A mother and daughter who live in the
    mother’s boyfriend’s apartment are classified as an unrelated subfamily.
•   A primary individual is a household reference person who lives alone or lives with only
    nonrelatives. Primary individuals are sometimes treated by the Census Bureau as families
    with only one person and are referred to as pseudo-families.
•   A secondary individual is not a household reference person and is not related to any other
    people in the household. Secondary individuals are sometimes treated by the Census Bureau
    as families with only one person and are referred to as pseudo-families.
To uniquely identify a family, analysts should use the variables shown in Table 10-6.

      Table 10-6. Variables Used to Uniquely Identify a Family in the Core Wave Files

       Variable Name                   Description
       SSUID (SUID)                    Sample unit ID
       SHHADID (ADDID)                 Current Address ID
       and one of the following:
       RFID (FID)                      Family ID
       RFID2 (FID2)                    Family ID, excluding related subfamily members
       RSID (SID)                      Family ID, for both related and unrelated subfamilies


The Census Bureau has two principal methods for distinguishing families.

•   The first method defines a family as all persons who are related and living together. The
    family ID variable RFID is used with this definition. RFID groups the household reference
    person with all related household members by assigning them the same ID number. This
    family group corresponds to the Census Bureau’s definition of a primary family. RFID
    groups members of each unrelated subfamily (and primary and secondary individuals)
    separately.
•   The second method is similar to the first in defining a family, but the family excludes
    members of related subfamilies. The family ID variable RFID2 is used with this definition.
    RFID2 equals zero for members of related subfamilies. RFID2 groups members of each
    unrelated subfamily (and primary and secondary individuals) in the same way as RFID:
    each group has a unique number.
Analysts who want to analyze multigenerational families would use RFID2 (FID2) and the
variable RSID (SID). RSID (SID) treats related subfamilies as distinct family units by assigning
members of related subfamilies nonzero values. Analysts can easily distinguish unrelated
subfamilies from other family units when they use these variables and numbering schemes.

Table 10-7 illustrates the difference between the RFID (FID), RFID2 (FID2), and RSID (SID)
variables. Those variables are set to new numbers in each month. For example, a mother, a
father, and a child would be family 1 with RFID (FID) = 1 in month 1, RFID (FID) = 2 in month
2, RFID (FID) = 3 in month 3, and RFID (FID) = 4 in month 4, even though family composition
remains the same. The first household in the table contains a primary family of five people. The
primary family contains two related subfamilies. RFID (FID) and RFID2 (FID2) mask the fact
that there are two related subfamilies; only RSID (SID) provides that information: RSID (SID)
has nonzero values for those related subfamilies.

The second “household” is actually a group of three households, each containing a primary
family, that originally formed one household. The third household contains a primary family and
two unrelated subfamilies. The fourth household contains a primary individual and an unrelated
subfamily. The fifth household contains only a primary individual. The sixth household is a
group quarters containing two people.

The needs of the analysis will help to determine which family classification to use. The
following guide may prove helpful:

•   To group people into families in the same way that the Census Bureau does, use SSUID
    (SUID), SHHADID (ADDID), and RFID (FID).
•   To analyze people in related subfamilies, include only those records with RSID (SID) greater
    than zero and ESFTYPE (FTYPE) equal to 2.
•   To analyze all families and to keep subfamilies separate from primary families, use SSUID
    (SUID), SHHADID (ADDID), RFID2 (FID2), and RSID (SID) to uniquely identify each
    family.
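
The three choices in the guide above can be compared with a short Python sketch. The example is illustrative only and uses selected 1996 Panel rows from Table 10-7; the helper name group_sizes is hypothetical, and the family ID variables are assumed to be stored as integers.

```python
from collections import defaultdict

# Selected rows from the 1996 Panel portion of Table 10-7:
# (SSUID, SHHADID, EPPPNUM, RFID, RFID2, RSID, ESFTYPE)
rows = [
    ("110011111123", "011", "0101", 1, 1, 0, 0),
    ("110011111123", "011", "0102", 1, 0, 2, 2),
    ("110011111123", "011", "0103", 1, 0, 2, 2),
    ("110011111123", "011", "0104", 1, 0, 3, 2),
    ("110011111123", "011", "0105", 1, 0, 3, 2),
    ("122210000123", "011", "0101", 1, 1, 0, 0),
    ("122210000123", "011", "0104", 1, 1, 0, 0),
    ("122210000123", "011", "0305", 2, 2, 0, 0),
    ("122210000123", "011", "0306", 2, 2, 0, 0),
    ("122210000123", "011", "0307", 3, 3, 0, 0),
    ("122210000123", "011", "0308", 3, 3, 0, 0),
]


def group_sizes(key_func):
    """Count persons per family unit under the given key definition."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[key_func(row)] += 1
    return dict(sizes)


# Census Bureau grouping: related subfamilies stay inside the primary family.
by_rfid = group_sizes(lambda r: (r[0], r[1], r[3]))
# Subfamilies kept separate: RFID2 and RSID together identify each unit.
by_rfid2_rsid = group_sizes(lambda r: (r[0], r[1], r[4], r[5]))
# Related subfamily members only: RSID > 0 and ESFTYPE == 2.
related = [r for r in rows if r[5] > 0 and r[6] == 2]

print(len(by_rfid), sorted(by_rfid.values()))
print(len(by_rfid2_rsid), sorted(by_rfid2_rsid.values()))
print([r[2] for r in related])
```

With these rows, grouping on RFID yields four families (one of five persons and three of two), while grouping on RFID2 and RSID yields six units because the two related subfamilies in the first household are separated from the reference person.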


                              Table 10-7. Uniquely Identifying Families in the Core Wave Files

                                                  1996 Panel
                                        Family ID,  Family ID,
                                        Including   Excluding                              Related
Sample         Current      Person      Related     Related     Related        Family      Subfamily
Unit ID        Address ID   Number      Subfamily   Subfamily   Subfamily ID   Type        Type
(SSUID)        (SHHADID)    (EPPPNUM)   (RFID)      (RFID2)     (RSID)         (EFTYPE)a   (ESFTYPE)   Notes
110011111123   011          0101        1           1           0              1           0           This household contains a
110011111123   011          0102        1           0           2              1           2           primary family of five people.
110011111123   011          0103        1           0           2              1           2           The primary family contains
110011111123   011          0104        1           0           3              1           2           two subfamilies.
110011111123   011          0105        1           0           3              1           2
110077777723   011          0101        1           1           0              1           0           Three households formed by
110077777723   021          0102        1           1           0              1           0           people who were originally
110077777723   021          0103        1           1           0              1           0           members of the same originally
110077777723   022          0104        1           1           0              1           0           sampled household (SSUID of
110077777723   022          0105        1           1           0              1           0           110077777723). Two
                                                                                                       subfamilies split off from the
                                                                                                       original household to become
                                                                                                       two new primary families at
                                                                                                       addresses 21 and 22.
122210000123   011          0101        1           1           0              1           0           This household contains a
122210000123   011          0104        1           1           0              1           0           primary family and two
122210000123   011          0305        2           2           0              3           0           unrelated subfamilies.
122210000123   011          0306        2           2           0              3           0
122210000123   011          0307        3           3           0              3           0
122210000123   011          0308        3           3           0              3           0
555555555123   021          0101        1           1           0              4           0           This household contains a
555555555123   021          0201        2           2           0              3           0           primary individual and an
555555555123   021          0202        2           2           0              3           0           unrelated subfamily.
555555555123   021          0203        2           2           0              3           0
610000000123   032          0101        1           1           0              4           0           Primary individual.
897454644123   011          0101        1           1           0              5           0           Group quarters with two
897454644123   011          0102        2           2           0              5           0           secondary individuals.
a
  EFTYPE = 1 means the person belongs to a primary family (including related subfamily members). EFTYPE = 3 means
the person belongs to an unrelated subfamily. EFTYPE = 4 means the person is a primary individual. EFTYPE = 5 means
the person is a secondary individual.
                                                                                                       (table continues)
                        Table 10-7. Uniquely Identifying Families in the Core Wave Files (continued)

                                                Pre-1996 Panel
                                        Family ID,  Family ID,
                                        Including   Excluding                              Related
Sample         Current      Person      Related     Related     Related        Family      Subfamily
Unit ID        Address ID   Number      Subfamily   Subfamily   Subfamily ID   Type        Type
(SUID)         (ADDID)      (PNUM)      (FID)       (FID2)      (SID)          (FAMTYP)b   (ESFTYPE)   Notes
110011111      11           101         1           1           0              1                       This household contains a
110011111      11           102         1           0           2              1                       primary family of five people.
110011111      11           103         1           0           2              1                       The primary family contains
110011111      11           104         1           0           3              1                       two subfamilies.
110011111      11           105         1           0           3              1
110077777      011          101         1           1           0              1           0           Three households formed by
110077777      021          102         1           1           0              1           0           people who were originally
110077777      021          103         1           1           0              1           0           members of the same originally
110077777      022          104         1           1           0              1           0           sampled household (SUID of
110077777      022          105         1           1           0              1           0           110077777). Two subfamilies
                                                                                                       split off from the original
                                                                                                       household to become two new
                                                                                                       primary families at addresses
                                                                                                       21 and 22.
122210000      33           101         1           1           0              1                       This household contains a
122210000      33           104         1           1           0              1                       primary family and two
122210000      33           305         2           2           0              3                       unrelated subfamilies.
122210000      33           306         2           2           0              3
122210000      33           307         3           3           0              3
122210000      33           308         3           3           0              3
555555555      21           101         1           1           0              4                       This household contains a
555555555      21           201         2           2           0              3                       primary individual and an
555555555      21           202         2           2           0              3                       unrelated subfamily.
                                                                                                                      555555555     21             203            2              2              0             3
                                                                                                                      610000000     11             101            1              1              0             4                             Primary individual.
                                                                                                                      897454644     11           101               1              1              0             5                             Group quarters with two
                                                                                                                      897454644     11           102               2              2              0             5                             secondary individuals.
                                                                                                                      b
                                                                                                                        FAMTYP = 1 means the person belongs to a primary family (including related subfamily members). FAMTYP = 3 means the person belongs to an unrelated
                                                                                                                      subfamily. FAMTYP = 4 means the person is a primary individual. FAMTYP = 5 means the person is a secondary individual.
                                                                       USING THE CORE WAVE FILES


Other Variables Describing Household and
Family Composition
Table 10-8 shows the primary core wave variables summarizing household and family
composition.11

         Table 10-8. Variables Describing Household and Family Composition in the
                                      Core Wave Files

          Variable Name
1996 Panel          Prior to the
                    1996 Panel          Description
RHNF                HNF                 Number of families, subfamilies, and pseudo-families in household
RHNFAM              HNFAM               Number of families and pseudo-families but excluding related
                                        subfamilies in household
RHNSF               HNSF                Number of related subfamilies in household
EHREFPER            HREFPER             Household reference person (ENTRY concatenated with PNUM)
EHHNUMPP            HNP                 Number of persons in household
RHTYPE              HTYPE               Type of household (e.g., married-couple family, male householder
                                        family, etc.)
EFREFPER            FREFPER             Family reference person (ENTRY concatenated with PNUM)
EFTYPE              FTYPE               Type of family (e.g., primary family, unrelated subfamily, etc.)
EFKIND              FKIND               Head of family (e.g., husband and wife, male reference person, etc.)
ESFT                FAMTYP              Type of family to which this person belongs (e.g., primary family, related
                                        subfamily, etc.)
ESFRa               FAMREL              Family relationship (e.g., reference person, spouse of family reference
                                        person, child of family reference person, etc.)
ERRP                RRP                 Recoded relationship to the household reference person (e.g., household
                                        reference person living with relatives, child of household reference
                                        person, etc.)
Not a variable for  RRPU                Unedited relationship to the household reference person (e.g., stepchild
the 1996 Panel                          of household reference person, grandchild of household reference person,
                                        etc.)
EPNSPOUS            PNSP                Person number of spouse
EPNGUARD            PNGDU               Person number of guardian
EPNMOM                                  Person number of mother
EPNDAD                                  Person number of father
                    PNPT                Person number of parent
a ESFR (edited subfamily relationship) is defined the same as FAMREL, but it applies only to subfamilies (both
related and unrelated).


11 Detailed information about the relationships between members is collected in the Household Relationships topical
module (see Chapter 3 for a discussion of topical module content). See those data for extensive information about
household composition.

When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses
following 1996 variable names.
SIPP USERS’ GUIDE


Identifying Household and Family Reference Persons

The value of the EHREFPER (HREFPER) variable identifies the household reference person. As
explained in Chapter 2, the household reference person is the owner or renter of record. For the
1996 Panel, the variable simply contains the person number of the household reference person,
so the reference person is the one for whom EHREFPER = EPPPNUM. Prior to the 1996 Panel,
the variable concatenated ENTRY with PNUM, so the household reference person was the one
for whom:

•   HREFPER = ENTRY * 1000 + PNUM (for Waves 1-9) or
•   HREFPER = ENTRY * 10000 + PNUM (for Wave 10 of the 1992 Panel).

The EFREFPER (FREFPER) variable identifies the family reference person. For the 1996 Panel,
the variable simply contains the person number of the family reference person (EFREFPER =
EPPPNUM). Prior to the 1996 Panel, the family reference person was the one for whom:

•   FREFPER = ENTRY * 1000 + PNUM (for Waves 1-9) or
•   FREFPER = ENTRY * 10000 + PNUM (for Wave 10 of the 1992 Panel).
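
The comparisons described in this section can be sketched in a few lines of Python. This is an
illustration only, not Census Bureau code. It assumes the ID fields have already been read as
integers; the function names and the sample records (ENTRY = 11, PNUM = 101 and 102) are
hypothetical.

```python
def pre1996_refper_code(entry, pnum, multiplier=1000):
    """Concatenate ENTRY with PNUM.

    Use multiplier=1000 for Waves 1-9 and multiplier=10000 for
    Wave 10 of the 1992 Panel.
    """
    return entry * multiplier + pnum


def is_refper_pre1996(rec, refper_field="HREFPER", multiplier=1000):
    """True if this person is the reference person named in refper_field."""
    return rec[refper_field] == pre1996_refper_code(
        rec["ENTRY"], rec["PNUM"], multiplier
    )


def is_refper_1996(rec, refper_field="EHREFPER"):
    """In the 1996 Panel the field holds the person number directly."""
    return rec[refper_field] == rec["EPPPNUM"]


# Hypothetical two-person household from a pre-1996 panel (Waves 1-9).
pre1996_household = [
    {"ENTRY": 11, "PNUM": 101, "HREFPER": 11101, "FREFPER": 11101},
    {"ENTRY": 11, "PNUM": 102, "HREFPER": 11101, "FREFPER": 11101},
]
household_flags = [is_refper_pre1996(r) for r in pre1996_household]
family_flags = [is_refper_pre1996(r, "FREFPER") for r in pre1996_household]

# The same hypothetical household laid out with 1996 Panel variable names.
panel1996_household = [
    {"EPPPNUM": 101, "EHREFPER": 101, "EFREFPER": 101},
    {"EPPPNUM": 102, "EHREFPER": 101, "EFREFPER": 101},
]
flags_1996 = [is_refper_1996(r) for r in panel1996_household]
```

In both layouts only the first person is flagged. Passing "FREFPER" (or "EFREFPER") as the field
name applies the same test to the family reference person.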


Using the Relationship to Reference Person [ERRP (RRP)]
Variable

For the 1996 Panel, ERRP describes how each person is related to the household reference
person. As seen in Table 10-9, the new variable provides information about several household
relationship categories that were not available from earlier panels. However, as in earlier panels,
this variable summarizes the relationship to the household reference person, not to the family
reference person.

Prior to the 1996 Panel, both edited and unedited versions of the RRP variable were included on
the core wave files. As shown in Table 10-10, RRP (the edited version of the variable)
summarized the values of RRPU (the unedited variable). The RRPU variable can distinguish
whether someone is a grandchild, stepchild, foster child, or natural/adopted child of the
household reference person. What it cannot do, however, is distinguish the type of child within
each family: RRPU is the relationship to the household reference person, not the relationship to
the family reference person. For example, using records with RRPU = 6 will not identify all
foster children, because some could be in an unrelated subfamily. The variable FAMREL
summarizes the relationship of the person to the family reference person (as reference person of
family, spouse, or child).



                  Table 10-9. The ERRP Variable in the 1996 Core Wave Files
                 Edited Relationship to the Household Reference Person (ERRP)

    Edited Relationship to the
    Household Reference
    Person (ERRP)                Description
      1                          Household reference person, living with relatives
      2                          Household reference person, living alone or with nonrelatives
      3                          Spouse of household reference person
      4                          Child of household reference person
      5                          Grandchild of household reference person
      6                          Parent of household reference person
      7                          Brother or sister of household reference person
      8                          Other relative of household reference person
      9                          Foster child of household reference person
     10                          Unmarried partner of household reference person
     11                          Housemate or roommate
     12                          Roomer or boarder
     13                          Other nonrelative of household reference person

         Table 10-10. Comparison of RRP and RRPU Variables of the Core Wave Files
                                   Prior to the 1996 Panel

Edited Relationship                                        Relationship to the
to the Household                                           Household Reference
Reference Person                                           Person
(RRP)                   Description                        (RRPU)                      Notes
1                       Household reference person,         1                          Same as code 1 under RRP
                        living with relatives
2                       Household reference person,         2                          Same as code 2 under RRP
                        living alone or with
                        nonrelatives
3                       Spouse of household reference       3                          Same as code 3 under RRP
                        person
4                       Child of household reference        4                          Natural/adopted child of
                        person                                                         household reference person
                                                            5                          Stepchild of household
                                                                                       reference person
5                       Other relative of household         7                          Grandchild of household
                        reference person                                               reference person
                                                            8                          Parent of household
                                                                                       reference person
                                                            9                          Brother/sister of household
                                                                                       reference person
                                                           10                          Other relative of household
                                                                                       reference person
6                       Nonrelative of household           11                          Same as code 6 under RRP
                        reference person, but related to
                        other members of the
                        household
7                       Nonrelative of all members of       6                          Foster child of household
                        the household                                                  reference person
                                                           12                          Partner/roommate of
                                                                                       household reference person
                                                           13                          Other type of nonrelative of
                                                                                       household reference person
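
The correspondence in Table 10-10 can be written as a simple lookup. The Python sketch below
transcribes the table; the dictionary and function names are hypothetical, and the codes are
assumed to have been read as integers. As the text cautions, the result still describes the
relationship to the household reference person only, so it will not find children within
unrelated subfamilies.

```python
# Edited code (RRP) for each unedited code (RRPU), transcribed from Table 10-10.
RRPU_TO_RRP = {
    1: 1,                       # reference person, living with relatives
    2: 2,                       # reference person, alone or with nonrelatives
    3: 3,                       # spouse
    4: 4, 5: 4,                 # natural/adopted child, stepchild
    7: 5, 8: 5, 9: 5, 10: 5,    # grandchild, parent, brother/sister, other relative
    11: 6,                      # nonrelative related to other household members
    6: 7, 12: 7, 13: 7,         # foster child, partner/roommate, other nonrelative
}

# RRPU codes that distinguish a type of child of the household reference person.
RRPU_CHILD_TYPE = {
    4: "natural/adopted child",
    5: "stepchild",
    6: "foster child",
    7: "grandchild",
}


def rrp_from_rrpu(rrpu):
    """Return the edited RRP code that summarizes an unedited RRPU code."""
    return RRPU_TO_RRP[rrpu]


def child_type(rrpu):
    """Return the child category for an RRPU code, or None if not a child code."""
    return RRPU_CHILD_TYPE.get(rrpu)
```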


The ERRP (RRP) variable contains summary information about each person’s relationship to the
household reference person. Analysts should bear in mind that the household description
depends upon the identity of the household reference person. For example, the household in
Table 10-11 contains a mother, her daughter, and her daughter’s son. If the mother is the
household reference person [ERRP = 1 (RRP = 1)], her daughter is listed as a child of the
household reference person [ERRP = 4 (RRP = 4)]. The daughter’s son is listed as a
grandchild of the reference person in the 1996 Panel (ERRP = 5), but as an “other relative” of
the household reference person in earlier panels (RRP = 5; the code is the same, but its meaning
differs between the two variables). If the daughter is the reference person, her son is
listed as a child of the household reference person [ERRP = 4 (RRP = 4)], and her mother is
listed as the parent of the reference person in the 1996 Panel (ERRP = 6), but as an “other
relative” of the household reference person in earlier panels (RRP = 5).12 Users should note
that the identity of the household reference person can change from one month to the next; thus,
the household description could also change.

Table 10-11. Identifying Households Containing Three Generations in the Core Wave Files

                                               1996 Panel
                             Relationship to Household
Household Member             Reference Person (ERRP)                      Notes
Mother as Household Reference Person
Mother                       1                                            Reference person
Daughter                     4                                            Child of reference person
Daughter’s son               5                                            Grandchild of reference person
Daughter as Household Reference Person
Daughter                     1                                            Reference person
Daughter’s son               4                                            Child of reference person
Mother                       6                                            Parent of reference person
                                         Panels Prior to 1996
                             Relationship to the Household
Household Member             Reference Person (RRP)                       Notes
Mother as Household Reference Person
Mother                       1                                            Reference person
Daughter                     4                                            Child of reference person
Daughter’s son               5                                            Other relative of reference person
Daughter as Household Reference Person
Daughter                     1                                            Reference person
Daughter’s son               4                                            Child of reference person
Mother                       5                                            Other relative of reference person


12 Because it is impossible to anticipate all of the different living arrangements found in SIPP sample households,
and in some cases more than one rule for identifying a reference person may apply, some interviewer discretion in
identifying the reference person is inevitable. For that reason, the resulting choices can sometimes appear to the data
analyst to be somewhat arbitrary.



Identifying a Person’s Spouse, Parent, or Guardian

Four other variables on the core wave files (three prior to the 1996 Panel) can also be used to
describe household and family composition. They are EPNSPOUS (PNSP), EPNDAD or
EPNMOM (PNPT), and EPNGUARD (PNGDU). These variables identify the person number of
the spouse, the father or mother (just one parent is identified in files from panels prior to 1996),
and guardian of the person, respectively. In each case, the relative is identified only if she or he
is living at the same address as the person. By building from these variables, analysts can
identify a variety of family configurations. For example, these variables can be used to identify
households containing three generations. Table 10-12 displays one household containing a
mother and her two children. One child, EPPPNUM = 0102 (PNUM = 0102), has a son, and the
other child, EPPPNUM = 0104 (PNUM = 0104), has a spouse.

Table 10-12. Identifying Households Containing Three Generations in the Core Wave Files

                                                 1996 Panel
                                      Recoded Relationship
                          Person      to Household
                          Number      Reference Person      Spouse         Parent
 Household Member         (EPPPNUM)   (ERRP)                (EPNSPOUS)     (EPNMOM)     Notes
 Mother                   0101        1                     9999           9999         Mother
 Daughter #1              0102        4                     9999           0101         Child
 Daughter #1’s Son        0103        5                     9999           0102         Grandchild
 Daughter #2              0104        4                     0105           0101         Child
 Spouse of Daughter #2    0105        8                     0104           9999         Spouse of child
                                             Panels Prior to 1996
                                      Recoded Relationship
                          Person      to Household
                          Number      Reference Person      Spouse         Parent
 Household Member         (PNUM)      (RRP)                 (PNSP)         (PNPT)       Notes
 Mother                   101         1                     999            999          Mother
 Daughter #1              102         4                     999            101          Child
 Daughter #1’s Son        103         5                     999            102          Grandchild
 Daughter #2              104         4                     105            101          Child
 Spouse of Daughter #2    105         5                     104            999          Spouse of child
Note: Value of 999 or 9999 means not applicable.
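
One way to build on these pointers is to follow the parent variable twice. The Python sketch
below uses the 1996 Panel rows of Table 10-12, with person numbers read as integers (an
assumption). The function name is hypothetical, and the records are assumed to belong to a single
household at a single address. EPNDAD is checked when present, although the table shows only
EPNMOM.

```python
NOT_APPLICABLE = 9999  # 999 in panels prior to 1996

# Records transcribed from the 1996 Panel half of Table 10-12.
household = [
    {"EPPPNUM": 101, "EPNSPOUS": 9999, "EPNMOM": 9999},
    {"EPPPNUM": 102, "EPNSPOUS": 9999, "EPNMOM": 101},
    {"EPPPNUM": 103, "EPNSPOUS": 9999, "EPNMOM": 102},
    {"EPPPNUM": 104, "EPNSPOUS": 105, "EPNMOM": 101},
    {"EPPPNUM": 105, "EPNSPOUS": 104, "EPNMOM": 9999},
]


def three_generation_chains(records, parent_fields=("EPNMOM", "EPNDAD"),
                            na=NOT_APPLICABLE):
    """Return (grandparent, parent, child) person-number triples.

    All records are assumed to come from the same household and address.
    """
    # Map each person to the co-resident parents named on his or her record.
    parents_of = {}
    for rec in records:
        parents_of[rec["EPPPNUM"]] = [
            rec[field] for field in parent_fields
            if rec.get(field, na) != na
        ]
    # A chain exists when a person's parent also has a parent in the household.
    chains = []
    for child, parents in parents_of.items():
        for parent in parents:
            for grandparent in parents_of.get(parent, []):
                chains.append((grandparent, parent, child))
    return chains


chains = three_generation_chains(household)
has_three_generations = len(chains) > 0
```

For this household the only chain found is mother 101, daughter 102, and grandson 103. For
pre-1996 files, the same logic would apply with PNUM, PNPT, and a not-applicable value of 999.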


Using Family-Level Income Variables

The core wave files contain a number of family-level income variables. The family income
variables on these files include the income of all related subfamily members. In other words,
primary family members, including related subfamily members, are treated as one family by the
Census Bureau when calculating family-level income amounts. The core wave files also contain
related subfamily income variables. These variables pool the income of all persons who are
members of the same related subfamily.

Table 10-13 illustrates how the family income variables on the core wave files include the
income of related subfamily members. Continuing the earlier example of a primary family of five
people, the primary family contains two related subfamilies. Total family income, TFTOTINC
(FTOTINC), is $4,200. The first related subfamily has a total income, TSTOTINC (STOTINC),
of $1,000, and the second has $2,000. The primary family's income net of the related
subfamilies is therefore $1,200 ($4,200 - $1,000 - $2,000).
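
The last column of Table 10-13 (primary family income net of related subfamilies) is not a
variable on the file, but it can be derived. The Python sketch below uses the 1996 Panel rows of
Table 10-13. It assumes that RSID = 0 marks a person who is not in a related subfamily, as the
table suggests, and that all records passed in belong to the same family. The function name is
hypothetical.

```python
# Records transcribed from the 1996 Panel half of Table 10-13 (dollar amounts).
family = [
    {"EPPPNUM": 101, "RFID": 2, "RSID": 0, "TFTOTINC": 4200, "TSTOTINC": 0},
    {"EPPPNUM": 102, "RFID": 2, "RSID": 2, "TFTOTINC": 4200, "TSTOTINC": 1000},
    {"EPPPNUM": 103, "RFID": 2, "RSID": 2, "TFTOTINC": 4200, "TSTOTINC": 1000},
    {"EPPPNUM": 104, "RFID": 2, "RSID": 3, "TFTOTINC": 4200, "TSTOTINC": 2000},
    {"EPPPNUM": 105, "RFID": 2, "RSID": 3, "TFTOTINC": 4200, "TSTOTINC": 2000},
]


def net_primary_family_income(records):
    """Family income minus the income of each related subfamily.

    TSTOTINC repeats on every member's record, so each subfamily
    (identified by RSID) is counted only once.
    """
    family_total = records[0]["TFTOTINC"]
    subfamily_income = {
        rec["RSID"]: rec["TSTOTINC"] for rec in records if rec["RSID"] != 0
    }
    return family_total - sum(subfamily_income.values())


net_income = net_primary_family_income(family)
```

Summing TSTOTINC across person records without first grouping by RSID would count each
subfamily's income once per member and overstate the deduction.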


More About Using the SIPP ID Variables:
Identifying Movers
When a person moves, the current address field, SHHADID (ADDID), changes. The SSUID
(SUID), EENTAID (ENTRY), and EPPPNUM (PNUM) values remain the same. The first part
of SHHADID (ADDID) (two digits in the 1992 Panel and the 1996 Panel, one digit in all others)
indicates the wave in which a household is first interviewed at that new address. The remaining
digits sequentially number the households formed when a household splits into two or more
households because original sample members move to a different location. Thus, new addresses
in Wave 2 are numbered 021 (21), 022 (22), and so on. New addresses in Wave 3 are numbered
031 (31), 032 (32), and so on.
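
The address ID can be split into its two parts as follows. This Python sketch is illustrative
only; the function name is hypothetical, and the caller must supply the number of wave digits
for the panel in use. It also assumes that an ID read as a number may have lost a leading zero,
which is restored by padding to one sequence digit.

```python
def split_address_id(address_id, wave_digits):
    """Split SHHADID (ADDID) into (wave first interviewed, sequence number).

    wave_digits is 2 for the 1992 and 1996 Panels and 1 for all others.
    """
    # Restore a leading zero that may have been dropped (e.g., 71 -> "071").
    text = str(address_id).zfill(wave_digits + 1)
    return int(text[:wave_digits]), int(text[wave_digits:])


wave_1996, seq_1996 = split_address_id("071", wave_digits=2)
wave_old, seq_old = split_address_id("32", wave_digits=1)
```

Here "071" from a 1996 Panel file is address 1 first interviewed in Wave 7, and "32" from an
earlier panel is address 2 first interviewed in Wave 3.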

Table 10-14 shows that persons 0101 (101) and 0102 (102) in the first household are original
sample members. Person 0401 (401) moved into the home of persons 0101 (101) and 0102 (102)
in Wave 4. In Wave 7, all three of them moved to a new location and were joined by person 0701
(701). In the second household, person 0101 (101) is an original sample member who moved to a new
location in Wave 3. In the third household, person 0102 (102) is an original sample member who
used to live with persons 0101 (101) and 0103 (103) of the same sample unit ID, but moved to a
new location in Wave 3 [to a different location from person 0101 (101)]. In the fourth household,
person number 0103 (103) is an original sample member who used to live with persons 0101
(101) and 0102 (102) of the same sample unit ID number. All but two people moved from their
original location [i.e., only two people have SHHADID (ADDID) equal to EENTAID
(ENTRY)].
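
The comparison of current and entry address IDs can be sketched in Python. The records are the
1996 Panel rows printed in Table 10-14, kept as strings to preserve leading zeros. The function
name is hypothetical.

```python
# Rows transcribed from the 1996 Panel half of Table 10-14.
records = [
    {"SSUID": "123456789123", "SHHADID": "071", "EENTAID": "011", "EPPPNUM": "0101"},
    {"SSUID": "123456789123", "SHHADID": "071", "EENTAID": "011", "EPPPNUM": "0102"},
    {"SSUID": "123456789123", "SHHADID": "071", "EENTAID": "011", "EPPPNUM": "0401"},
    {"SSUID": "123456789123", "SHHADID": "071", "EENTAID": "071", "EPPPNUM": "0701"},
    {"SSUID": "321456789123", "SHHADID": "031", "EENTAID": "011", "EPPPNUM": "0101"},
    {"SSUID": "321456789123", "SHHADID": "032", "EENTAID": "011", "EPPPNUM": "0102"},
]


def has_moved_since_entry(rec):
    """True if the current address differs from the address at sample entry."""
    return rec["SHHADID"] != rec["EENTAID"]


movers = [(r["SSUID"], r["EPPPNUM"]) for r in records if has_moved_since_entry(r)]
stayers = [(r["SSUID"], r["EPPPNUM"]) for r in records if not has_moved_since_entry(r)]
```

Among the rows transcribed here, person 0701 is the only one whose current address equals the
entry address. Note that a person is identified by SSUID, EENTAID, and EPPPNUM together, because
person numbers such as 0101 repeat across sample units.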

The next example (Table 10-15) further illustrates how the ID system works as people move to
new addresses, additional people move in with them, and households split. A review of Figure
2-1 may help in understanding the various household changes.

•   In Wave 1, there is a five-person household consisting of a husband, wife, daughter, son, and
    cousin. Since this is the first wave, the current address number is 011 (11), indicating address
    1 of Wave 1, and the entry address number for each member of the household is the same as
    the current address number. Since they are assigned in Wave 1, the person numbers are in the
    0100 (100) series and are numbered sequentially, beginning with 0101 (101).


    Table 10-13. How the Family-Level Variables Include the Subfamily’s Information in the Core Wave Files

                                                    1996 Panel
                                                                                   Number of   Total
                                     Family ID,            Number of   Total       Persons in  Related     Total Primary
Sample        Current     Person     Including             Persons in  Family      Related     Subfamily   Family Income
Unit ID       Address ID  Number     Subfamily  Subfamily  Family      Income      Subfamily   Income      Net of Related
(SSUID)       (SHHADID)   (EPPPNUM)  (RFID)     ID (RSID)  (EFNP)      (TFTOTINC)  (EFNP)      (TSTOTINC)  Subfamily
110011111123  11          0101       2          0          5           $4,200      0           $0          $1,200
110011111123  11          0102       2          2          5           $4,200      2           $1,000      NA
110011111123  11          0103       2          2          5           $4,200      2           $1,000      NA
110011111123  11          0104       2          3          5           $4,200      2           $2,000      NA
110011111123  11          0105       2          3          5           $4,200      2           $2,000      NA

                                              Prior to the 1996 Panel
                                                                                   Number of   Total
                                     Family ID,            Number of   Total       Persons in  Related     Total Primary
Sample        Current     Person     Including             Persons in  Family      Related     Subfamily   Family Income
Unit ID       Address ID  Number     Subfamily  Subfamily  Family      Income      Subfamily   Income      Net of Related
(SUID)        (ADDID)     (PNUM)     (FID)      ID (SID)   (FNP)       (FTOTINC)   (SNP)       (STOTINC)   Subfamily
110011111     11          101        2          0          5           $4,200      0           $0          $1,200
110011111     11          102        2          2          5           $4,200      2           $1,000      NA
110011111     11          103        2          2          5           $4,200      2           $1,000      NA
110011111     11          104        2          3          5           $4,200      2           $2,000      NA
110011111     11          105        2          3          5           $4,200      2           $2,000      NA
Note: NA equals not applicable.

                      Table 10-14. Identifying Movers in the Core Wave Files

                                                   1996 Panel
  Sample            Current          Entry           Person
  Unit ID           Address ID       Address ID      Number
  (SSUID)           (SHHADID)        (EENTAID)       (EPPPNUM)        Notes
  123456789123      071              011             0101             Persons 0101 and 0102 are the original
  123456789123      071              011             0102             sample members. Person 0401 begins to
  123456789123      071              011             0401             live with them in Wave 4. All three
  123456789123      071              071             0701             people move in Wave 7 and person 0701
                                                                      joins them.
  321456789123      031              011             0101             Person 0101 is an original sample
                                                                      member who moved in Wave 3.
  321456789123      032              011             0102             Person 0102 is an original sample
                                                                      member who moved in Wave 3 to a
                                                                      different location from person 0101.
                                           Prior to the 1996 Panel
  Sample            Current          Entry          Person
  Unit ID           Address ID       Address ID     Number
  (SUID)            (ADDID)          (ENTRY)        (PNUM)            Notes
  123456789         71               11             101               Persons 101 and 102 are the original
  123456789         71               11             102               sample members. Person 401 begins to
  123456789         71               11             401               live with them in Wave 4. All three
  123456789         71               71             701               people move in Wave 7 and person 701
                                                                      joins them.
  321456789         31               11              101              Person 101 is an original sample member
                                                                      who moved in Wave 3.
  321456789         32               11              102              Person 102 is an original sample member
                                                                      who moved in Wave 3 to a different
                                                                      location from person 101.
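
The ID conventions in Table 10-14 can be sketched in a few lines of code. The following is a minimal illustration, not Census Bureau code: the function name decode_ids and the returned dictionary keys are hypothetical, and the sketch assumes that the leading digits of an address ID give the wave in which the address was established and that the leading digits of a person number give the wave in which the person entered the sample, as the examples in the table suggest.

```python
def decode_ids(current_addr, entry_addr, person_num):
    """Decode wave information from core wave ID values (illustrative only).

    Works for 1996 Panel strings such as "071"/"0401" and for the shorter
    pre-1996 forms such as "71"/"401".
    """
    cur, ent, pnum = int(current_addr), int(entry_addr), int(person_num)
    return {
        # Last digit of an address ID is a sequence number; the rest is the wave.
        "current_address_wave": cur // 10,
        "entry_address_wave": ent // 10,
        # Last two digits of a person number are a sequence number.
        "entry_wave": pnum // 100,
        # A person whose current address differs from the entry address has moved.
        "moved_since_entry": cur != ent,
    }


# Rows taken from Table 10-14 (1996 Panel).
print(decode_ids("071", "011", "0401"))  # entered in Wave 4, moved in Wave 7
print(decode_ids("071", "071", "0701"))  # entered in Wave 7, has not moved
```

Comparing the current and entry address IDs flags only persons who have moved since they entered the sample; it does not show how many moves occurred in between.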


When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses
following 1996 variable names.
                                                                     USING THE CORE WAVE FILES

           Table 10-15. Example of Household Changes and Their Effects on the ID
                             Variables of the Core Wave Files

                                                   1996 Panel
                         Sample                  Current               Entry              Person
     Household           Unit ID                 Address ID            Address ID         Number
     Members             (SSUID)                 (SHHADID)             (EENTAID)          (EPPPNUM)
     Wave 1
     Father              101111103123            011                   011                0101
     Mother              101111103123            011                   011                0102
     Daughter            101111103123            011                   011                0103
     Son                 101111103123            011                   011                0104
     Cousin              101111103123            011                   011                0105
     Wave 2
     Father              101111103123            011                   011                0101
     Mother              101111103123            011                   011                0102
     Daughter            101111103123            011                   011                0103
     Son                 101111103123            011                   011                0104
     Cousin              101111103123            011                   011                0105
     Wave 3
     Father              101111103123            011                   011                0101
     Mother              101111103123            011                   011                0102
     Daughter            101111103123            011                   011                0103
     Son-in-Law          101111103123            011                   011                0301
     Cousin              101111103123            011                   011                0105
     Wave 4              Parent’s Household
     Father              101111103123            011                   011                0101
     Mother              101111103123            011                   011                0102
                         Daughter’s Household
     Daughter            101111103123            041                   011                0103
     Son-in-Law          101111103123            041                   011                0301
                         Cousin’s Household
     Cousin              101111103123            042                   011                0105
     Uncle               101111103123            042                   042                0401
     Wave 10             Parent’s Household
     Father              101111103123            011                   011                0101
     Mother              101111103123            011                   011                0102
                         Daughter’s Household
     Daughter            101111103123            041                   011                0103
     Son-in-Law          101111103123            041                   011                0301
     Newborn             101111103123            041                   041                1001
                                              Panels Prior to 1996
                         Sample                 Current                Entry              Person
     Household           Unit ID                Address ID             Address ID         Number
     Member              (SUID)                 (ADDID)                (ENTRY)            (PNUM)
     Wave 1
     Father              101111103               11                    11                 101
     Mother              101111103               11                    11                 102
     Daughter            101111103               11                    11                 103
     Son                 101111103               11                    11                 104
     Cousin              101111103               11                    11                 105
     Wave 2
     Father              101111103               11                    11                 101
     Mother              101111103               11                    11                 102
     Daughter            101111103               11                    11                 103
     Son                 101111103               11                    11                 104
     Cousin              101111103               11                    11                 105
     Wave 3
     Father              101111103              11                   11                 101
     Mother              101111103              11                   11                 102
     Daughter            101111103              11                   11                 103
     Son-in-Law          101111103              11                   11                 301
     Cousin              101111103              11                   11                 105
     Wave 4              Parent’s Household
     Father              101111103              11                   11                 101
     Mother              101111103              11                   11                 102
                         Daughter’s Household
      Daughter           101111103              41                   11                 103
      Son-in-Law         101111103              41                   11                 301
                          Cousin’s Household
      Cousin             101111103              42                   11                 105
      Uncle              101111103              42                   42                 401
      Wave 10ᵃ            Parent’s Household
      Father             101111103              11                   11                 101
      Mother             101111103              11                   11                 102
                          Daughter’s Household
      Daughter           101111103              41                   11                 103
      Son-in-Law         101111103              41                   11                 301
      Newborn            101111103              41                   41                 1001
    ᵃ Prior to the 1996 Panel, only the 1992 Panel had 10 or more waves. The Wave 2 core wave file of the
    1992 Panel has expanded address ID and person ID fields (3 and 4 digits, respectively) to accommodate
    Wave 10 of the 1992 Panel.



•    During Wave 2, the son joins the Army, moves into the military barracks, and therefore
     leaves the SIPP sample. For the son, person number 0104 (104), the person-month file
     will contain a Wave 1 record and a Wave 2 record containing information (either
     imputed or provided by proxy) on his characteristics in the months of Wave 2 that he was
     still in the sample. If he does not return to the sample during the remainder of the panel, there
     will be no records for him beyond Wave 2.
•    During Wave 3, the daughter marries and her husband moves into the household. The current
     address number of the mother, father, cousin, daughter, and son-in-law remains 011 (11)
     because they still live at the same address. The son-in-law’s entry address number is 011 (11),
     since he first enters the SIPP sample at an address coded 011 (11). The person number for the
     son-in-law is in the 0300 (300) series [0301 (301)] since he joins the SIPP sample in Wave 3.
•    During Wave 4, the daughter and son-in-law move into a new house. Their current address
     number changes to 041 (41) to indicate that a new address has been established in Wave 4.
     Meanwhile, the cousin, who is over age 15, moves in with an uncle.¹³ The cousin’s current
     address number changes to 042 (42) (i.e., the second new household formed in the fourth
     wave from this sample unit). The assignment of address number 041 (41) to the daughter and
     042 (42) to the cousin is arbitrary; it could be the other way around. The uncle enters the SIPP
     sample and receives an address number of 042 (42) and an entry address number of 042 (42).
     The uncle’s person number is in the 0400 (400) series [0401 (401)], since he joins the survey
     in Wave 4.
•    No changes in household composition are observed during Waves 5–9.
•    During Wave 10,¹⁴ the daughter and son-in-law have a baby. This new sample member is
     assigned the sample unit ID of the daughter and son-in-law. The newborn’s entry address is
     041 (41) because that is the current address ID of the daughter and son-in-law at the time of
     birth. The newborn’s person number is 1001, reflecting the fact that the newborn came into
     the SIPP sample in Wave 10. Meanwhile, the cousin moves to Europe and therefore leaves
     the SIPP sample. The uncle, even though he did not move to Europe with the cousin, also
     leaves the SIPP sample because he no longer resides with an original SIPP sample member.
     Their records are no longer listed.
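
The household splits described above can be reproduced by grouping person records on the sample unit ID and the current address ID. The sketch below is illustrative only: the function name households and the in-memory record layout (one dictionary per person) are assumptions, and the records are the Wave 4 rows of Table 10-15 for the 1996 Panel.

```python
from collections import defaultdict


def households(records):
    """Group person numbers by (SSUID, SHHADID), i.e., by household."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["SSUID"], rec["SHHADID"])].append(rec["EPPPNUM"])
    return dict(groups)


# Wave 4 rows from Table 10-15 (1996 Panel).
wave4 = [
    {"SSUID": "101111103123", "SHHADID": "011", "EPPPNUM": "0101"},  # Father
    {"SSUID": "101111103123", "SHHADID": "011", "EPPPNUM": "0102"},  # Mother
    {"SSUID": "101111103123", "SHHADID": "041", "EPPPNUM": "0103"},  # Daughter
    {"SSUID": "101111103123", "SHHADID": "041", "EPPPNUM": "0301"},  # Son-in-Law
    {"SSUID": "101111103123", "SHHADID": "042", "EPPPNUM": "0105"},  # Cousin
    {"SSUID": "101111103123", "SHHADID": "042", "EPPPNUM": "0401"},  # Uncle
]

for key, members in households(wave4).items():
    print(key, members)
```

All six people share one sample unit ID, so the current address ID alone separates the three households. For pre-1996 files, the same grouping would use SUID and ADDID.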
Prior to the 1996 Panel, there were two extremely rare occasions when the original SUID,
ENTRY, and PNUM values were modified by the Census Bureau:

1. The first occasion was when two separate sampling units, each containing original sample
   members, were merged, perhaps because of a marriage. In this situation, one of the original
   sets of SUID and ENTRY values was retained and the other set was changed to agree with
   that retained set. The person-number values (PNUM) of the changed set were modified
   further to be between 180 and 199, inclusive.


¹³ In the 1993 Panel, all original sample members were followed, no matter what their age. In all other panels
(including the 1996 Panel), only those age 15 or older were followed when they moved to new addresses.
¹⁴ Prior to the 1996 Panel, only the 1992 Panel had 10 or more waves.


2. The second occasion was when a household split into two new households (in which each
   new household gained a new sample person) and later the households recombined. For
   example, suppose that a married couple separated in Wave 3, each moving in with a sibling.
   Both siblings were assigned a person number of 301 because they entered the sample in
   Wave 3 at different addresses (thus, ADDID = 31 and 32). If the husband and wife reunited
   in Wave 6, bringing the siblings with them, one sibling’s person number would have been
   changed. In this case, one of the siblings would have a person number of 301 and the other
   would have a person number of 680 (or some number between 680 and 699, inclusive).
Those two occasions were the only times when SUID, ENTRY, and PNUM changed. When such a
change occurred, the old ID values were stored in the previous wave variables (PWSUID,
PWENTRY, and PWPNUM).¹⁵
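
As a rough illustration of the renumbering described above, the sketch below flags pre-1996 person numbers whose last two digits fall between 80 and 99, the pattern shared by the 180–199 and 680–699 examples. The function name looks_renumbered is hypothetical, and treating this range as a screen for modified IDs is an inference from those two examples rather than a documented rule; the previous wave variables remain the authoritative record of a change.

```python
def looks_renumbered(pnum):
    """Return True if a pre-1996 PNUM falls in an x80-x99 range.

    Assumption: person numbers in these ranges were assigned only when IDs
    were modified after a merge, as in the 180-199 and 680-699 examples.
    """
    return 80 <= int(pnum) % 100 <= 99


for pnum in (101, 301, 180, 199, 680, 1001):
    print(pnum, looks_renumbered(pnum))
```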

When the merge occurred after the first month of a reference period, the members of the merged
household (whose ID variables were modified) were assigned two sets of monthly records in the
core wave file. The first set of records contained the original ID information and identified the
person as having exited the sample at the time of the merge. The second set contained the new
ID information and identified the person as having entered the sample at the time of the merge.
When the merge occurred at the start of the reference period, only the second set of records was
retained in the core wave files.

Because merged households were very rare prior to the 1996 Panel, information about them is
no longer carried on the core wave files beginning with the 1996 Panel. When either of those two
kinds of events occurs in the 1996 Panel, one or more original sample members will appear to leave
the sample when the merge takes place, and new people will appear to enter the sample when the
merged household forms. There is no indication in the data files that the “new” sample members
were previously members of the SIPP sample with different ID values.


Identifying Program Units
Besides household and family composition, the core wave files contain detailed information
about participation in health insurance and various government transfer programs. For most
programs, three characteristics are recorded (Table 10-16):

1. Whether the person is covered;
2. Who received the income or benefit; and
3. The amount of the income or benefit.


¹⁵ In the 1993 Panel, merged households are identified with the variables PWSUID, PWENTRY, and PWPNUM.
Before the 1993 Panel, they were identified with the variables PREV-ID, SC0064, and SC0066.


  Table 10-16. Variables Describing Participation in Government Transfer Programs and
                    Health Insurance Programs in the Core Wave Files

                                                   1996 Panel
                                                                 Authorized
Program                                      Coverage            Recipient          Recipiency     Amount
Social Security—Adults                       RCUTYP01            RCUOWN01           ER01A          T01AMTA
Social Security—Children                                                            ER01K          T01AMTK
Railroad Retirement—Adults                                                          ER02           T02AMT
Federal Supplemental Security Income         RCUTYP03            RCUOWN03           ER03           T03AMT
Veteran’s Benefits                           RCUTYP08            RCUOWN08           ER08           T08AMT
Aid to Families with Dependent Children/     RCUTYP20            RCUOWN20           ER20           T20AMT
Temporary Assistance for Needy Familiesᵃ
General Assistance                           RCUTYP21          RCUOWN21             ER21           T21AMT
Foster Child Care                            RCUTYP23          RCUOWN23             ER23           T23AMT
Other Welfare                                RCUTYP24          RCUOWN24             ER24           T24AMT
Women, Infants and Children (WIC)            RCUTYP25          RCUOWN25             ER25           T25AMT
Food Stamps                                  RCUTYP27          RCUOWN27             ER27           T27AMT
Medicare                                                       ECRMTH
Medicaid                                     RCUTYP57          RCUOWN57             ER57
CHAMPUS                                                        RCHAMPM
Other Health Insurance                       RCUTYP58          RCUOWN58             ER58
                                             Panels Prior to 1996
                                                               Authorized
Program                                      Coverage          Recipient             Recipiency     Amount
Social Security—Adults                       SOCSEC            SSPNUM                 R01A          S01AMTA
Social Security—Children                                                              R01K          S01AMTK
Railroad Retirement—Adults                      RAILRD             RRPNUM             R02A          S02AMTA
Railroad Retirement—Children                                                          R02K          S02AMTK
Federal Supplemental Security Income            SSICOVRGᵇ                             R03           S03AMT
Veteran’s Benefits                              VETS               VETNUM             R08           S08AMT
Aid to Families with Dependent Children         AFDC               AFDCPNUM           R20           S20AMT
General Assistance                              GENASST            GAPNUM             R21           S21AMT
Foster Child Care                               FOSTKID            FKPNUM             R23           S23AMT
Other Welfare                                   OTHWELF            OWPNUM             R24           S24AMT
Women, Infants and Children (WIC)               WICCOV             WICPNUM            R25           WICVAL
Food Stamps                                     FOODSTMP           FSPNUM             R27           S27AMT
Medicare                                        CARECOV
Medicaid                                        CAIDCOV            MCDPNUM
CHAMPUS                                         CHAMP              CHPNUM
Other Health Insurance                          HIIND              HIPNUM
ᵃ In August 1996, the Personal Responsibility and Work Opportunity Reconciliation Act was signed into law. This
legislation replaced the old welfare system, Aid to Families with Dependent Children (AFDC), with a new program,
Temporary Assistance for Needy Families (TANF). In the 1996 Panel, the questions for income type 20 referred to
the AFDC program prior to Wave 4 and to the TANF program beginning in Wave 4. In Wave 9, the questions were
expanded somewhat to capture the larger array of program types that could exist under TANF.
ᵇ During the 1990s, SSI was extended to children with disabilities. Consequently, beginning with the 1992 Panel,
SSICOVRG was added to the core wave data files.



The coverage variables identify whether a person is covered by the income or benefit. In other
words, when a person is flagged as covered by food stamps, RCUTYP27 (FOODSTMP) = 1, the person
received the benefits either directly (because he or she was the authorized food stamp recipient)
or indirectly (because he or she was in the same food stamp unit as the authorized recipient). The
coverage variables also allow users to identify situations in which the program unit is a subset
of the family or household.¹⁶

The authorized recipient variables identify the people who actually received the income or
benefit for the people in their program units. In the 1996 Panel, the variables identifying the
authorized recipient use only the person number, EPPPNUM. Prior to the 1996 Panel, the
variables identifying the authorized recipient were constructed by concatenating the entry
address, ENTRY, with the person number, PNUM.

Individuals who are members of a common program unit can be identified by using the sample
unit ID, SSUID (SUID), and the authorized recipient variable. For example, members of a
common food stamp unit are those with common values of SSUID (SUID) and RCUOWN27
(FSPNUM). Identifying members of common units is often necessary because most programs
allow more than one program unit in a household. Medicare, however, is a person-based program
in which each participant is an authorized recipient, so no additional authorized recipient variable
for that program is included on the files. Prior to the 1996 Panel, there was also no authorized
recipient variable for SSI on the core wave files.
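
The grouping rule above can be sketched as follows. This is a minimal illustration rather than an official procedure: the function name program_units and the record layout are assumptions, the SSUID value is a made-up placeholder (Table 10-17 does not show one), and the food stamp values are taken from the 1996 Panel portion of Table 10-17.

```python
from collections import defaultdict


def program_units(records, coverage="RCUTYP27", owner="RCUOWN27"):
    """Group covered persons by (SSUID, authorized recipient)."""
    units = defaultdict(list)
    for rec in records:
        if rec[coverage] == 1:  # 1 = covered by the program
            units[(rec["SSUID"], rec[owner])].append(rec["EPPPNUM"])
    return dict(units)


SSUID = "000000000000"  # hypothetical placeholder sample unit ID
people = [
    {"SSUID": SSUID, "EPPPNUM": "0101", "RCUTYP27": 2, "RCUOWN27": "0"},
    {"SSUID": SSUID, "EPPPNUM": "0102", "RCUTYP27": 1, "RCUOWN27": "0102"},
    {"SSUID": SSUID, "EPPPNUM": "0103", "RCUTYP27": 1, "RCUOWN27": "0102"},
    {"SSUID": SSUID, "EPPPNUM": "0104", "RCUTYP27": 1, "RCUOWN27": "0104"},
    {"SSUID": SSUID, "EPPPNUM": "0105", "RCUTYP27": 1, "RCUOWN27": "0104"},
    {"SSUID": SSUID, "EPPPNUM": "0106", "RCUTYP27": 1, "RCUOWN27": "0104"},
]

for key, members in program_units(people).items():
    print(key, members)
```

The result shows two food stamp units in one household. In pre-1996 files the authorized recipient value (for example, FSPNUM = 11102) is the entry address followed by the person number, so it would have to be compared against ENTRY and PNUM together rather than against PNUM alone.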

There are some exceptions to these rules:

•    Social Security, Railroad Retirement (prior to 1996), WIC, AFDC, and Medicaid can offer
     benefits solely to children. When that happens, an adult receives the income on behalf of the
     children. The adult, therefore, is flagged as the authorized recipient but is not flagged as
     covered by the program. The children are flagged as covered and have nonzero benefits.
•    Most SSI recipients are elderly or disabled adults, but recipients can also be disabled
     children. In the 1990s, the definition of qualifying disabling conditions was expanded. That
     change in definition resulted in a rapid expansion of the child SSI caseload. Consequently, the
     SSICOVRG variable was included (beginning with the 1992 Panel). This variable indicates
     on the recipient’s (the adult’s) record whether the children, the adults, or both, within a
     family are covered by the income. Prior to the 1996 Panel, however, SSICOVRG, unlike the
     other coverage variables, did not flag each person individually. Only the recipient has a
     nonzero SSI income amount. Beginning with the 1996 Panel, two new variables identify each
     individual covered by federally administered SSI (RCUTYP03) or state-administered SSI
     (RCUTYP04).


¹⁶ In the 1984 and 1985 Panels, WIC coverage was imputed to children under 6 years old if a mother reported
participation in the WIC program. Beginning with the 1986 Panel, WIC coverage is assessed directly for all sample
members.


•   The medical insurance variables simply reflect who is enrolled in which type of program.
    There are no associated amount variables.

These rules and exceptions are illustrated in Table 10-17. The household contains one AFDC
unit and two food stamp units. The mother is covered by Social Security and SSI. Daughter #1,
the mother of the disabled child, receives WIC benefits and SSI on behalf of her child, but she
does not receive WIC or SSI for herself. Everyone in the household is enrolled in Medicaid. The
coverage variables are set to 2 whenever the person is not covered by the particular program; the
one exception (for panels prior to 1996) is SSI coverage, where a value of 2 means that only the
children are covered.

Users should note that, except for WIC, no amounts of income or benefit from government
transfer and health insurance programs are listed in the records of children under age 15. Thus, in
the case of WIC, users need to sum the amounts over all persons, including children, to get the
proper WIC unit total. For all other programs, users will find the unit total benefit in the
recipient’s record.
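
One way to compute a unit total that works for both cases is to sum the amount over the authorized recipient's own record and the records of everyone who names that person as authorized recipient. The sketch below is illustrative only: the function name unit_total and the record layout are assumptions, all records are assumed to belong to one sample unit and one month, and the values come from the 1996 Panel portion of Table 10-17 (with the WIC amount stored under T25AMT, the name listed in Table 10-16).

```python
def unit_total(records, recipient, owner, amount):
    """Sum `amount` over the recipient and the persons the recipient covers."""
    return sum(
        rec[amount]
        for rec in records
        if rec["EPPPNUM"] == recipient or rec[owner] == recipient
    )


# WIC and food stamp values for Daughter #1's unit and Daughter #2's unit.
people = [
    {"EPPPNUM": "0102", "RCUOWN25": "0", "T25AMT": 30.12,
     "RCUOWN27": "0102", "T27AMT": 160},
    {"EPPPNUM": "0103", "RCUOWN25": "0102", "T25AMT": 0,
     "RCUOWN27": "0102", "T27AMT": 0},
    {"EPPPNUM": "0104", "RCUOWN25": "0", "T25AMT": 0,
     "RCUOWN27": "0104", "T27AMT": 130},
    {"EPPPNUM": "0106", "RCUOWN25": "0106", "T25AMT": 27.50,
     "RCUOWN27": "0104", "T27AMT": 0},
]

print(unit_total(people, "0102", "RCUOWN25", "T25AMT"))  # WIC unit of person 0102
print(unit_total(people, "0102", "RCUOWN27", "T27AMT"))  # food stamp unit of person 0102
```

Including the recipient's own record matters for WIC in this example because Daughter #1 holds the amount but is not herself flagged as covered. For programs other than WIC the sum simply returns the amount on the recipient's record, because all other members carry zero.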


Income Topcoding in the 1996 Panel
To protect the confidentiality of SIPP respondents, the Census Bureau topcodes very high
incomes on the SIPP public use data files. New income topcoding procedures were instituted
with the 1996 Panel. As in the past, summary income variables for persons, families, and
households are the sums of the component variables after they have been topcoded. The
summary variables are not independently topcoded. Thus, a person, family, or household with
high income from several sources (multiple jobs, businesses, property) could have aggregate
monthly income well over the topcode threshold for each source.


Topcoding Unearned Income in the 1996 Panel

When the total amount of asset income or of certain types of general income for a wave exceeds
the established ceiling, the monthly amounts in excess of the monthly threshold are replaced by
monthly topcode values. For example:

•   When the amount of interest on joint municipal/corporate bonds exceeds $10,000 for the
    wave, each monthly amount in excess of $2,500 is recoded to $2,500.
•   When the amount of interest on self-owned municipal/corporate bonds exceeds $12,800 for
    the wave, each monthly amount in excess of $3,200 is recoded to $3,200.
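
The two examples above follow the same pattern, which can be sketched as a small function. This is an illustration of the rule as stated here, not the Census Bureau's production procedure: the function name topcode_wave and the sample monthly amounts are hypothetical, and the sketch assumes that the recoded value equals the monthly threshold, as in both examples.

```python
def topcode_wave(monthly, wave_ceiling, monthly_cap):
    """Recode monthly amounts above `monthly_cap` when the wave total
    exceeds `wave_ceiling`; otherwise return the amounts unchanged."""
    if sum(monthly) <= wave_ceiling:
        return list(monthly)
    return [min(amount, monthly_cap) for amount in monthly]


# Interest on joint municipal/corporate bonds: $10,000 ceiling, $2,500 cap.
print(topcode_wave([4000, 1000, 3000, 2500], 10_000, 2_500))
# -> [2500, 1000, 2500, 2500]  (wave total of $10,500 exceeds the ceiling)

print(topcode_wave([4000, 1000, 1000, 1000], 10_000, 2_500))
# -> [4000, 1000, 1000, 1000]  (wave total of $7,000 does not exceed it)
```

Note that in the second case the $4,000 month is left as is, because the wave total stays under the ceiling. The actual ceilings and topcode values for each variable are those listed in Appendix B.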



               Table 10-17. Example of Program Units, Coverage, and Recipiency
                                   in the Core Wave Files

                                                    1996 Panel
                                                                                               Daughter #2’s
                                                Daughter #1’s                    Spouse of     Pregnant
                  Mother       Daughter #1      Son           Daughter #2        Daughter #2   Daughter
EPPPNUM           0101         0102             0103          0104               0105          0106
TAGE              70           21               4             35                 36            16
AFDC/TANF
RCUTYP20          2            1                1                2               2             2
RCUOWN20          0            0102             0102             0               0             0
ER20              0            1                0                0               0             0
T20AMT            0            123              0                0               0             0
Food Stamps
RCUTYP27          2            1                1                1               1             1
RCUOWN27          0            0102             0102             0104            0104          0104
ER27              0            1                0                1               0             0
T27AMT            0            160              0                130             0             0
SSI
RCUTYP03          1            2                1                0               0             0
ER03              1            1                0                0               0             0
T03AMT            188          122              0                0               0             0
WIC
RCUTYP25          2            2                1                2               2             1
RCUOWN25          0            0                0102             0               0             0106
ER25              0            1                0                0               0             1
T25AMT            0            30.12            0                0               0             27.50
Medicaid
RCUTYP57          1            1                1                1               1             1
RCUOWN57          0101         0102             0102             0104            0104          0106
Social Security
RCUTYP01          1            2                2                2               2             2
RCUOWN01          0101         0                0                0               0             0
ER01A             1            0                0                0               0             0
T01AMTA           470          0                0                0               0             0
                                              Panels Prior to 1996
                                                                                               Daughter #2’s
                                                Daughter #1’s                    Spouse of     Pregnant
                  Mother      Daughter #1       Son           Daughter #2        Daughter #2   Daughter
PNUM              101         102               103           104                105           106
AGE               70          21                4             35                 36            16
AFDC
AFDCCOV           2           1                 1               2                2             2
AFDCPNUM          0           11102             11102           0                0             0
R20               0           1                 0               0                0             0
S20AMT            0           123               0               0                0             0
Food Stamps
FOODSTMP          2           1                 1               1                1             1
FSPNUM            0           11102             11102           11104            11104         11104
R27               0           1                 0               1                0             0
S27AMT            0           160               0               130              0             0
SSI
SSICOVRG          1           2                 1               0                0             0
R03               1           1                 0               0                0             0
S03AMT            188         122               0               0                0             0
WIC
WICCOV            2           2                 1               2                2             1
WICPNUM           0           0                 11102           0                0             11106
R25               0           1                 0               0                0             1
WICVAL            0           30.12             0               0                0             27.50
Medicaid
CAIDCOV           1           1                 1               1                1             1
MCDPNUM           11101       11102             11102           11104            11104         11106
Social Security
SOCSEC            1           2                 2               2                2             2
SSPNUM            11101       0                 0               0                0             0
R01A              1           0                 0               0                0             0
R01K              0           0                 0               0                0             0
S01AMTA           470         0                 0               0                0             0
S01AMTK           0           0                 0               0                0             0


When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses
following 1996 variable names.

Not all income sources are topcoded. For example, the amount of food stamp income is not
topcoded. For a complete list of topcoded income variables with the topcode amounts for the
1996 Panel, users should refer to Appendix B (Topcoding).


Topcoding Employment Income in the 1996 Panel

Three different sources of monthly employment income are identified in the SIPP public use
files: (1) wage and salary income, (2) self-employed earnings, and (3) other worker
arrangements. Each of these three sources is topcoded separately. For each source, monthly
amounts over $12,500 (one-twelfth of the $150,000 annual benchmark) are topcoded if the total
income from those sources from all 4 months in the wave is greater than $50,000 (one-third of
$150,000). Table 10-18 provides examples of employment income amounts that require
topcoding.

                         Table 10-18. Topcoding Criteria for the 1996 Panel

              Reported Monthly Earned Income Amounts                           Is the Sum
                                                                Sum for the    Greater than    Topcoding
Example     Month 1     Month 2     Month 3       Month 4       Wave           $50,000?        Procedure
1           $ 3,000     $ 4,000     $ 5,000       $ 5,000       $17,000        No              None
2           $0          $0          $0            $55,000       $55,000        Yes             Topcode month 4
3           $15,000     $10,000     $10,000       $12,000       $52,000        Yes             Topcode month 1
4           $12,000     $15,000     $15,000       $15,000       $60,000        Yes             Topcode months
                                                                                               2, 3, and 4
5           $0          $0          $0            $49,000       $49,000        No              None
6           $15,000     $15,000     $15,000       $15,000       $60,000        Yes             Topcode all 4
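
The two-part test behind Table 10-18 can be sketched in a few lines of Python. This is an illustrative sketch only, not Census Bureau code; the function and constant names are assumptions introduced here. It applies the rule stated above: a month is flagged only when the 4-month sum exceeds $50,000 and that month's amount exceeds $12,500.

```python
# Illustrative sketch; names are hypothetical, thresholds are from the text.
MONTHLY_LIMIT = 12_500   # one-twelfth of the $150,000 annual benchmark
WAVE_LIMIT = 50_000      # one-third of the $150,000 annual benchmark


def months_to_topcode(monthly_amounts):
    """Return the 1-based month numbers whose amounts require topcoding."""
    # First test: the total for the wave must exceed $50,000.
    if sum(monthly_amounts) <= WAVE_LIMIT:
        return []
    # Second test: only months above $12,500 are topcoded.
    return [month for month, amount in enumerate(monthly_amounts, start=1)
            if amount > MONTHLY_LIMIT]


if __name__ == "__main__":
    print(months_to_topcode([3_000, 4_000, 5_000, 5_000]))        # Example 1
    print(months_to_topcode([0, 0, 0, 55_000]))                   # Example 2
    print(months_to_topcode([0, 0, 0, 49_000]))                   # Example 5
    print(months_to_topcode([15_000, 15_000, 15_000, 15_000]))    # Example 6
```

The function is applied separately to each of the three employment income sources, because each source is topcoded separately.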


When topcoding is required because the reported value exceeds the acceptable threshold, the
value assigned to the variable can be determined in one of two ways: it can be set equal to the
threshold, or it can be set equal to the mean of the reported amounts above the threshold. In the
second case, the topcode value that is assigned is based on the respondent’s gender, race/ethnic
origin, and employment status (full or part year, full or part time). Table 10-19 illustrates the
procedure. It shows the topcodes used in Wave 1 of the 1996 Panel for employment income.
Those Wave-1-based topcodes are adjusted for inflation and real growth in earned income (see
Box 10-1) and then used for all later waves of the panel.

Because of the way in which the topcode values were computed (explained in the next
paragraph), the values listed for each cell are greater than the monthly value that is tested
($12,500). This method of computation may result in instances in which use of the topcode
values results in total amounts for the wave (summed across all 4 months) that are greater than
$50,000.



           Table 10-19. Topcode Amounts Used for Monthly Employment Income in
                                 Wave 1 of the 1996 Panel

                                                                                              Earned Income
Example      Sex           Race                              Worker Status                    Topcode
 1           Male          Nonblack, non-Hispanic            Full year; full time             $29,660
 2           Male          Nonblack, non-Hispanic            Not full year; full time         $38,270
 3           Male          Black, non-Hispanic               Full year; full time             $17,530
 4           Male          Black, non-Hispanic               Not full year; full time         $24,015
 5           Male          Hispanic, any race                Full year; full time             $26,250
 6           Male          Hispanic, any race                Not full year; full time         $24,015
 7           Female        Nonblack, non-Hispanic            Full year; full time             $21,990
 8           Female        Nonblack, non-Hispanic            Not full year; full time         $49,450
 9           Female        Black, non-Hispanic               Full year; full time             $24,015
10           Female        Black, non-Hispanic               Not full year; full time         $24,015
11           Female        Hispanic, any race                Full year; full time             $24,015
12           Female        Hispanic, any race                Not full year; full time         $24,015


                   Box 10-1. Computing Earned Income Topcode Amounts for
                                Waves 2–12 in the 1996 Panel


   The topcode amount for wave k is computed as:
   Topcode(Wave k) = Topcode(Wave 1) * 1.019^(k-1)
   Example: Nonblack, non-Hispanic male employed full year, full time.
   Wave 1 Topcode (from Table 10-19) = $29,660
   Wave 7 Topcode = $29,660 * 1.019^(7-1) = $29,660 * 1.11955 = $33,206
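
The Box 10-1 adjustment can be checked with a short Python sketch. The function name is an assumption made for illustration; the 1.019 factor and the Wave 1 amount come from Box 10-1 and Table 10-19.

```python
# Illustrative sketch of the Box 10-1 formula; the function name is hypothetical.
GROWTH_PER_WAVE = 1.019  # per-wave adjustment factor from Box 10-1


def topcode_for_wave(wave1_topcode, wave):
    """Return the earned income topcode for a given wave (wave 1 is unadjusted)."""
    if wave < 1:
        raise ValueError("wave numbers start at 1")
    return wave1_topcode * GROWTH_PER_WAVE ** (wave - 1)


if __name__ == "__main__":
    # Nonblack, non-Hispanic male, full year, full time (Table 10-19, example 1).
    print(round(topcode_for_wave(29_660, 7)))  # about 33,206
```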


The topcode values were computed from data collected in Wave 1 of the 1996 Panel. The
topcode values are the unweighted mean amounts from records identified for topcoding in Wave
1 of the 1996 Panel. A separate topcode value was computed for each of the 12 cells of Table 10-
19. Each topcode value is based on amounts from all three employment income sources, and the
same topcode is used for all three employment income sources. The algorithm used to calculate
the assigned topcode amount is as follows:

1. Add the four monthly amounts of wage and salary income. If the sum is greater than
   $50,000, store the monthly amounts greater than $12,500 in the 12-cell matrix.
2. Add the four monthly amounts of self-employed earnings. If the sum is greater than $50,000,
   store the monthly amounts greater than $12,500 in the 12-cell matrix.
3. Add the four monthly amounts of contingent worker earnings. If the sum is greater than
   $50,000, store the monthly amounts greater than $12,500 in the 12-cell matrix.



On the basis of the amounts accumulated, compute a mean amount within each of the 12 cells of
the matrix. That mean amount is the topcode value shown in Table 10-19.
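
The accumulation steps and the final averaging can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Census Bureau's program: the record layout (a cell label plus a dictionary of monthly amounts by income source) and all names are hypothetical, and the sample amounts in the usage lines are made up.

```python
# Illustrative sketch; record layout and names are assumptions, amounts are made up.
from collections import defaultdict

MONTHLY_LIMIT = 12_500
WAVE_LIMIT = 50_000


def cell_means(records):
    """Compute the unweighted mean of stored amounts for each demographic cell.

    records: iterable of (cell, sources), where sources maps an income source
    name to its four monthly amounts for the wave.
    """
    stored = defaultdict(list)
    for cell, sources in records:
        # Each income source is tested separately (steps 1 through 3).
        for monthly in sources.values():
            if sum(monthly) > WAVE_LIMIT:
                stored[cell].extend(a for a in monthly if a > MONTHLY_LIMIT)
    # The mean within each cell is the topcode value for that cell.
    return {cell: sum(amounts) / len(amounts) for cell, amounts in stored.items()}


if __name__ == "__main__":
    sample = [
        ("cell A", {"wage": [0, 0, 20_000, 40_000], "self_emp": [13_000, 0, 0, 0]}),
        ("cell A", {"wage": [30_000, 30_000, 0, 0]}),
    ]
    print(cell_means(sample))
```

In the made-up sample, the $13,000 self-employment amount is not stored because that source's wave total does not exceed $50,000.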

The amounts shown in Table 10-19 were computed with data from Wave 1. Current plans call
for using these amounts, adjusted for inflation and real growth in earned income by a factor of
1.019 (1.9 percent) per wave, for all remaining waves of the 1996 Panel. With three waves per
year, this is equivalent to an annual increase of 5.8 percent. The mean amounts will not be
recomputed from microdata for later waves. The formula to compute the topcode amounts for
earned income in later waves is shown in Box 10-1.

The following four examples and Table 10-20 illustrate employment income topcoding:

•   A black male software consultant works full time for the entire year and reports an annual
    salary of $196,600. His salary income varies from month to month, however, sometimes
    dramatically. His income for this wave is $57,100, which exceeds the $50,000 wave
    threshold. The earned income topcode value for black males who work full time, full year is
    $17,530 (see Table 10-19: example 3, last column). That value will be used instead of the
    consultant’s reported monthly earned income for the 1 month in which his earned income
    exceeded $12,500.
•   A Hispanic female attorney normally works full time, the full year, with an annual income of
    about $300,000. In the middle of this wave, she returned from a 6-month maternity leave;
    for the first 2 months of the wave, she has no earned income. Her income for the wave in
    question is $51,000, just over the threshold value of $50,000. The earned income topcode
    value for Hispanic women who work full time, full year is $24,015 (see Table 10-19:
    example 11, last column). That value will be used as the attorney’s monthly earned income
    for the 2 months in which her income exceeds $12,500.
•   A white male psychiatrist spends the first month of the wave (August) at his beach house.
    While on vacation, he has no earned income. When he returns to the city in September, his
    income returns to its usual level of $20,000 per month for the remaining 3 months of the
    wave. His income for the wave is $60,000, exceeding the $50,000 threshold. The earned
    income topcode for nonblack, non-Hispanic males who work full time but less than full year
    is $38,270 (see Table 10-19: example 2, last column). That value is used for the 3 months in
    which the psychiatrist reported income over $12,500, resulting in a total earned income for
    the wave of $114,810. That total, after topcoding, is substantially higher than both the
    $50,000 threshold and his reported income of $60,000.
•   A white television actress does not work during her series’ hiatus. When the series is in
    production, she works full time. Her annual earned income is $880,000; her income for the
    wave in question is $160,000. She earned nothing in the first 3 months of the wave and
    $160,000 in the fourth month. The SIPP matrix topcode for nonblack, non-Hispanic women
    who work full time but less than full year is $49,450 for each month (see Table 10-19:
    example 8, last column). That value will be assigned for the 1 month of the wave in which
    the actress reported earned income.



          Table 10-20. Examples of Employment Income Topcoding in the 1996 Panel

Worker                                               Monthly Income Amounts                      Sum for the
Characteristics        Income        Month 1       Month 2       Month 3        Month 4          Wave
Black, non-Hispanic    Reported      $10,000       $10,000       $12,300        $ 24,800         $ 57,100
male, working full
time, full year        Topcoded      $10,000       $10,000       $12,300        $ 17,530         $ 49,830
Hispanic female,       Reported      $0            $0            $25,000        $ 26,000         $ 51,000
working full time,
full year              Topcoded      $0            $0            $24,015        $ 24,015         $ 48,030
Nonblack, non-         Reported      $0            $20,000       $20,000        $ 20,000         $ 60,000
Hispanic male,
working full time,     Topcoded      $0            $38,270       $38,270        $ 38,270         $114,810
not full year
Nonblack, non-         Reported      $0            $0            $0             $160,000         $160,000
Hispanic female,
working full time,     Topcoded      $0            $0            $0             $ 49,450         $ 49,450
not full year
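
The rows of Table 10-20 can be reproduced with a short Python sketch. It is illustrative only and assumes the cell-mean method (the topcode value from Table 10-19 replaces each flagged month); the function name is hypothetical.

```python
# Illustrative sketch; function name is hypothetical, thresholds are from the text.
MONTHLY_LIMIT = 12_500
WAVE_LIMIT = 50_000


def apply_topcode(monthly_amounts, topcode_value):
    """Return the four monthly amounts after employment income topcoding."""
    # No topcoding unless the wave total exceeds $50,000.
    if sum(monthly_amounts) <= WAVE_LIMIT:
        return list(monthly_amounts)
    # Replace each month above $12,500 with the cell's topcode value.
    return [topcode_value if amount > MONTHLY_LIMIT else amount
            for amount in monthly_amounts]


if __name__ == "__main__":
    consultant = apply_topcode([10_000, 10_000, 12_300, 24_800], 17_530)
    psychiatrist = apply_topcode([0, 20_000, 20_000, 20_000], 38_270)
    print(consultant, sum(consultant))
    print(psychiatrist, sum(psychiatrist))
```

The psychiatrist's case shows why a topcoded wave total can exceed both the $50,000 threshold and the reported total: the replacement value is a cell mean, not a cap.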


Topcoding Prior to the 1996 Panel
Prior to the 1996 Panel, the data dictionary indicates a topcode of $33,332 for monthly income;
that is also the income topcode for the wave. That topcode is, therefore, rarely used for a single
month. In most cases, the monthly income is topcoded at $8,333 (one-fourth of $33,332), which
actually represents $8,333 or more. Individual amounts above $8,333 may occasionally be
shown if the respondent’s income varied considerably from month to month. For example, if a
respondent’s income from a single job was concentrated in only 1 of the 4 reference months,
SIPP could show a figure as high as $33,332.

Summary income variables on the person, family, and household records are simply the sums of
the component variables after they have been topcoded. The summary variables are not
independently topcoded. Thus, a person with high income from several sources (multiple jobs,
businesses, property) could have aggregate monthly income well over the topcode for each
source and yet SIPP could still be greatly understating the person’s true income.

As shown in Table 10-21, person 101 has wages topcoded. The person received considerably
more money in December than in the other months. In addition, total family income and total
household income are the sum of the income amounts (in this case, WS1AMT+S01AMT) after
they have been topcoded.



      Table 10-21. Example of Topcoding in the Core Wave Files Prior to the 1996 Panel:
                                 Single Person Household

     Person     Calendar         Household        Family Total    Topcoded         Social
     Number     Month            Total Income     Income          Wages            Security      Actual
     (PNUM)     (MONTH)          (HTOTINC)        (FTOTINC)       (WS1AMT)         (S01AMT)      Wages
     101        10                $9,333          $9,333          $8,333           $1,000        $ 8,333
     101        11                $9,333          $9,333          $8,333           $1,000        $ 8,333
     101        12                $9,333          $9,333          $8,333           $1,000        $12,123
     101        01                $9,583          $9,583          $8,333           $1,250        $ 9,456
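
The arithmetic in Table 10-21 can be sketched in Python as follows. This is a simplified illustration, not the Census Bureau's procedure: it assumes a single-person household, caps wages at $8,333 in every month (the common case described above), and uses a hypothetical function name.

```python
# Simplified sketch; assumes every month is capped at $8,333 (the common case).
PRE1996_MONTHLY_TOPCODE = 8_333


def household_total(actual_wages, social_security):
    """Sum the income components after wages have been topcoded."""
    topcoded_wages = min(actual_wages, PRE1996_MONTHLY_TOPCODE)
    # The summary variable is a plain sum; it is not topcoded again.
    return topcoded_wages + social_security


if __name__ == "__main__":
    # (actual wages, Social Security) for calendar months 10, 11, 12, and 01.
    for wages, soc_sec in [(8_333, 1_000), (8_333, 1_000), (12_123, 1_000), (9_456, 1_250)]:
        print(household_total(wages, soc_sec))
```

The December total matches the October and November totals even though actual wages were much higher, which is the understatement the text warns about.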


Using Allocation (Imputation) Flags
As described in Chapter 4, the Census Bureau often imputes information when a person does not
respond to the survey or to a particular question.

1. Prior to the 1996 Panel, the whole record may have been imputed because the person refused
   to be interviewed (and no proxy interview was obtained) or because the person left the
   sample in the middle of the wave and no interview was conducted. If that happened, INTVW
   will be 3 or 4 (see footnote 17).
2. A variable of interest may be imputed. In the core wave files prior to the 1996 Panel, there is
   an allocation (imputation) flag for almost all of the person-level variables. Beginning with
   the 1996 Panel, there is an allocation (imputation) flag associated with every variable subject
   to imputation. For example, AEDUCATE is the allocation (imputation) variable that
   identifies whether EEDUCATE is imputed.
For labor force items, the Census Bureau uses the following special imputation procedures when
a person has no current wave information indicating whether or not he or she worked during the
reference period (see footnote 18). If the Census Bureau can infer from what it knows about the
previous reference period whether the person had a job or business at the start of the current
period, the Census Bureau carries out the following procedure:

1. If the person was working at the end of the prior wave, then labor force participation is
   imputed from a single donor for the complete current wave.
2. The Census Bureau then projects job characteristics for the person from the person’s prior
   wave through the current wave.


17 For cases in the 1996 Panel for whom prior wave information did not exist for a person-level noninterview (such
as in Wave 1 or in Waves 2–12 when the person was new to the sample), the whole record may have been imputed.
To identify such cases, users need to check both person number (to distinguish wave of entry into the sample) and
EPPINTVW, which will be 3 or 4 for these cases.
18 Chapter 4 contains a discussion of how analysts can determine whether these special imputation procedures were
used.


3. Finally, the Census Bureau edits the job characteristics for consistency with the imputed
   labor force participation variables.
This procedure is known as an EPPFLAG imputation, after the name of the variable that
indicates its use.

If a person was a nonworker in the prior wave or the Census Bureau cannot infer work status on
the basis of prior wave data, then the person’s work status is imputed. If the person is imputed as
a worker in the reference period, the Census Bureau imputes the complete set of job/business
characteristics variables and labor force participation variables to the person from one donor, in
order to maintain consistency among the fields. That procedure is called a “little Type Z”
imputation.

For some items in some cases, a direct logical or carryover imputation is made. The carryover
imputation takes the previous wave’s value for the item for the sample member and imputes it to
the current wave. That imputation is done particularly for items that rarely (or never) change for
a sample member across waves (such as sex and race) or for items that change in predictable
ways (such as age).

Variables are imputed and the allocation (imputation) flags are set before composite variables are
created. For example, if income is imputed for one member of a household, that person’s
allocation (imputation) flag is set. However, total household income is computed after that
imputation; if any household member had any income imputed, then total household income is
based, in part, on imputed information. There is no direct indication on the records of other
household members that any information has been imputed.
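
Because the files carry no household-level indicator, an analyst who needs one has to derive it from the person records. The Python sketch below shows one way to do so. The field names (hh_id, person, income_imputed) are hypothetical stand-ins, not SIPP variable names; in practice the household identifier and the relevant allocation flags would be taken from the data dictionary.

```python
# Illustrative sketch; hh_id, person, and income_imputed are hypothetical field names.
def flag_households(person_records):
    """Add a flag to every person showing whether any household member had income imputed."""
    imputed_households = {rec["hh_id"] for rec in person_records
                          if rec["income_imputed"]}
    # Copy each record and attach the derived household-level flag.
    return [dict(rec, hh_income_part_imputed=rec["hh_id"] in imputed_households)
            for rec in person_records]


if __name__ == "__main__":
    people = [
        {"hh_id": "A", "person": 101, "income_imputed": False},
        {"hh_id": "A", "person": 102, "income_imputed": True},
        {"hh_id": "B", "person": 101, "income_imputed": False},
    ]
    for rec in flag_households(people):
        print(rec["hh_id"], rec["person"], rec["hh_income_part_imputed"])
```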

Because the edit and imputation procedures used in the core wave files and in the full panel
longitudinal research files are different, data from the two sources will not always agree. See
Chapter 4 for a more detailed discussion of the SIPP edit and imputation procedures.


Using Weights
The core wave files include a number of alternative reference month weights for use in data
analysis. Table 10-22 includes examples of the weights for the 1996 and the 1990–1993 Panel
core wave files. The choice of the appropriate weight for a given analysis depends on the
population of interest for that analysis—person, household, family, or related subfamily.
Suggestions for which weights to use and how to use them are included in the source and
accuracy statements that accompany files ordered from the Census Bureau. Also, Chapter 8 of
the Guide contains a full discussion of how to use weights in the core wave files.



Table 10-22. Weight Variables in SIPP Core Wave Files for the 1996 and 1990–1993 Panels

Variable Name                                        Description
WPFINWGT (FNLWGT)                                    Reference month, final weight of person
WHFNWGT (HWGT0)                                      Reference month, final weight of household
WFFINWGT (FWGT)                                      Reference month, final weight of family
WSFINWGT (SWGT)                                      Reference month, final weight of related subfamily
WPFINWGT (P5WGT)a                                    Interview (5th) month, final weight of person
WHFNWGT (H5WGT)a                                     Interview (5th) month, final weight of household
a Beginning with the 1996 Panel, SIPP files no longer include the interview month weights.
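
As a minimal illustration of how a reference month weight enters an estimate, the Python sketch below sums the person weight over records that meet a condition. The records and weight values are made up, the function name is hypothetical, and the sketch assumes one record per person for a single reference month; Chapter 8 and the source and accuracy statements remain the authority on which weight to use and on any adjustments.

```python
# Illustrative sketch with made-up records; WPFINWGT is the 1996 person weight (Table 10-22).
def weighted_count(records, weight_var, include=lambda rec: True):
    """Estimate a population count as the sum of weights over qualifying records."""
    return sum(rec[weight_var] for rec in records if include(rec))


if __name__ == "__main__":
    persons = [
        {"age": 70, "WPFINWGT": 1500.0},
        {"age": 21, "WPFINWGT": 2500.0},
        {"age": 4, "WPFINWGT": 1000.0},
    ]
    print(weighted_count(persons, "WPFINWGT"))
    print(weighted_count(persons, "WPFINWGT", include=lambda rec: rec["age"] >= 15))
```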


Identifying States
For the 1996 Panel, the variable TFIPSST identifies 45 states and the District of Columbia. To
help protect the confidentiality of respondents, the Census Bureau combined the remaining five
states as follows:

1. Maine, Vermont; and
2. North Dakota, South Dakota, Wyoming.
The core wave files from panels prior to the 1996 Panel contain the variable HSTATE, which
identifies 41 individual states and the District of Columbia; the nine other states are combined
into three groups:

1. Maine, Vermont;
2. Iowa, North Dakota, South Dakota; and
3. Alaska, Idaho, Montana, Wyoming.
Even though it is possible to identify most states, the SIPP sample was not designed to be
representative at the state level and should not be used to produce direct state-level estimates.
The state variable is included on the public use files to allow examination of how state-level
characteristics affect national estimates. For example, a user could apply the state-specific
eligibility criteria for a means-tested program in order to arrive at a national estimate of the
number of people eligible for the program. Because some states are not uniquely identified, some
method of allocating the state-specific eligibility rules to sample persons in those states would
need to be devised.




Identifying Metropolitan Areas
The core wave files include two variables useful in identifying metropolitan areas. The first
variable, TMETRO (HMETRO), identifies residences located in metropolitan areas. It can be
used to produce national estimates of the metropolitan population. However, it cannot be used to
produce estimates of the nonmetropolitan population. To protect respondent confidentiality, the
Census Bureau recoded and identified a small random sample of metropolitan households in the
public use files as nonmetropolitan. The remaining metropolitan sample should still produce
(approximately) unbiased estimates of the metropolitan population. However, the procedure
“contaminates” the nonmetropolitan sample, and estimates of nonmetropolitan characteristics
based on that sample will be biased (the magnitude of the bias depends on the specific analysis
being performed).

A second variable, TMSA (HMSA), identifies 93 MSAs (Metropolitan Statistical Areas) and
CMSAs (Consolidated Metropolitan Statistical Areas), as defined by the Office of Management
and Budget.


11. Using Topical Module Files
This chapter discusses procedures for working with data from the topical module public use files
from the Survey of Income and Program Participation (SIPP). The chapter begins by describing
the documentation that accompanies the topical module public use files obtained from the
Census Bureau. The discussion then turns to the data files themselves. The data file structure is
described, and detailed explanations are provided about how to use the topical module files when
performing common tasks. Those tasks include:

•   Using the monthly interview status variables;
•   Identifying people, households, and families;
•   Using imputation flags; and
•   Identifying states and metropolitan areas.
Before reading this chapter, users should read Chapter 9, “The SIPP Public Use Files,” for an
introduction to Section II. Analysts using only one topical module file also should read about the
use of sample weights (Chapter 8) and the computation of standard errors (Chapter 7). Those
planning on merging data from a topical module to data from the core wave or full panel files
should also read Chapter 10 for information about the core wave files, Chapter 12 for
information about the full panel files, and Chapter 13 for information about linking SIPP public
use files.

This chapter focuses on the topical module files. It is written so that it can be used independently
of the chapters describing the core wave and full panel files. Although there are many similarities
across the three types of SIPP public use data files, important differences do exist. Because those
differences are sometimes subtle, users familiar with the core wave and full panel files should
read this chapter carefully, paying close attention to information about variable names and file
structures. Tables 9-2 and 9-3 summarize the differences between the core wave, topical module,
and full panel longitudinal research files.

For the 1996 Panel, most variable names changed from those used in previous panels. To aid
users working with files from panels prior to 1996, this chapter presents both the old and the new
variable names when the text applies to both 1996 and pre-1996 panel files. In the main body of
the text, the old names are presented in parentheses following the new names. For example, the
sample unit ID variable name, which is SSUID in the 1996 Panel, was SUID in previous panels;
it is written in this chapter as SSUID (SUID). In tables, a variety of methods are used to present
both the old and the new names.




Using the Technical Documentation of the
Topical Module Files
Each data file received from the Census Bureau comes with a set of technical documentation and
a data dictionary. The technical documentation includes:

•   The item booklets (for the 1996 Panel);
•   The paper survey instrument (for panels prior to 1996);
•   A glossary of selected terms;
•   A cross-walk, mapping reference months into calendar months for each rotation group;
•   A source and accuracy statement describing the sample weights and the computation of
    standard errors; and
•   User Notes.
The survey instrument is vital to understanding what questions were asked, how they were asked,
the order in which they were asked, to whom they were asked, and the way in which the answers
were recorded. Some questions employ skip patterns (Chapter 3), so users should pay particular
attention to which questions were skipped for which respondents. The skip patterns are best
understood by consulting the survey instruments. With the introduction of computer-assisted
interviewing (CAI) in the 1996 Panel, questionnaire documentation is now available from the
SIPP Web site (http://www.sipp.census.gov/sipp/).

The source and accuracy statements provide information about the weights on the files, when
and how to make adjustments to the weights, and one approach to computing standard errors for
some common types of estimates. More detailed discussions of those topics are provided in
Chapters 7 and 8 of this Guide.

The data dictionary provides a detailed description of each variable on the file. It describes four
aspects of each variable:

1. The definition,
2. The sample universe of the corresponding survey question,
3. The ranges for all legal values, and
4. The location (and size) in the file.
A machine-readable version of the data dictionary accompanies each data file. It can also be
downloaded from the Internet (http://www.sipp.census.gov/sipp/).



The data dictionary is formatted to facilitate processing by user-written computer programs. The
upper panel of Figure 11-1 shows an excerpt from the data dictionary for the topical module
from Wave 1 of the 1996 Panel. A “D” in the first column signifies that the next few lines define
the variable: (1) the variable name; (2) the size (i.e., how many digits it contains); (3) the starting
position; and (4) the definition. Lines beginning with a “T”, added with the 1996 Panel, contain
short variable descriptions that can be used by many software packages as variable labels.

           Figure 11-1. Excerpt from the Data Dictionary for the Topical Module Files
                                    Wave 1 of the 1996 SIPP Panel

D EENTAID   3   45
T PE: Address ID of hhld where person entered Sample
    Address ID of the household that this person belonged to at the time this
    person first became part of the sample. Address ID in a specific wave should
    never be greater than (WAVE * 10 + 9).
U All persons
V   11:129 .Entry address ID

D EPPPNUM      4   48
T PE: Person number
    Person number. This field differentiates persons within the sample unit.
    Person number is unique within the sample unit across all waves of a panel.
    Person number for a specific wave should never be greater than
    (WAVE * 100 + 99).
U All persons
V 101:1299 .Person number

D EPOPSTAT 1    52
T PE: Population status based on age in fourth ref. Month
    Population status. This field identifies whether or not a person was
    eligible to be asked a full set of questions, based on his/her age in
    the fourth month of the reference period.
U All persons
V     1 .Adult (15 years of age or older)
V     2 .Child (Under 15 years of age)

D EPPINTVW 2    53
T PE: Person’s interview status at time of interview
U All persons
V      1   .Interview (self)
V      2   .Interview (proxy)
V      3   .Noninterview - Type Z
V      4   .Nonintrvw - pseudo Type Z. Left sample during the reference
V      5   .Children under 15 during reference period

                                                                                       (figure continues)



    Figure 11-1. Excerpt from the Data Dictionary for the Topical Module Files (continued)

                                        Wave 3 of the 1993 SIPP Panel

D ENTRY    2   30
 Entry address ID
    Address of the household that person belonged to at the time person
    first became part of the sample
U All persons, including children

D PNUM    3    32
 Person number
U All persons, including children

D FILLER      3     35
 Filler

D FINALWGT 9    38
 Person weight (interview month)
    There are four implied decimal places.
U All persons, including children


A “U” in the first column signifies that the next words describe the sample universe.1 A “V” in
the first column indicates that the next number and phrase describe one of the values of the
variable. A blank in the first column denotes either a variable description or other comment. A
period (.) before a word denotes the start of the value label.

Prior to the 1996 Panel, the dictionaries had a different format, shown in the second panel of
Figure 11-1. A “D” in the first column signifies that the next few lines define the variable: (1) the
variable name; (2) the size (i.e., how many digits it contains); (3) the starting position; and (4)
the definition. A “U” in the first column signifies that the next words describe the sample
universe.2 A “V” in the first column indicates that the next number and phrase describe one of
the values of the variable. An asterisk in the first column denotes a comment. A period (.) before
a word denotes the start of the value label.
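
The pre-1996 dictionary layout described above is regular enough to be read by a short program. The following Python sketch is illustrative only and is not part of the SIPP documentation; the function name and the structure of the returned entries are assumptions made for this example. It treats a "D" line as name, size, and starting position, a "U" line as the universe, a "V" line as a value followed by a label that begins at the period, and an asterisk line as a comment.

```python
def parse_dictionary(lines):
    """Parse pre-1996 style dictionary lines into a list of variable entries."""
    variables = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank separator lines
        code, rest = line[0], line[1:].strip()
        if code == "D":
            # "D NAME SIZE START": name, field width, 1-based starting column.
            name, size, start = rest.split()
            current = {"name": name, "size": int(size), "start": int(start),
                       "universe": None, "values": {}, "text": []}
            variables.append(current)
        elif current is None or code == "*":
            continue  # text before the first "D" line, or a comment line
        elif code == "U":
            current["universe"] = rest
        elif code == "V":
            # "V  1 .Adult": the period marks the start of the value label.
            value, _, label = rest.partition(".")
            current["values"][value.strip()] = label.strip()
        else:
            # Any other line is kept as descriptive text for the variable.
            current["text"].append(line.strip())
    return variables
```

Each returned entry carries the starting position and size needed to locate the field on a data record, so the same structure could drive a fixed-width reader.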

Figure 11-2 shows sample SAS and FORTRAN syntax for reading the data described by the
codebook fragments in Figure 11-1. Additional SAS program code could be used to associate
value labels (a SAS “format”) with the INTVW variable.
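
For users who do not work in SAS or FORTRAN, the same positions can be applied in other languages. The Python sketch below is an illustration under stated assumptions, not Census Bureau code: the layout tuples and the function name are invented for this example, the 1996 positions are taken from the SAS input statement in Figure 11-2, and the 1993 positions are taken from the "D" lines in Figure 11-1. Fields are read as numbers, as in the SAS example, and the implied decimal places of FINALWGT are applied by division.

```python
# Field layouts: (name, 1-based start column, width, implied decimal places).
LAYOUT_1996_WAVE1 = [
    ("EENTAID", 45, 3, 0),
    ("EPPPNUM", 48, 4, 0),
    ("EPOPSTAT", 52, 1, 0),
    ("EPPINTVW", 53, 2, 0),
]
LAYOUT_1993_WAVE3 = [
    ("ENTRY", 30, 2, 0),
    ("PNUM", 32, 3, 0),
    ("FINALWGT", 38, 9, 4),
]

# Value labels for EPPINTVW, copied from the Figure 11-1 excerpt
# (the label for value 4 is truncated in the excerpt).
EPPINTVW_LABELS = {
    1: "Interview (self)",
    2: "Interview (proxy)",
    3: "Noninterview - Type Z",
    4: "Nonintrvw - pseudo Type Z. Left sample during the reference",
    5: "Children under 15 during reference period",
}


def read_record(line, layout):
    """Extract the fields named in layout from one fixed-width record."""
    record = {}
    for name, start, width, decimals in layout:
        text = line[start - 1:start - 1 + width].strip()
        if not text:
            record[name] = None          # blank field
        elif decimals:
            # Assumes the field holds digits only, with no explicit decimal point.
            record[name] = int(text) / 10 ** decimals
        else:
            record[name] = int(text)
    return record
```

Reading identifiers as numbers drops leading zeros (for example, 011 becomes 11); analysts who need the identifiers exactly as stored may prefer to keep those fields as text.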


1
  The universe definitions included in the data dictionaries prior to the 1996 Panel were not always accurate. Users
of pre-1996 SIPP Panels should check the skip patterns in the actual survey questionnaire to determine which subset
of respondents was asked each question.
2
  See footnote 1.



          Figure 11-2. Corresponding SAS and FORTRAN Syntax to Read Data
                               from Topical Module Files

                                  Wave 1 of the 1996 Panel
                                            SAS

Input
  @45  EENTAID 3.
     EPPPNUM 4.
     EPOPSTAT 1.
     EPPINTVW 2.
     ;
LABEL EENTAID = "Adrs ID where person entered sample"
   EPPPNUM = "Person number"
   EPOPSTAT = "Population status based on age in fourth"
   EPPINTVW = "Person's interview status"
   ;
                                        FORTRAN

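C     Declare the fields INTEGER to match the I edit descriptors.
      INTEGER EENTAID, EPPPNUM, EPOPSTAT, EPPINTVW
      READ(INFILE,1000) EENTAID, EPPPNUM, EPOPSTAT, EPPINTVW
 1000 FORMAT(T45,I3,I4,I1,I2)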


                               Wave 3 of the 1993 SIPP Panel
                                            SAS

Input
  @30    ENTRY    2.
      PNUM     3.
  @38 FINALWGT 9.4
  ;
LABEL ENTRY = "Entry address ID"
    PNUM    = "Person number"
    FINALWGT = "Person weight (interview month)"
    ;


                                        FORTRAN
C     ID fields are INTEGER; F9.4 applies four implied decimals.
      INTEGER ENTRY, PNUM
      READ(INFILE,1000) ENTRY, PNUM, FINALWGT
 1000 FORMAT(T30,I2,I3,T38,F9.4)




Relationship of the Topical Module Data Files to
the Survey Instrument
Each wave’s survey instrument includes one or more topical modules,3 as described in Chapter 3.
The questions in those modules are often asked after the core survey questions and can be found
toward the end of the survey instrument. The data from the topical modules are usually combined
into one topical module data file for each SIPP wave.

The topical module data dictionary does not replicate the survey instrument. Thus, analysts
should keep a few things in mind when using the data:

•   The variables on the data files do not correspond one-to-one with the questionnaire items:
    the variables are listed in a different order, some are not included in the public use files, and
    some are created from a combination of other variables.
•   The range of possible values of the variables on the data files does not always correspond
    one-to-one with the response categories shown on the survey instrument or in the data
    dictionary.
•   The variable name in the data dictionary may not readily indicate the variable’s content.
•   Prior to the 1996 Panel, some variable names were used in different topical module files for
    different variables. For example, in the 1990 Panel, TM8400 was used in the Wave 2 topical
    module for a variable that indicates whether the respondent completed 12th grade. The same
    variable name was used in the Wave 6 topical module to indicate whether the respondent was
    a parent of children under 21 years of age living in the respondent’s household.
•   The complexity of the skip patterns may not be apparent just by looking at the data
    dictionary. Many questions were administered only to the household reference person, or to
    adults (age 15 years or older), or to people 25 years or older, or to some other subset of
    survey respondents.4

To avoid potential problems and confusion, analysts should become familiar with the survey
instrument before using the data. When working with the data, refer to both the survey
instrument and the data dictionary.
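
Because pre-1996 names such as TM8400 can mean different things in different waves, one precaution when combining extracts from several topical module files is to qualify each topical variable name with its panel and wave. The Python sketch below shows one hypothetical way to do so; the prefix scheme, the function name, and the list of identifier variables left unchanged are assumptions made for this example, not SIPP conventions.

```python
def qualify_names(record, panel, wave, keep=("ID", "ENTRY", "PNUM")):
    """Prefix variable names with panel and wave, leaving identifiers unchanged."""
    prefix = f"P{panel}W{wave}_"      # hypothetical naming scheme
    return {
        (name if name in keep else prefix + name): value
        for name, value in record.items()
    }
```

With this scheme, TM8400 from Wave 2 of the 1990 Panel becomes P1990W2_TM8400 and TM8400 from Wave 6 becomes P1990W6_TM8400, so the two variables can sit on the same combined record without one overwriting the other.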


3
  Prior to the 1992 Panel, there were no topical modules administered with the Wave 1 interview, although some
topical content was included in the Wave 1 core questionnaire for the purpose of obtaining historical information.
As of the 1992 Panel, Wave 1 has had topical modules.
4
  The universe definitions included in the data dictionaries prior to the 1996 Panel were not always accurate. Users
of pre-1996 SIPP panels should check the skip patterns in the actual survey questionnaire to determine which subset
of respondents was asked each question.




Structure of the Topical Module Files
The topical module files for the 1996 Panel contain one record for each person who was in the
sample with a completed (or imputed) interview in the fourth month of the wave’s reference
period (the month immediately prior to the interview). This arrangement is similar to the person-
month format of the core wave files, but only records for month four are included in the topical
module files. Prior to the 1996 Panel, the topical module files contained one record for each
person who was interviewed or for whom an interview was attempted in that wave (Table 11-1
shows one record for each such person; compare with Table 10-1, which shows up to four
records per sample person in the core wave files).5

In general, each topical module file contains data for all of the topical module subject areas
administered during a particular wave.6 Each topical module file also contains selected
information from the SIPP core; thus, for some analyses, those files can be used independently
from the core wave and full panel data files. When more detailed information from the SIPP core
is needed, data from the topical modules must be merged with data from the core wave or full
panel files. Chapter 13 provides a detailed discussion of merging SIPP files.

                    Table 11-1. Example of the Topical Module File Structure

                                                1996 Panel
                                 Current               Entry
          Sample Unit ID         Address ID            Address ID              Person Number
          (SSUID)                (SHHADID)             (EENTAID)               (EPPPNUM)
           123456789123           021                   011                     0101
           123456789123           021                   011                     0102
           123456789123           021                   021                     0201
           123456789123           021                   021                     0202
                                            Panels Prior to 1996
                                 Current               Entry
          Sample Unit ID         Address ID            Address ID              Person Number
          (ID)                   (ADDID)               (ENTRY)                 (PNUM)
           123451000              21                    11                      101
           123451000              21                    11                      102
           123451000              21                    21                      201
           123451000              21                    21                      202


5
  The variables shown—sample unit ID, current address ID, entry address ID, and person number—are discussed in
detail later in this chapter.
6
  Chapter 3 offers a detailed listing of the topical modules administered with each wave of each SIPP panel.



The topical module file structure differs from that of the core wave files in the following ways:

•   For the 1996 Panel, the topical module files contain one record for each person who was a
    SIPP sample member during month four of the wave; the core wave files contain one record
    per person for each month the person is in the sample.
•   Prior to the 1996 Panel, the topical module files contain one record per person for each
    person present in a SIPP household at the time of the interview; the core wave files contain
    one record per person for each month the person was in the sample during the previous 4
    months.
•   Prior to the 1996 Panel, the topical module files include records for people whose entire
    household refused to be interviewed or left the sample;7 those people are excluded from the
    core wave files.
•   Prior to the 1996 Panel, the structure of the topical module files was roughly similar to that of
    the full panel files, containing one record per person.
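
The difference in record structure for the 1996 Panel can be made concrete with a short Python sketch. It is illustrative only: the records are assumed to be dictionaries, the function names are invented for this example, and SREFMON is assumed to be the name of the reference-month variable on the core wave file (users should confirm the name in the core wave data dictionary).

```python
from collections import Counter


def records_per_person(records, key_fields=("SSUID", "EPPPNUM")):
    """Count how many records each person has in a file."""
    return Counter(tuple(r[f] for f in key_fields) for r in records)


def month_four_only(core_records, month_field="SREFMON"):
    """Keep only the core person-month records for reference month four."""
    return [r for r in core_records if r[month_field] == 4]
```

A core wave extract reduced with month_four_only has at most one record per person, which is the same structure as the 1996 topical module file for that wave.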


Reference Periods and Samples
Sample definitions and reference periods in the topical modules vary across panels, across
topical modules within panels, and even within topical modules. Users should pay careful
attention to those details in the topical module files they are using.

In the 1996 Panel, most topical module questions were asked only of people who were in the
SIPP sample during the fourth month of the wave’s reference period. People who were members
of SIPP households at the time of the interview (month five) but not during the previous month
were not asked the topical module questions. Many of the questions refer to just that month
(month four). However, some topical module questions, and in some cases entire topical
modules, refer to longer periods of time, such as the previous 4 months, the previous year, or,
in the various history topical modules administered with Wave 1, the person’s life before SIPP.

Prior to the 1996 Panel, most topical module questions were asked of people who were in the
SIPP sample at the time of the interview (month five). This included people who were household
members at the time of the interview but who were not members of SIPP households at any time
during the previous 4 months, the reference period for SIPP core questions in that wave.8 Many
questions asked about “current” (month five) conditions, although some asked about longer
periods in the past.


7
  Panels that included topical modules in Wave 1, such as the 1993 and 1996 Panels, exclude those people from
the Wave 1 topical module files.
8
  This has important implications for procedures used to merge the topical modules to data from the core. Core data
that correspond to the same reference month as a topical module must often be merged from the subsequent wave
rather than from the same wave as the topical module, as discussed in Chapter 13.




Using a Person’s Monthly Interview Status
Variables
A person’s monthly interview status variable is used to determine whether the data for that
person in a given month should be used. Some analysts refer to it as the “in sample” variable to
distinguish it from the household interview status variable, EOUTCOME (ITEM36B), and
another variable that indicates the type of interview or noninterview for the person, EPPINTVW
(INTVW). The interview status variable has three possible values: 0, 1, and 2. A value of 1
indicates that the person was both in-scope for the survey (a member of the population that the
SIPP sample is intended to represent) and, aside from some item nonresponse, provided
complete answers to the SIPP core questions for the reference month in question.9


Monthly Interview Status in the Topical Module Files
from the 1996 Panel

There is only one interview status variable in the topical module files from the 1996 Panel. That
variable, EPPMIS4, identifies a person’s status in the fourth reference month of the wave.
Because the topical module files from the 1996 Panel contain records only for people for whom
this variable is equal to 1 (and so equals 1 on all records in the file), EPPMIS4 can be safely
ignored when working with topical module files from the 1996 Panel.


Monthly Interview Status in the Topical Module Files
from Panels Prior to 1996

The topical module files for panels prior to 1996 are different. On those files, a person’s
interview status variables are labeled PP-MIS1, PP-MIS2, PP-MIS3, PP-MIS4, and PP-MIS5.
These variables refer to the four reference months of the wave (PP-MIS1 to PP-MIS4) and the
interview month itself (PP-MIS5).

The monthly interview status is the only reliable guide to whether the data for a given person
should be used in a given month. Analysts should use data for only those months in which a
person’s interview status (PP-MIS) is equal to 1 (see footnote 10).

9
   The only exception is for Type Z noninterviews. For Type Z noninterviews prior to the 1996 Panel, complete
records for the SIPP core were imputed and the monthly interview status variable was set to 1, indicating that, for
most analytic purposes, the responses should be treated as though they were provided by the respondent. This
exception is handled similarly in the 1996 Panel when there is no prior wave information. When prior wave
information exists, items are imputed using the same hot-deck methods applied to instances of item nonresponse.
10
   As a safeguard against inadvertently using data for months when PP-MIS is not equal to 1, all monthly variables
in the user’s data extract should be set to a missing value for months when PP-MIS is not equal to 1. Most statistical
packages allow certain values to be flagged as missing. Once flagged, those values are excluded from computations.



Any data present for months when a person’s interview status is coded either 0 or 2 should be
ignored. A code of 0 indicates that the person was not in the sample that month, and a code of 2
indicates a noninterview for that month.
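
The safeguard of setting monthly values to missing when the interview status is not 1 can be sketched in Python as follows. The function name and the use of lists ordered by month are assumptions made for this example; the rule itself (keep a value only when PP-MIS equals 1) is the one stated above.

```python
def mask_by_interview_status(pp_mis, monthly_values, missing=None):
    """Return monthly values with every month where PP-MIS is not 1 set to missing."""
    if len(pp_mis) != len(monthly_values):
        raise ValueError("one status code is needed for each monthly value")
    return [
        value if status == 1 else missing
        for status, value in zip(pp_mis, monthly_values)
    ]
```

For example, a person with status codes 1, 1, 2, 2, 2 keeps the values for the first two months only, and the remaining three months are returned as missing.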

On the topical module files for panels prior to 1996, the topical module questions were asked
only of sample members with PP-MIS5 equal to 1 (see footnote 11); that is, the topical module
questions were asked only of those who were in the SIPP sample at the time of the interview.
Because the
reference periods of the topical module questions vary, some topical module questions contain
information about people who had been secondary sample members during previous months,
even though they were no longer part of the SIPP sample at the time of the interview. The
variables PP-MIS1 to PP-MIS4 are useful when working with topical module questions that refer
to previous months. The four variables are also useful when merging topical module data with
data from the core, a topic discussed in Chapter 13.

Four sample members are shown in Table 11-2. Two were present in the interview month (PP-
MIS5 = 1), and two were not present (PP-MIS5 = 2). Analysts interested in just the interview
month should use data only for people with PP-MIS5 = 1. In this example, only persons 101 and
201 would be included.

        Table 11-2. Monthly Interview Status Variables in the 1984-1993 SIPP Panels

     Sample            Current        Entry            Person       Rotation                    PP-MIS
     Unit ID           Address ID     Address ID       Number       Group
     (ID)              (ADDID)        (ENTRY)          (PNUM)       (ROTATION)         1    2     3   4   5
      123451000          11             11              101          1                 1    1     1   1   1
      123451000          11             11              102          1                 1    1     2   2   2
      123451000          11             11              201          1                 2    2     2   2   1
      123451000          11             11              202          1                 0    0     2   2   2

If the research focuses on a particular calendar month, such as January, analysts should use
data only for people with PP-MISx = 1, where x corresponds to the reference month that
contains information about that month (which varies by wave and rotation group). Suppose an
analyst is interested in January 1994 and the example in Table 11-2 represents Wave 4,
rotation group 1, of the 1993 Panel. Table 11-3 shows that January 1994 is the first reference
month for that rotation group, so the analyst would use only the people with PP-MIS1 = 1.
Thus, only persons 101 and 102 would be included.

        Table 11-3. Interview Month and Reference Months for Each Rotation Group
                                in Wave 4 of the 1993 Panel

       Rotation Group       Reference Months for Core Questions                    Interview Month
       2                    Oct., Nov., Dec. 1993; Jan. 1994                       Feb. 1994
       3                    Nov., Dec. 1993; Jan., Feb. 1994                       Mar. 1994
       4                    Dec. 1993; Jan., Feb., Mar. 1994                       Apr. 1994
       1                    Jan., Feb., Mar., Apr. 1994                            May 1994
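
The lookup just described (calendar month to reference month, then to the matching PP-MIS variable) can be sketched in Python. The sketch is illustrative: the reference months are transcribed from Table 11-3, the four people are transcribed from Table 11-2, and the data structures and function names are assumptions made for this example.

```python
# Reference months for Wave 4 of the 1993 Panel (Table 11-3), by rotation group.
WAVE4_1993_REFERENCE_MONTHS = {
    2: [(1993, 10), (1993, 11), (1993, 12), (1994, 1)],
    3: [(1993, 11), (1993, 12), (1994, 1), (1994, 2)],
    4: [(1993, 12), (1994, 1), (1994, 2), (1994, 3)],
    1: [(1994, 1), (1994, 2), (1994, 3), (1994, 4)],
}

# The four sample members in Table 11-2; PP_MIS holds PP-MIS1 to PP-MIS5 in order.
TABLE_11_2 = [
    {"PNUM": 101, "ROTATION": 1, "PP_MIS": [1, 1, 1, 1, 1]},
    {"PNUM": 102, "ROTATION": 1, "PP_MIS": [1, 1, 2, 2, 2]},
    {"PNUM": 201, "ROTATION": 1, "PP_MIS": [2, 2, 2, 2, 1]},
    {"PNUM": 202, "ROTATION": 1, "PP_MIS": [0, 0, 2, 2, 2]},
]


def reference_month_index(rotation, year, month):
    """Return x (1 to 4) such that PP-MISx covers the given calendar month."""
    # list.index raises ValueError if the month is outside the reference period.
    return WAVE4_1993_REFERENCE_MONTHS[rotation].index((year, month)) + 1


def people_in_sample(people, year, month):
    """Return person numbers with PP-MISx = 1 for the given calendar month."""
    selected = []
    for person in people:
        x = reference_month_index(person["ROTATION"], year, month)
        if person["PP_MIS"][x - 1] == 1:
            selected.append(person["PNUM"])
    return selected
```

For rotation group 1, January 1994 is reference month 1, so people_in_sample(TABLE_11_2, 1994, 1) selects persons 101 and 102, in agreement with the text. For rotation group 2 the same calendar month is reference month 4.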

11
  In some cases, questions are asked of all household members over 14 years old. In other cases, they may be asked
only of the household reference person. There are also topical modules in which other subsets of household
members are interviewed.



As demonstrated by this example, the topical module files for panels conducted before 1996
contain a record for each person for whom no interview data were collected, either because the
person refused to be interviewed (and no proxy interview was obtained) or because the person
left the survey sample (e.g., died or entered the Armed Forces or an institution). Those
individuals have PP-MIS5 = 2 and PP-MISj = 1 for at least one of j = 1, 2, 3, or 4, or they
have INTVW = 3 or 4. Their demographic information was carried over from the last time that
they were successfully interviewed; if they have topical module information, it was completely
imputed by the Census Bureau.


Comparison of Variables in the Topical Module
and Core Wave Files
The topical module files contain a number of variables that are also present in the core wave
files. These include variables needed to identify the household and the person. Also included are
selected background (demographic) characteristics. In the 1996 Panel, the values for the
background characteristics correspond to the month-four values in the core wave file for the
same wave. Variables common to the core wave and topical module files are
generally given the same names in both files. For example, SSUID is used for the sample unit
identifier, SHHADID is the current address ID, and EPPPNUM is the person number on both
files.12 Among the background variables, TAGE is used on both files for the respondent’s age,
and EMS is used for the respondent’s marital status. Table 11-4 shows the 27 variables that are
common to the core wave file and topical module file from Wave 1 of the 1996 Panel.

Prior to the 1996 Panel, the demographic data on the topical module files corresponded to the
interview month (month five), not to any of the 4 reference months for the core interview. For
that reason, the information in variables such as AGE, RRP, and MS (the respondent’s age,
relationship to the household reference person, and marital status) could differ from the core
wave file variables of the same names for the wave in which the topical module was
administered. This would indicate that a change occurred between the last month of the reference
period (month four) and the interview month (month five). Some variables included on both the
core wave and topical module files have different names. As shown in Table 11-5, sample unit
ID, rotation group, state, interview status in month five, and the person-level weight are
contained in both files but have different variable names.


12
 Use of common names facilitates merging of the core wave and topical module files from the 1996 Panel.
Merging files is discussed extensively in Chapter 13.



                Table 11-4. Variables Common to the Core Wave and Topical
                        Module Files from Wave 1 of the 1996 Panel

             Variable
             Name                       Description
             EEDUCATE                   Highest degree received or grade
             EENTAID                    Address ID of household where person entered
             EMS                        Marital status
             EORIGIN                    Origin of this person
             EOUTCOME                   Interview status code for this household
             EPNDAD                     Person number of father
             EPNGUARD                   Person number of guardian
             EPNMOM                     Person number of mother
             EPNSPOUS                   Person number of spouse
             EPOPSTAT                   Population status based on age
             EPPINTVW                   Person’s interview status
             EPPPNUM                    Person number
             ERACE                      Race of this person
             ERRP                       Household relationship
             ESEX                       Gender of this person
             RDESGPNT                   Designated parent or guardian flag
             RFID                       Family ID number for this month
             RFID2                      Family ID excluding related subfamily
             SHHADID                    Household address ID—differentiates households
             SPANEL                     Sample code—indicates panel year
             SROTATON                   Rotation of data collection
             SSUID                      Sample unit identifier
             SSUSEQ                     Sequence number of sample unit — primary
             SWAVE                      Wave of data collection
             TAGE                       Age as of last birthday
             TFIPSST                    FIPS state code
             WPFINWGT                   Person weight


           Table 11-5. Examples of Same Variables with Different Names in the
              Core Wave and Topical Module Files Prior to the 1996 Panel

                                                       Variable Name in the     Variable Name in the
   Description                                         Core Wave File           Topical Module File
   Sample unit ID                                      SUID                     ID
   Rotation group                                      ROT                      ROTATION
   State of residence                                  HSTATE                   STATE
   Monthly interview status in the interview month     MIS5                     PP-MIS5
   Person-level weight in the interview month          P5WGT                    FINALWGT
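
When pre-1996 core wave and topical module extracts are brought together, the name differences in Table 11-5 have to be reconciled first. The Python sketch below shows one way to do that, renaming the core wave names to their topical module equivalents. The mapping is transcribed from Table 11-5; the function name and the use of dictionaries for records are assumptions made for this example.

```python
# Core wave name -> topical module name (pre-1996 panels), from Table 11-5.
CORE_TO_TOPICAL = {
    "SUID": "ID",
    "ROT": "ROTATION",
    "HSTATE": "STATE",
    "MIS5": "PP-MIS5",
    "P5WGT": "FINALWGT",
}


def rename_core_to_topical(record):
    """Return a copy of a core wave record using topical module variable names."""
    return {CORE_TO_TOPICAL.get(name, name): value for name, value in record.items()}
```

Variables that are not listed in the mapping keep their original names.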




Identifying People
There are many occasions when it is necessary to identify which records belong to each
individual in the SIPP data files. This need arises, for example, when

•    Merging data from topical module files to data from the core wave or full panel files,
•    Merging data from two or more topical module data files,
•    Linking husbands and wives, and
•    Linking parents and children.

In the 1996 Panel, two variables are needed to uniquely identify a person: the sample unit ID and
the person number.13 For files from panels prior to 1996, three variables are needed to uniquely
identify a person: the sample unit ID, entry address ID, and person number. Table 11-6 shows
the variable names used in the topical module files for the 1996 Panel and for the pre-1996
Panels.

                  Table 11-6. Variables Used to Uniquely Identify a Person in the
                                       Topical Module Files

      Variable Name                                  Description
      SSUID (ID)                                     Sample unit ID
      EENTAID (ENTRY)                                Entry address ID (not needed in the 1996 panel)
      EPPPNUM (PNUM)                                 Person number
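
In a program, the identifying variables in Table 11-6 are typically combined into a single key that can be used for sorting, matching, and checking for duplicates. The Python sketch below is a minimal illustration; the function name, the pre_1996 flag, and the use of dictionaries for records are assumptions made for this example.

```python
def person_key(record, pre_1996):
    """Build the key that uniquely identifies a person on a topical module file."""
    if pre_1996:
        # Panels prior to 1996: sample unit ID, entry address ID, person number.
        return (record["ID"], record["ENTRY"], record["PNUM"])
    # 1996 Panel: sample unit ID and person number are sufficient.
    return (record["SSUID"], record["EPPPNUM"])
```

Keeping the identifiers as text rather than numbers preserves any leading zeros, which matters if keys from different files are to be compared exactly.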


The variables can be described as follows:

•    SSUID (ID) uniquely identifies each initially sampled dwelling unit.14 Every person in a core
     wave file was either a member of one of those units (an original sample member) or lives
     with someone who was a member of an initially sampled dwelling unit. A person’s
     connection to that unit is an attribute of that person and does not change over time.15 This
     means that as people move from address to address, their SSUID (ID) stays the same. As new
     people join the homes of original sample members, they receive the SSUID (ID) of the
     original sample members.


13
   Users should note that in the 1996 Panel, the entry address ID is no longer needed for unique identification. Its
continued use will not create any problems; it is simply redundant information. That is a change from earlier panels,
in which the entry address ID was key to uniquely identifying a person.
14
   The SSUID (ID) is a random recode of three other variables in the Census Bureau’s internal (not public use) files:
the respondent’s sampling area (primary sampling unit), the cluster of housing units within that area (called the
“segment”), and a sequentially assigned serial number. Those three variables are omitted from the public use files to
protect the confidentiality of the respondents.
15
   There is one rare exception to this rule for panels prior to 1996, which is described in the section entitled
“Identifying Movers” later in this chapter.



•    EENTAID (ENTRY) identifies the address where the person lived at the time he or she was
     first interviewed. It does not change even if the person moves.16 Prior to the 1996 Panel, it
     was used in conjunction with the person number and the sample unit ID to uniquely identify
     people within the sampling unit. It is not needed to uniquely identify people in the 1996
     Panel. Values for this variable are unique only within sample units. The entry address ID has
     two components. The first part of the ID number (two digits in the 1992 and 1996 Panels,
     and one digit in all others) identifies the wave in which SIPP interviews were first conducted
     at the address. The second part of the number (one digit in all panels) sequentially numbers
     addresses within a sample unit [SSUID (ID)] that enter the sample in the same wave. See
     Chapter 10 for a more complete discussion.
•    Prior to the 1996 Panel, PNUM uniquely identified a person within the sample unit and entry
     address ID. In the 1996 Panel, EPPPNUM uniquely identifies a person within the sample
     unit. EPPPNUM (PNUM) does not change even if the person moves.17 The first part of
     EPPPNUM (PNUM) (two digits in the 1992 and 1996 Panels, and one digit in all others)
     indicates the wave in which the person was first interviewed.18 The remaining two digits are
     sequentially assigned within the household. Thus, original sample members are assigned
     person numbers ranging from 100 to 199. Individuals who enter the SIPP sample in Wave 2
     are assigned a person number ranging from 200 to 299. Those who enter in Wave 10 are
     assigned person numbers ranging from 1001 to 1099.

Table 11-7 illustrates how the combination of SSUID (ID), EENTAID (ENTRY), and
EPPPNUM (PNUM) uniquely identifies people and provides information about when they first
entered the SIPP sample. In this example, there are eight individuals: five are original sample
members, one person joined the SIPP sample in Wave 4, one person joined in Wave 7, and one
person joined in Wave 10.
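
Because the last two digits of the person number are a sequence number and the leading digits give the wave of entry, the wave can be recovered with integer division. The Python sketch below applies this to the person numbers listed in Table 11-7; the function and variable names are assumptions made for this example.

```python
from collections import Counter


def split_person_number(person_number):
    """Split a person number into (wave of entry, sequence number)."""
    number = int(person_number)        # "0101" and 101 both become 101
    return number // 100, number % 100


# Person numbers from Table 11-7 (1996 Panel layout).
TABLE_11_7_EPPPNUM = ["0101", "0102", "0401", "0701", "0101", "0102", "0103", "1001"]

# Number of people entering the sample in each wave.
entry_waves = Counter(split_person_number(p)[0] for p in TABLE_11_7_EPPPNUM)
```

The resulting counts match the description of the table: five people with entry wave 1 (original sample members) and one person each for Waves 4, 7, and 10.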

To uniquely identify a household or group quarters in the topical module files, analysts should
use the two variables shown in Table 11-8.

People with the same SSUID (ID) (sample unit ID) and SHHADID (ADDID) (current address
ID) values live in the same household (or group quarters location) in the relevant month. For the
1996 Panel, household membership refers to month four of the wave’s reference period. For
panels prior to 1996, household membership refers to the interview month. The eight individuals
shown in Table 11-9 make up four households. The first household contains the first four
individuals. The second household contains one person. The third household contains one
person. The fourth household contains two people. (Users may find it helpful to refer to Figure
2-1 [pp. 2-10 to 2-14], which illustrates the concepts of household and changes in household.)


16
   See footnote 7.
17
   For cases in the 1996 Panel for whom prior wave information did not exist for a person-level noninterview (such
as in Wave 1 or in Waves 2–12 when the person was new to the sample), the whole record may have been imputed.
To identify such cases, users need to check both person number (to distinguish wave of entry into the sample) and
EPPINTVW, which will be 3 or 4 for these cases.
18
   Chapter 4 contains a discussion of how analysts can determine whether these special imputation procedures were
used.



          Table 11-7. How to Uniquely Identify a Person in the Topical Module Files

                                                     1996 Panel
Sample            Entry              Person             Current
Unit ID           Address ID         Number             Address ID
(SSUID)           (EENTAID)          (EPPPNUM)          (SHHADID)        Notes
123456789123      011                0101               071              Original sample member
123456789123      011                0102               071              Original sample member
123456789123      011                0401               071              Enters SIPP sample in Wave 4
123456789123      071                0701               071              Enters SIPP sample in Wave 7
321456789123      011                0101               031              Original sample member
321456789123      011                0102               032              Original sample member
321456789123      011                0103               101              Original sample member
321456789123      101                1001               101              Enters SIPP sample in Wave 10
                                               Prior to the 1996 Panel
Sample             Entry             Person             Current
Unit ID            Address ID        Number             Address ID
(ID)               (ENTRY)           (PNUM)             (ADDID)          Notes
123456789            11               101                 71             Original sample member
123456789            11               102                 71             Original sample member
123456789            11               401                 71             Enters SIPP sample in Wave 4
123456789            71               701                 71             Enters SIPP sample in Wave 7
321456789            11               101                 31             Original sample member
321456789            11               102                 32             Original sample member
321456789            11               103                101             Original sample member
321456789           101              1001                101             Enters SIPP sample in Wave 10 (1992 Panel)
Note: The entry address ID is not needed to uniquely identify a person in the 1996 Panel.


                 Table 11-8. Variables Used to Uniquely Identify a Household or
                          Group Quarters in the Topical Module Files

              Variable Name                            Description
              SSUID (ID)                               Sample unit ID
              SHHADID (ADDID)                          Current address ID in month four (month five in panels prior to 1996)



       Table 11-9. How to Uniquely Identify a Household in the Topical Module Files

                                            1996 Panel
     Sample Unit ID      Current Address Person Number
     (SSUID)             ID (SHHADID)      (EPPPNUM)           Notes
     123456789123           071             0101               Four people in this household
     123456789123           071             0102
     123456789123           071             0401
     123456789123           071             0701
     321456789123           031             0101               One person in this household
     321456789123           032             0102               One person in this household
     321456789123           101             0103               Two people in this household
     321456789123           101             1001
                                        Panels Prior to 1996
     Sample Unit ID      Current Address Person Number
     (ID)                ID (ADDID)        (PNUM)              Notes
       123456789              71             101               Four people in this household
       123456789              71             102
       123456789              71             401
       123456789              71             701
       321456789              31             101               One person in this household
       321456789              32             102               One person in this household
       321456789            101              103               Two people in this household
       321456789            101             1001
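
As a minimal sketch of the grouping shown in Table 11-9, the following Python counts the people in each household by keying on the sample unit ID and the current address ID. The rows are the 1996 Panel rows of the table entered by hand; the names RECORDS and household_sizes are illustrative assumptions and are not part of any SIPP software or file layout.

```python
from collections import Counter

# (SSUID, SHHADID, EPPPNUM) rows copied from the 1996 Panel part of Table 11-9.
# IDs are stored as strings so that leading zeros are preserved.
RECORDS = [
    ("123456789123", "071", "0101"),
    ("123456789123", "071", "0102"),
    ("123456789123", "071", "0401"),
    ("123456789123", "071", "0701"),
    ("321456789123", "031", "0101"),
    ("321456789123", "032", "0102"),
    ("321456789123", "101", "0103"),
    ("321456789123", "101", "1001"),
]


def household_sizes(records):
    """Count persons per household, keyed by (SSUID, SHHADID)."""
    return Counter((ssuid, shhadid) for ssuid, shhadid, _epppnum in records)


sizes = household_sizes(RECORDS)
for (ssuid, shhadid), n_persons in sorted(sizes.items()):
    print(ssuid, shhadid, n_persons)
```

For panels prior to 1996 the same approach would apply with ID and ADDID in place of SSUID and SHHADID.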


Identifying Families
The term family, as used in Census Bureau publications, refers to a group of two or more people
related by birth, marriage, or adoption who reside together; all such individuals are considered
members of one family.

The Census Bureau distinguishes among several types of families:

!   A primary family is a family containing the household reference person and all of his or her
    relatives. This means that a household composed of a husband and wife, their son, and their
    son’s wife (i.e., the daughter-in-law) is classified as a primary family containing four people.
!   A related subfamily is a nuclear family that is related to but does not include the household
    reference person. For example, the son and his wife (i.e., the daughter-in-law) in the
    preceding example are a related subfamily.
!   An unrelated subfamily (sometimes called a secondary family) is a nuclear family that is not
    related to the household reference person. Thus, a husband and wife who live in a friend’s
    house are classified as an unrelated subfamily. A mother and daughter who live in the
    mother’s boyfriend’s apartment are classified as an unrelated subfamily.



!    A primary individual is a household reference person who lives alone or lives with only
     nonrelatives. Primary individuals are sometimes treated by the Census Bureau as families of
     only one person and are referred to as pseudo-families.
!    A secondary individual is not a household reference person and is not related to any other
     people in the household. Secondary individuals are sometimes treated by the Census Bureau
     as families of only one person and are referred to as pseudo-families.
In the topical module files for the 1996 Panel, the variables shown in Table 11-10 can be used to
uniquely identify families.

                Table 11-10. Variables Used to Uniquely Identify a Family in the
                            Topical Module Files for the 1996 Panel

     Variable Name                 Description
     SSUID                         Sample unit ID
     SHHADID                       Current address ID
     and one of the following:
     RFID                           Family ID in month four of the wave
     RFID2                          Family ID in month four (excluding related subfamily members; RFID2=0
                                    for related subfamily members)

The Census Bureau has two principal methods for distinguishing families that are based on the
variables and numbering schemes shown in Table 11-10. Analysts must remember to choose
which type of family classification they want and then use the appropriate method.

!    The first method defines a family as all persons who are related and living together. The
     family ID variable RFID is used with this definition. RFID groups the household reference
     person with all related household members by assigning them the same ID number. This
     family group corresponds to the Census Bureau’s definition of primary family. RFID groups
     members of each unrelated subfamily (and primary and secondary individuals) separately.
!    The second method is similar to the first in defining a family, but the family excludes related
     subfamilies. The family ID variable RFID2 is used with this definition. RFID2 equals zero
     for related subfamilies. RFID2 groups members of each unrelated subfamily (and primary
     and secondary individuals) in the same way as RFID—each group has a unique number.19
Table 11-11 illustrates the difference between the RFID and RFID2 variables. Those variables
refer to month four of the wave’s reference period. For example, a mother, a father, and a child
would be family 1 (RFID = 1). The first household in the table contains a primary family of five
people. The primary family contains members of related subfamilies. However, the topical
module files for the 1996 Panel do not contain the variables needed to determine whether all
subfamily members are members of the same subfamily. To determine that, an analyst would
need to merge the RSID variable from the month four records in the core wave file.


19
  The variables included on the topical module files do not allow analysts to distinguish among different related
subfamilies living in the same household. If needed, the RSID variable (which groups each related and unrelated
subfamily separately) can be merged from the core wave files. Chapter 10 discusses the core wave files, and Chapter
13 discusses the merging of multiple SIPP files.


 Table 11-11. Uniquely Identifying Families in the Topical Module Files in the 1996 Panel

                           Family ID,   Family ID,
                           Including    Excluding
Sample          Current    Related      Related      Person
Unit ID         Address ID Subfamily    Subfamily    Number
(SSUID)         (SHHADID)  (RFID)       (RFID2)      (EPPPNUM)   Notes
110011111123    11         1            1            0101        This household contains a primary
110011111123    11         1            0            0102        family of five people. The primary
110011111123    11         1            0            0103        family contains one or more related
110011111123    11         1            0            0104        subfamilies.
110011111123    11         1            0            0105

110077777723    11         1            1            0101        Three households formed by people
110077777723    21         1            1            0102        who were originally members of the
110077777723    21         1            1            0103        same originally sampled household
110077777723    22         1            1            0104        (SSUID of 110077777723). Two
110077777723    22         1            1            0105        subfamilies split off from the original
                                                                 household to become two new primary
                                                                 families at addresses 21 and 22.
122210000123    11         1            1            0101        This household contains a primary
122210000123    11         1            1            0104        family and two unrelated subfamilies.
122210000123    11         2            2            0305
122210000123    11         2            2            0306
122210000123    11         3            3            0307
122210000123    11         3            3            0308
555555555123    21         1            1            0101        This household contains a primary
555555555123    21         2            2            0201        individual and an unrelated subfamily.
555555555123    21         2            2            0202
555555555123    21         2            2            0203
610000000123    32         1            1            0101        Primary individual.
897454644123    11         1            1            0101        Group quarters with two secondary
897454644123    11         2            2            0102        individuals.

The second “household” is actually three households, each containing a primary family, that
originally formed one household. The third household contains a primary family and two
unrelated subfamilies. The fourth household contains a primary individual and an unrelated
subfamily. The fifth household contains only a primary individual. The sixth household is a
group quarters containing two secondary individuals.
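
The two family definitions can be sketched in Python as follows. This is an illustration only: the rows are the first and third households of Table 11-11 entered by hand, and the function name group_families and its exclude_related_subfamilies flag are hypothetical. A family is keyed by SSUID, SHHADID, and either RFID or RFID2; under the second definition, records with RFID2 = 0 (related subfamily members) are dropped.

```python
from collections import defaultdict

# (SSUID, SHHADID, RFID, RFID2, EPPPNUM) rows copied from Table 11-11
# (first and third households only).
ROWS = [
    ("110011111123", "11", 1, 1, "0101"),
    ("110011111123", "11", 1, 0, "0102"),
    ("110011111123", "11", 1, 0, "0103"),
    ("110011111123", "11", 1, 0, "0104"),
    ("110011111123", "11", 1, 0, "0105"),
    ("122210000123", "11", 1, 1, "0101"),
    ("122210000123", "11", 1, 1, "0104"),
    ("122210000123", "11", 2, 2, "0305"),
    ("122210000123", "11", 2, 2, "0306"),
    ("122210000123", "11", 3, 3, "0307"),
    ("122210000123", "11", 3, 3, "0308"),
]


def group_families(rows, exclude_related_subfamilies=False):
    """Group person numbers by family.

    With the default, families are keyed by (SSUID, SHHADID, RFID).
    With exclude_related_subfamilies=True, RFID2 is used instead and
    records with RFID2 == 0 are skipped.
    """
    families = defaultdict(list)
    for ssuid, shhadid, rfid, rfid2, epppnum in rows:
        if exclude_related_subfamilies:
            if rfid2 == 0:
                continue  # related subfamily member: not in any RFID2 family
            key = (ssuid, shhadid, rfid2)
        else:
            key = (ssuid, shhadid, rfid)
        families[key].append(epppnum)
    return dict(families)


by_rfid = group_families(ROWS)
by_rfid2 = group_families(ROWS, exclude_related_subfamilies=True)
print(len(by_rfid[("110011111123", "11", 1)]))   # members under RFID
print(len(by_rfid2[("110011111123", "11", 1)]))  # members under RFID2
```

Both calls return the same three families for the third household, because it contains no related subfamilies; only the first household differs between the two definitions.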




Other Variables Describing Household and
Family Composition
The topical module files contain several additional variables from the SIPP core that describe
household and family composition.20 The household composition variables included in the
topical module files from the 1996 Panel and from panels prior to 1996 are shown in Table
11-12. Additional variables from the core wave files and the full panel files can be merged with
data from the topical module files when added detail is needed (Chapters 10, 12, and 13).

                Table 11-12. Household and Family Composition Variables in the
                                     Topical Module Files

                                                    1996 Panel
           Variable Name                 Description
           ERRP                          Relationship to household reference person in month four
           EMS                           Marital status in month four
           EPNMOM                        Person number of mother in month four
           EPNDAD                        Person number of father in month four
           EPNGUARD                      Person number of guardian in month four
           EPNSPOUS                      Person number of spouse in month four
           RDESGPNT                      Designated parent or guardian in month four
                                               Panels Prior to 1996
            RRP                          Revised relationship to the household reference person (living
                                         with relatives, child of household reference person, etc.)
            PNSP                         Person number of spouse
            PNPT                         Person number of parent


Using the Relationship to Reference Person
[ERRP (RRP)] Variable

As Table 11-13 shows, ERRP (RRP) provides a summary description of how each individual is
related to the household reference person.21


20
   Detailed information about the relationships between members is collected in the Household Relationships topical
module. For the 1996 Panel, those data provide extensive information about household composition during month
four of the wave’s reference period. For earlier panels, the topical module provides information about household
composition at the time of the interview.
21
   Prior to the 1996 Panel, the RRPU variable, available in the core wave files, provides additional detail not
contained in the RRP variable. When needed, RRPU can be merged to data from the topical module files (Chapters
10 and 13).



 Table 11-13. Relationship to the Household Reference Person in the Topical Module Files

                                                     1996 Panel
ERRP                      Description
 1                        Reference person w/related people in household
 2                        Reference person w/out related people in household
 3                        Spouse of reference person
 4                        Child of reference person
 5                        Grandchild of reference person
 6                        Parent of reference person
 7                        Brother or sister of reference person
 8                        Other relative of reference person
 9                        Foster child of reference person
10                        Unmarried partner of reference person
11                        Housemate or roommate
12                        Roomer or boarder
13                        Other nonrelative of reference person
                                                Panels Prior to 1996
Revised Relationship to
the Household
Reference Person (RRP)    Description
 1                        Household reference person, living with relatives
 2                        Household reference person, living alone or with nonrelatives
 3                        Spouse of household reference person
 4                        Child of household reference person
 5                        Other relative of household reference person
 6                        Nonrelative of household reference person, but related to other members of the household
 7                        Nonrelative of all members of the household


The ERRP (RRP) variable contains summary information about each person’s relationship to the
household reference person. Analysts should bear in mind that the household description
depends upon the identity of the household reference person. For example, the household in
Table 11-14 contains a mother, her daughter, and her daughter’s son. If the mother is the
household reference person [ERRP = 1 (RRP = 1)], her daughter is listed as a child of the
household reference person [ERRP = 4 (RRP = 4)] and the daughter’s son is listed as a
grandchild of the reference person in the 1996 Panel (ERRP = 5), but as another relative of the
household reference person in earlier panels (RRP = 5; the code is the same, but its meaning
differs from that of the 1996 Panel variable). If the daughter is the reference person, her son is
listed as a child of the household reference person [ERRP = 4 (RRP = 4)] and her mother is listed
as the parent of the reference person in the 1996 Panel (ERRP = 6), but as another relative of the
household reference person in earlier panels (RRP = 5).22 Users should note that the identity of
the household reference person can change from one month to the next; thus, the household
description could also change.

22
  Because it is impossible to anticipate all of the different living arrangements found in SIPP sample households,
and in some cases more than one rule for identifying a reference person may apply, some interviewer discretion in
identifying the reference person is inevitable. For that reason, the resulting choices can sometimes appear somewhat
arbitrary to the analyst.



 Table 11-14. ERRP (RRP) Coding for the Same Three-Generation Household When Two
   Different People Are Designated as the Reference Person in the Topical Module Files

  Designated     Relationship to the
  Reference      Household Reference
  Person         Person [ERRP (RRP)] Meaning of ERRP (RRP) Value
  Mother as Household Reference Person
  Mother         1 (1)                  Reference person (Reference person)
  Daughter       4 (4)                  Child of reference person (Child of reference person)
  Daughter’s son 5 (5)                  Grandchild of reference person (Other relative of reference person)
  Daughter as Household Reference Person
  Mother         6 (5)                  Parent of reference person (Other relative of reference person)
  Daughter       1 (1)                  Reference person (Reference person)
  Daughter’s son 4 (4)                  Child of reference person (Child of reference person)
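
Because the same numeric code can carry different meanings in ERRP and RRP, it can help to decode values through a panel-specific lookup. The sketch below transcribes the labels from Table 11-13; the dictionary names and the describe_relationship function are illustrative assumptions, not SIPP-supplied code.

```python
# Code labels transcribed from Table 11-13.
ERRP_LABELS = {  # 1996 Panel
    1: "Reference person w/related people in household",
    2: "Reference person w/out related people in household",
    3: "Spouse of reference person",
    4: "Child of reference person",
    5: "Grandchild of reference person",
    6: "Parent of reference person",
    7: "Brother or sister of reference person",
    8: "Other relative of reference person",
    9: "Foster child of reference person",
    10: "Unmarried partner of reference person",
    11: "Housemate or roommate",
    12: "Roomer or boarder",
    13: "Other nonrelative of reference person",
}

RRP_LABELS = {  # panels prior to 1996
    1: "Household reference person, living with relatives",
    2: "Household reference person, living alone or with nonrelatives",
    3: "Spouse of household reference person",
    4: "Child of household reference person",
    5: "Other relative of household reference person",
    6: "Nonrelative of household reference person, but related to other "
       "members of the household",
    7: "Nonrelative of all members of the household",
}


def describe_relationship(code, panel_1996=True):
    """Return the Table 11-13 label for an ERRP (1996) or RRP (earlier) code."""
    labels = ERRP_LABELS if panel_1996 else RRP_LABELS
    return labels[code]


# The daughter's son when the mother is the reference person (Table 11-14):
print(describe_relationship(5, panel_1996=True))
print(describe_relationship(5, panel_1996=False))
```

Keeping the two lookups separate avoids treating ERRP = 5 and RRP = 5 as equivalent when data from different panels are combined.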


Identifying a Person’s Spouse, Parent, or Guardian

Four other variables on the topical module files from the 1996 Panel can be used to describe
household and family composition: EPNSPOUS, EPNDAD, EPNMOM, and EPNGUARD. These
variables identify the person numbers of the person’s spouse, father, mother, and guardian,
respectively. On the topical module files from panels prior to 1996, only two such variables are
found: PNSP and PNPT, the person numbers of the person’s spouse and parent, respectively
(just one parent is identified in those files). In each case, the relative is identified only if she
or he is living at the same address as the person.

By building from these variables, the analyst can identify a variety of family configurations. For
example, these variables can be used to identify households containing three generations. Table
11-15 displays one household containing a mother and her two children. One child, EPPPNUM
= 0102 (PNUM = 102), has a son; the other child, EPPPNUM = 0104 (PNUM = 104), has a
spouse.
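
One way to detect a three-generation household from these pointer variables is to follow the parent link twice. The sketch below uses the EPPPNUM and EPNMOM values of the 1996 Panel household in Table 11-15; the names MOTHER_OF and three_generation_chains are hypothetical, and only maternal links are followed (a fuller version would presumably also check EPNDAD).

```python
NOT_APPLICABLE = "9999"  # per the note to Table 11-15

# EPPPNUM -> EPNMOM for the 1996 Panel household in Table 11-15.
MOTHER_OF = {
    "0101": "9999",  # Mother
    "0102": "0101",  # Daughter #1
    "0103": "0102",  # Daughter #1's son
    "0104": "0101",  # Daughter #2
    "0105": "9999",  # Spouse of daughter #2
}


def three_generation_chains(mother_of):
    """Return (grandmother, mother, child) person-number triples.

    A triple exists when a person's mother is in the household and that
    mother's own mother is in the household as well.
    """
    chains = []
    for child, mother in mother_of.items():
        if mother == NOT_APPLICABLE:
            continue
        grandmother = mother_of.get(mother, NOT_APPLICABLE)
        if grandmother != NOT_APPLICABLE:
            chains.append((grandmother, mother, child))
    return chains


print(three_generation_chains(MOTHER_OF))
```

In real files the lookup would need to be built separately for each household (SSUID and SHHADID), because person numbers repeat across sample units.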


                Table 11-15. Identifying Households Containing Three Generations
                                    in the Topical Module Files

                                                  1996 Panel
                                           Recoded
                                           Relationship to
                             Person        Household
                             Number        Reference         Spouse            Parent
     Household Member        (EPPPNUM)     Person (ERRP)     (EPNSPOUS)        (EPNMOM)       Notes
     Mother                  0101          1                 9999              9999           Mother
     Daughter #1             0102          4                 9999              0101           Child
     Daughter #1’s Son       0103          5                 9999              0102           Grandchild
     Daughter #2             0104          4                 0105              0101           Child
     Spouse of Daughter #2   0105          8                 0104              9999           Spouse of child
                                              Panels Prior to 1996
                                           Recoded
                                           Relationship to
                           Person          Household
                           Number          Reference          Spouse           Parent
 Household Member          (PNUM)          Person (RRP)       (PNSP)           (PNPT)         Notes
 Mother                    101             1                  999              999            Mother
 Daughter #1               102             4                  999              101            Child
 Daughter #1’s Son         103             5                  999              102            Grandchild
 Daughter #2               104             4                  105              101            Child
 Spouse of Daughter #2     105             5                  104              999            Spouse of child
Note: Value of 999 or 9999 means not applicable.


More About Using the SIPP ID Variables:
Identifying Movers
Most of the SIPP topical modules collect information that pertains to a single month, generally
month four of the wave’s core reference period in the 1996 Panel, and month five (the interview
month) for prior panels. However, some topical modules collect information about longer
reference periods, most commonly either the previous 4 months (the same period as the core
questions but often not with monthly resolution), the year prior to the interview (e.g., some items
in the child and adult well-being topical modules), or the prior calendar year (e.g., the annual
income and retirement accounts topical module of the 1996 Panel). In instances such as these, it
is sometimes useful to know something about household composition during the reference period
of the topical module.23 This section of the Users’ Guide is primarily for users who need to know
how to access that kind of information. This section may also be helpful to those who wish to
gain a better understanding of the SIPP ID variables for other reasons.

When a person moves, the current address field, SHHADID (ADDID), changes. The SSUID
(ID), EENTAID (ENTRY), and EPPPNUM (PNUM) values remain the same. The first part (two
digits in the 1992 Panel and the 1996 Panel, one digit in all others) of SHHADID (ADDID)
indicates the wave in which a household is first interviewed at that new address. The remaining
digit sequentially numbers the households that split into two or more households, as a result of a
move to a different location by original sample members. Thus, new addresses in Wave 2 are
numbered 021 (21), 022 (22), and so on. New addresses in Wave 3 are numbered 031 (31), 032
(32), and so on.
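
The structure of the address ID can be illustrated with a small Python helper. This is a sketch under the numbering rule just described (the last digit is the sequence number and the leading digits are the wave); the function name split_address_id is a hypothetical choice.

```python
def split_address_id(address_id):
    """Split a SHHADID (ADDID) value into (wave, sequence number).

    The last digit numbers the address within the wave; the digits before
    it give the wave in which the address was first interviewed.
    """
    text = str(address_id).strip()
    if not text.isdigit() or len(text) < 2:
        raise ValueError(f"unexpected address ID: {address_id!r}")
    return int(text[:-1]), int(text[-1])


for value in ("021", "032", "071", "101", "21", "71"):
    wave, sequence = split_address_id(value)
    print(value, "-> wave", wave, "address", sequence)
```

Taking the last character as the sequence number lets the same helper handle the three-digit 1996 Panel values and the shorter values used in earlier panels.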


23
  For example, a person who joined the SIPP sample in Wave 4 of the 1996 Panel could not have contributed to the
household income (at least not as a household member) of the prior calendar year.



Table 11-16 shows that persons 0101 (101) and 0102 (102) in the first household are original
sample members. Person 0401 (401) moved into the home of persons 0101 (101) and 0102 (102)
in Wave 4. In Wave 7, all three of them moved to a new location and were joined by person 0701
(701). In the second household, person 0101 (101) is an original sample member who moved to a new
location in Wave 3. In the third household, person 0102 (102) is also an original sample member
who used to live with persons 0101 (101) and 0103 (103) of the same sample unit ID, but moved
to a new location in Wave 3 [to a different location from person 0101 (101)]. In the fourth
household, person number 0103 (103) is an original sample member who used to live with
persons 0101 (101) and 0102 (102) of the same sample unit ID number. All but two people
moved from their original location [i.e., only two people have SHHADID (ADDID) equal to
EENTAID (ENTRY)].

                  Table 11-16. Identifying Movers in the Core Wave Files

                                            1996 Panel
 Sample          Current       Entry          Person
 Unit ID         Address ID    Address ID     Number
 (SSUID)         (SHHADID)     (EENTAID)      (EPPPNUM)         Notes
 123456789123    071           011            0101              Persons 0101 and 0102 are the original
 123456789123    071           011            0102              sample members. Person 0401 begins
 123456789123    071           011            0401              to live with them in Wave 4. All three
 123456789123    071           071            0701              people move in Wave 7 and person
                                                                0701 joins them.
 321456789123    031           011            0101              Person 0101 is an original sample
                                                                member who moved in Wave 3.
 321456789123    032           011            0102              Person 0102 is an original sample
                                                                member who moved in Wave 3 to a
                                                                different location from person 0101.
                                       Panels Prior to 1996
 Sample          Current       Entry          Person
 Unit ID         Address ID    Address ID     Number
 (ID)            (ADDID)       (ENTRY)        (PNUM)            Notes
 123456789       71            11             101               Persons 101 and 102 are the original
 123456789       71            11             102               sample members. Person 401 begins to
 123456789       71            11             401               live with them in Wave 4. All three
 123456789       71            71             701               people move in Wave 7 and person 701
                                                                joins them.
 321456789       31            11             101               Person 101 is an original sample
                                                                member who moved in Wave 3.
 321456789       32            11             102               Person 102 is an original sample
                                                                member who moved in Wave 3 to a
                                                                different location from person 101.
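
The comparison of current and entry address IDs can be sketched as follows, using the 1996 Panel rows of Table 11-16 entered by hand. The names ROWS and movers are illustrative assumptions. The check only flags people whose current address ID differs from their entry address ID; it does not reconstruct intermediate moves.

```python
# (SSUID, SHHADID, EENTAID, EPPPNUM) rows copied from the 1996 Panel part
# of Table 11-16.
ROWS = [
    ("123456789123", "071", "011", "0101"),
    ("123456789123", "071", "011", "0102"),
    ("123456789123", "071", "011", "0401"),
    ("123456789123", "071", "071", "0701"),
    ("321456789123", "031", "011", "0101"),
    ("321456789123", "032", "011", "0102"),
]


def movers(rows):
    """Return (SSUID, EPPPNUM) for people whose current address ID
    differs from the address ID at which they entered the sample."""
    return [
        (ssuid, epppnum)
        for ssuid, shhadid, eentaid, epppnum in rows
        if shhadid != eentaid
    ]


moved = movers(ROWS)
print(len(moved), "of", len(ROWS), "people are at a different address")
```

Person 0701 is not flagged because that person entered the sample at address 071 and is still there.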



The next example (Table 11-17) further illustrates how the ID system works as people move to
new addresses, additional people move in with them, and households split. (Users may also find
it helpful to review Figure 2-1 [pp. 2-10–2-14], which illustrates changes in household
composition.)

!    In Wave 1, there is a five-person household consisting of a husband, a wife, a daughter, a
     son, and a cousin. Since this is the first wave, the current address number is 011 (11),
     indicating address 1 of Wave 1, and the entry address number for each member of the
     household is the same as the current address number. Since they are assigned in Wave 1, the
     person numbers are in the 0100 (100) series and numbered sequentially, beginning with 0101
     (101).
!    During Wave 2, the son joins the Army, moves into the military barracks, and therefore
     leaves the SIPP sample. The person-month file will contain a Wave 1 record for the son,
     person number 0104 (104), and a Wave 2 record containing information (either imputed or
     provided by proxy) on his characteristics in the months of Wave 2 that he was still in the
     sample. If he does not return to the sample during the remainder of the panel, there will be
     no records for him beyond Wave 2.
!    During Wave 3, the daughter marries and her husband moves into the household. The current
     address number where the mother, father, cousin, daughter, and son-in-law live remains the
     same since it is the same address. The son-in-law’s entry address number is 011 (11), since
     he first enters the SIPP sample at an address coded 011 (11). The person number for the son-
     in-law is in the 0300 (300) series [0301 (301)] since he joins the SIPP sample in Wave 3.
!    During Wave 4, the daughter and son-in-law move into a new house. Their current address
     number changes to 041 (41) to indicate that a new address has been established in Wave 4.
     Meanwhile, the cousin, who is over age 15, moves in with an uncle.24 The cousin’s current
     address number changes to 042 (42) (i.e., the second new household formed in the fourth
     wave from this sample unit). The assignment of address number 041 (41) to the daughter and
     042 (42) to the cousin is arbitrary—it could be the other way around. The uncle enters the
     SIPP sample and receives an address number of 042 (42) and an entry address number of 042
     (42). The uncle’s person number is in the 0400 (400) series [0401 (401)] because he joins the
     survey in Wave 4.
!    No changes in household composition are observed during Waves 5 through 9.
!    During Wave 10,25 the daughter and son-in-law have a baby. This new sample member is
     assigned the sample unit ID of the daughter and son-in-law. The newborn’s entry address is
     041 (41), since that is the current address ID of the daughter and son-in-law at the time of
     birth. The newborn’s person number is 1001, reflecting the fact that the newborn came into
     the SIPP sample in Wave 10. Meanwhile, the cousin moves to Europe and therefore leaves
     the SIPP sample. The uncle, even though he did not move to Europe with the cousin, also
     leaves the SIPP sample because he no longer resides with an original SIPP sample member.
     Their records are no longer listed.
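
The person-number convention used in this example can be checked with a short Python sketch. It assumes, as the example suggests, that a person number assigned in a given wave falls in that wave's hundreds series (0100 for Wave 1, 0300 for Wave 3, 1000 for Wave 10); the function name entry_wave is hypothetical.

```python
def entry_wave(person_number):
    """Return the wave in which a person number was assigned.

    Assumes the numbering shown in the example: numbers assigned in Wave w
    fall in the w00 series, so integer division by 100 recovers w.
    """
    return int(person_number) // 100


# Person numbers from the example household (1996 Panel format).
MEMBERS = {
    "Father": "0101",
    "Son-in-Law": "0301",
    "Uncle": "0401",
    "Newborn": "1001",
}

for name, epppnum in MEMBERS.items():
    print(name, epppnum, "entered in Wave", entry_wave(epppnum))
```

The same function works for the shorter person numbers of earlier panels, for example 101 or 301.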

24
   In the 1993 Panel, all original sample members were followed, regardless of age. In all other panels (including the
1996 Panel), only those aged 15 or older were followed when they moved to new addresses.
25
   Prior to the 1996 Panel, only the 1992 Panel had more than nine waves.



             Table 11-17. Example of Household Changes and Their Effects on the ID
                               Variables in the Core Wave Files

                                             1996 Panel
                                                Current Address ID Entry Address ID Person Number
Household Member         Sample Unit ID (SSUID) (SHHADID)          (EENTAID)        (EPPPNUM)
Wave 1
Father                    101111103123            011             011              0101
Mother                    101111103123            011             011              0102
Daughter                  101111103123            011             011              0103
Son                       101111103123            011             011              0104
Cousin                    101111103123            011             011              0105
Wave 2
Father                    101111103123            011             011              0101
Mother                    101111103123            011             011              0102
Daughter                  101111103123            011             011              0103
Son                       101111103123            011             011              0104
Cousin                    101111103123            011             011              0105
Wave 3
Father                    101111103123            011             011              0101
Mother                    101111103123            011             011              0102
Daughter                  101111103123            011             011              0103
Son-in-Law                101111103123            011             011              0301
Cousin                    101111103123            011             011              0105
Wave 4                    Parent’s Household
Father                    101111103123            011             011              0101
Mother                    101111103123            011             011              0102
                          Daughter’s Household
Daughter                  101111103123            041             011              0103
Son-in-Law                101111103123            041             011              0301
                          Cousin’s Household
Cousin                    101111103123            042             011              0105
Uncle                     101111103123            042             042              0401
Wave 10                   Parent’s Household
Father                    101111103123            011             011              0101
Mother                    101111103123            011             011              0102
                          Daughter’s Household
Daughter                  101111103123            101             011              0103
Son-in-Law                101111103123            101             011              0301
Newborn                   101111103123            101             041              1001
                                                                                    (table continues)



           Table 11-17. Example of Household Changes and Their Effects on the ID
                         Variables in the Core Wave Files (continued)

                                             Prior to 1996 Panel
                               Sample Unit ID      Current Address      Entry Address      Person Number
Household Member               (ID)                ID (ADDID)           ID (ENTRY)         (PNUM)
Wave 1
Father                         101111103           11                   11                 101
Mother                         101111103           11                   11                 102
Daughter                       101111103           11                   11                 103
Son                            101111103           11                   11                 104
Cousin                         101111103           11                   11                 105
Wave 2
Father                         101111103           11                   11                 101
Mother                         101111103           11                   11                 102
Daughter                       101111103           11                   11                 103
Son                            101111103           11                   11                 104
Cousin                         101111103           11                   11                 105
Wave 3
Father                         101111103           11                   11                 101
Mother                         101111103           11                   11                 102
Daughter                       101111103           11                   11                 103
Son-in-Law                     101111103           11                   11                 301
Cousin                         101111103           11                   11                 105
Wave 4                         Parent’s Household
Father                         101111103           11                   11                 101
Mother                         101111103           11                   11                 102
                               Daughter’s Household
Daughter                       101111103           41                   11                 103
Son-in-Law                     101111103           41                   11                 301
                               Cousin’s Household
Cousin                         101111103           42                   11                 105
Uncle                          101111103           42                   42                 401
Wave 10a                       Parent’s Household
Father                         101111103           11                   11                 101
Mother                         101111103           11                   11                 102
                               Daughter’s Household
Daughter                       101111103           41                   11                 103
Son-in-Law                     101111103           41                   11                 301
Newborn                        101111103           41                   41                 1001
a
  Prior to the 1996 Panel, only the 1992 Panel had 10 or more waves. Wave 2 of the 1992 Panel of the core wave
files has expanded address and person ID fields (3 and 4 digits, respectively) to accommodate Wave 10 of the 1992
panel.



Prior to the 1996 Panel, there were two extremely rare occasions when the original ID, ENTRY,
and PNUM values were modified by the Census Bureau:

1. The first occasion was when two separate sampling units, each containing original sample
   members, were merged, perhaps because of a marriage. In this situation, one of the original
   sets of ID and ENTRY values was retained and the other set was changed to agree with that
   retained set. The person-number values (PNUM) of the changed set were modified further to
   be between 180 and 199, inclusive.
2. The second occasion was when a household split into two new households (in which each
   new household gained a new sample person) and later the households recombined. For
   example, suppose that a married couple separated in Wave 3, each moving in with a sibling.
   Both siblings were assigned a person number of 301 because they entered the sample in
   Wave 3 at different addresses (thus, ADDID = 31 and 32). If the husband and wife reunited
   in Wave 6, and brought the siblings with them, one of the siblings’ person numbers would
   have been changed. In this case, one of the siblings would have a person number of 301 and
   the other would have a person number of 680 (or some number between 680 and 699,
   inclusive).

Those two occasions were the only times when ID, ENTRY, and PNUM changed. When a change did
occur, the old ID variables were stored in the previous wave variables (PWSUID, PWENTRY,
and PWPNUM), found only on the core wave files.26

When the merge occurred after the first month of a reference period, the members of the merged
household (whose ID variables were modified) were assigned two sets of monthly records in the
core wave file. The first set of records contained the original ID information and identified the
person as having exited the sample at the time of the merge. The second set contained the new
ID information and identified the person as having entered the sample at the time of the merge.
When the merge occurred at the start of the reference period, only the second set of records was
retained in the core wave files.

Because merged households were very rare prior to the 1996 Panel, information about them is
no longer carried on the topical module files beginning with the 1996 Panel. When either of those
two kinds of events occurs in the 1996 Panel, one or more original sample members will appear to
leave the sample when the merge takes place, and new people will appear to enter the sample
when the merged household forms. There is no indication in the data files that the “new” sample
members were previously members of the SIPP sample with different ID values.


26
  In the 1993 Panel, merged households are identified with the variables PWSUID, PWENTRY, and PWPNUM.
Before the 1993 Panel, they were identified with the variables PREV-ID, SC0064, and SC0066.


Topcoding
To protect the confidentiality of SIPP respondents, the Census Bureau topcodes characteristics
available on the topical module files that might allow a user to recognize the identity of a SIPP
respondent. The topcoding procedures used in the topical module files are similar to those used
in the core wave files.27 Generally, topcodes for continuous variables that apply to the total
universe include at least ½ of 1 percent of all cases. For income variables that apply to
subpopulations, topcodes include either 3 percent of the appropriate cases or ½ of 1 percent of all
cases, whichever is the higher topcode. Any discrete information that is topcoded in the core
wave files is topcoded in a consistent manner in the topical module files.
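
The percentage rules above can be illustrated with a short Python sketch. It is not the Census Bureau's procedure: it assumes the topcode is simply the largest cutoff that still leaves the required number of cases at or above it, and the function names are hypothetical.

```python
def topcode_threshold(values, n_cases_to_cover):
    """Largest cutoff such that at least n_cases_to_cover values are >= cutoff."""
    if not values:
        raise ValueError("no values supplied")
    ordered = sorted(values, reverse=True)
    k = min(max(1, n_cases_to_cover), len(ordered))
    return ordered[k - 1]


def ceil_fraction(n, numerator, denominator):
    """Integer ceiling of n * numerator / denominator (avoids float rounding)."""
    return -(-n * numerator // denominator)


def subpopulation_topcode(sub_values, n_all_cases):
    """Higher of the two candidate topcodes for a subpopulation income variable."""
    # Candidate 1: cover 3 percent of the subpopulation's own cases.
    by_subpopulation = topcode_threshold(
        sub_values, ceil_fraction(len(sub_values), 3, 100))
    # Candidate 2: cover 1/2 of 1 percent of all cases in the file.
    by_all_cases = topcode_threshold(
        sub_values, ceil_fraction(n_all_cases, 1, 200))
    return max(by_subpopulation, by_all_cases)


def apply_topcode(values, cutoff):
    """Replace every value above the cutoff with the cutoff itself."""
    return [min(v, cutoff) for v in values]
```

Analysts do not need to compute topcodes themselves; the sketch is only meant to show why a topcoded variable has a visible ceiling and why that ceiling depends on the size of the relevant universe.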

Characteristics that are frequently topcoded in SIPP topical module files include income and
expense values, including those for a broad range of assets and liabilities. For example, the
following groups of topical module variables appear in Wave 3 of the 1996 Panel: assets and
liabilities, interest earnings, medical expenses, mortgage amounts, other financial assets, real
estate, rental properties, stocks and mutual funds, value of business, and work-related expenses
and child support paid. The documentation for the variables included in these groups indicates
whether the values are topcoded and the value ranges for the variables.


Using Allocation (Imputation) Flags
As described in Chapter 4, the Census Bureau often imputes information when a person does not
respond to the survey or to a particular question, so any variable of interest may contain imputed values. In the
topical module files prior to the 1996 Panel, there is an allocation (imputation) flag for almost all
of the person-level variables. Beginning with the 1996 Panel, there is an allocation (imputation)
flag associated with every variable subject to imputation. For example, AEDUCATE is the
allocation (imputation) variable that identifies whether EEDUCATE is imputed.

Variables are imputed and the allocation (imputation) flags are set before composite variables are
created. For example, if income is imputed for one member of a household, that person’s
allocation (imputation) flag is set. However, total household income is computed after that
imputation; if any household member had any income imputed, total household income is based,
in part, on imputed information. There is no direct indication on the records of other household
members that any information has been imputed.
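
Because the files carry no household-level indicator, an analyst may want to derive one. The Python sketch below shows one way to do that. The field names (`hh_id`, `income`, `income_flag`) are hypothetical stand-ins for the relevant file variables, and the sketch assumes that a nonzero flag value means the item was imputed.

```python
from collections import defaultdict


def household_income_with_flag(persons):
    """Return {household id: (total income, True if any member's income was imputed)}."""
    totals = defaultdict(float)
    any_imputed = defaultdict(bool)
    for person in persons:
        hh = person["hh_id"]
        totals[hh] += person["income"]
        # Assumption: a nonzero allocation flag marks an imputed value.
        any_imputed[hh] |= person["income_flag"] != 0
    return {hh: (totals[hh], any_imputed[hh]) for hh in totals}
```

The derived indicator can then be used to exclude, or separately tabulate, households whose total income rests in part on imputed amounts.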


Using Weights
The topical module files contain one weight variable—WPFINWGT (FINALWGT). For the
1996 Panel, this weight is the person cross-sectional weight for the fourth reference month. Prior
to 1996, this weight was the person interview month weight for people who provided data for a
topical module. It shows the number of people in the population represented by the sample
person in the interview month.

27
   Chapter 10 contains a discussion of both the new income topcoding procedures used in the 1996 Panel core wave
files and the income topcoding procedures used in the pre-1996 core wave files. See also Appendix B: SIPP
Topcoding Specifications.



The source and accuracy statements that accompany all SIPP topical module files ordered from
the Census Bureau provide suggestions on how to use the topical module weight variable. Also,
Chapter 8 of this Guide contains a full discussion of how to use weights in SIPP data files.
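
As a minimal illustration of applying the weight, the Python sketch below computes a weighted population count and a weighted mean. It assumes each record is a dictionary that already holds WPFINWGT as a number in its final units (any implied decimals already applied); the value field name is supplied by the caller. It does not replace the guidance in the source and accuracy statements or in Chapter 8.

```python
def weighted_total(records, weight_key="WPFINWGT"):
    """Estimated number of people represented by the records."""
    return sum(r[weight_key] for r in records)


def weighted_mean(records, value_key, weight_key="WPFINWGT"):
    """Weighted mean of value_key across the records."""
    total_weight = weighted_total(records, weight_key)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(r[value_key] * r[weight_key] for r in records)
    return weighted_sum / total_weight
```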


Identifying States
For the 1996 Panel, the variable TFIPSST identifies 45 states and the District of Columbia. The
remaining five states are combined as follows:

1. Maine, Vermont; and
2. North Dakota, South Dakota, Wyoming.

The topical module files from panels prior to the 1996 Panel contain a variable STATE that
identifies the state in which the household resides. The variable identifies 41 individual states
and the District of Columbia; the nine other states are combined into three groups:

1. Maine, Vermont;
2. Iowa, North Dakota, South Dakota; and
3. Alaska, Idaho, Montana, Wyoming.

Even though it is possible to identify most states, SIPP was not designed to be representative at
the state level and should not be used to produce state-level estimates. The state variable is
included on the public use files to allow examination of how state-level characteristics affect
national estimates. For example, a user could apply the state-specific eligibility criteria for a
means-tested program in order to arrive at a national estimate of the number of eligible
participants. Because some states are not uniquely identified, some method of allocating the
state-specific eligibility rules to sample people in those states would need to be devised.
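
One possible allocation method, sketched below in Python, is to apply each member state's rule to a person in a combined group and weight the result by that state's assumed share of the group. Everything here is hypothetical: the group key, the shares (chosen only to make the arithmetic easy to follow), the rule functions, and the field names. The files themselves use numeric state codes, and realistic shares would have to come from an outside source.

```python
# Hypothetical shares for one combined group; not real population figures.
ILLUSTRATIVE_GROUPS = {
    "ME-VT": {"ME": 0.75, "VT": 0.25},
}


def expected_eligible_weight(person, rules, groups=ILLUSTRATIVE_GROUPS):
    """Person weight multiplied by the share of member states whose rule is met.

    For a uniquely identified state the share is either 0 or 1.
    """
    state = person["state"]
    shares = groups.get(state, {state: 1.0})
    eligible_share = sum(
        share for member, share in shares.items() if rules[member](person)
    )
    return person["weight"] * eligible_share
```

Summing the returned values over all sample people gives a national estimate of eligibility under these assumptions. Random assignment of each person to a single member state is an alternative with the same expected value.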


Identifying Metropolitan Areas
The topical module files do not contain any variables identifying metropolitan areas. Those
needing that information should merge it from the core wave files or the full panel files. Analysts
should see Chapters 10 and 12 for discussions of the core wave files and the full panel files,
respectively. Chapter 13 discusses how to merge multiple SIPP public use files.


12. Using the 1990–1993 Full Panel
    Longitudinal Research Files
This chapter discusses procedures for working with data from the full panel longitudinal research
files for the 1990 through 1993 Panels of the Survey of Income and Program Participation
(SIPP). Because the full panel longitudinal research file for the 1996 Panel was still under
development at the time this chapter was written, it is not yet possible to describe procedures for
using that file. A revised version of this chapter will be available once the longitudinal research
file for the 1996 Panel is released to the public.

The chapter begins by describing the documentation that accompanies the full panel public use
files obtained from the Census Bureau. The discussion then turns to the data files themselves.
The data file structure is described, and detailed explanations are provided about how to use the
longitudinal research files when performing common tasks, including:

•   Realigning the data by calendar month;
•   Using the monthly interview status variables;
•   Identifying persons, households, families, and program units;
•   Working with the unearned income data;
•   Understanding the effects of topcoding;
•   Using imputation flags; and
•   Identifying states and metropolitan areas.

Before reading this chapter, users should read Chapter 9 for an introduction to Section II.
Analysts using only one longitudinal research file should also read about the use of sample
weights (Chapter 8) and the computation of standard errors (Chapter 7). Those planning to
merge data from a longitudinal research file with data from the core wave or topical module files
should read Chapter 10 for information about the core wave files, Chapter 11 for information
about the topical module files, and Chapter 13 for information about linking SIPP public use
files.

This chapter focuses on the longitudinal research files. It is written so that it can be used
independently of the chapters describing the core wave files and topical module files. Although
there are many similarities across the three types of files, important differences do exist. Because
those differences are sometimes subtle, users familiar with the core wave and topical module
files should read this chapter carefully, paying close attention to information about variable
names and file structures. Table 9-2 summarizes the differences between the core wave, topical
module, and longitudinal research files.1


Using the Technical Documentation of the
1990–1993 Longitudinal Research Files
Each data file received from the Census Bureau comes with a set of technical documentation and
a data dictionary. The technical documentation includes:

•   The paper survey instrument;
•   A glossary of selected terms;
•   A cross-walk, mapping reference months into calendar months for each rotation group;
•   A source and accuracy statement describing the sample weights and the computation of
    standard errors; and
•   User Notes.

The survey instrument is vital to understanding what questions were asked, how they were asked,
the order in which they were asked, to whom they were asked, and the way in which the answers
were recorded. Some questions employ skip patterns (Chapter 3), so users should pay particular
attention to which questions were skipped for which respondents. These skip patterns are best
understood by consulting the survey instruments.2

The source and accuracy statements provide information about the weights on the files, when
and how to make adjustments to the weights, and one approach to computing standard errors for
some common types of estimates. More detailed discussions of those topics are provided in
Chapters 7 and 8 of this Guide.

1
  Some of this information will change once the 1996 longitudinal research file becomes available. At that time, this
guide will be updated to reflect the differences.
2
  With the introduction of CAI (computer-assisted interviewing) in the 1996 Panel, questionnaire documentation is
now available at the SIPP Web site at http://www.sipp.census.gov/sipp/.

The data dictionary provides a detailed description of each variable on the file. It describes four
aspects of each variable:

1. The definition;
2. The sample universe of the corresponding survey question;
3. The ranges for all legal values; and
4. The location (and size) in the file.

A machine-readable version of the data dictionary accompanies each data file. It can also be
downloaded from the Internet (http://www.sipp.census.gov/sipp/).

The data dictionary is formatted to facilitate processing by user-written computer programs.3 As
shown in Figure 12-1, a “D” in the first column marks the first line of a variable definition. That
line gives (1) the variable name, (2) the total number of columns occupied by the variable, (3) the
starting position, (4) the number of occurrences of that variable, and (5) the size of each
occurrence of the variable.4 The indented lines that follow give the range of values and a
description. A “U” in the first column indicates that the next words describe the
universe.5 A “V” in the first column indicates that the next number and phrase describe one of
the values of the variable. An asterisk in the first column denotes a comment. A period (.) before
a word denotes the start of the value label.6
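
The line codes described above lend themselves to a small parser. The following Python sketch is illustrative only: it assumes the five-field “D” line layout described in this chapter (the 1992 data dictionary orders its fields differently), integer value codes on “V” lines, and dictionary key names that are this example's own invention.

```python
def parse_data_dictionary(lines):
    """Parse D/U/V/* lines into a list of variable definitions (dicts)."""
    variables = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("*"):
            continue  # skip blank lines and comments
        tag = line[0]
        if tag == "D":
            name, size, start, occurs, length = line[1:].split()[:5]
            current = {
                "name": name,
                "size": int(size),           # total columns occupied
                "start": int(start),         # starting position (1-based)
                "occurrences": int(occurs),  # number of occurrences
                "length": int(length),       # size of each occurrence
                "universe": "",
                "values": {},
                "text": [],                  # range and description lines
            }
            variables.append(current)
        elif current is None:
            continue  # ignore anything before the first D line
        elif tag == "U":
            current["universe"] = line[1:].strip()
        elif tag == "V":
            # The first period marks the start of the value label.
            code, _, label = line[1:].partition(".")
            current["values"][int(code)] = label.strip()
        else:
            current["text"].append(line.strip())
    return variables
```

The parsed definitions can then be used to generate input statements or value labels for whatever software the analyst is using.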

The format of the data dictionary for the longitudinal research files is different from that used for
the core wave and topical module files. The full panel data dictionary includes two extra fields
on the line with a “D” in the first column. The first extra field contains the number of
occurrences of the variable, and the second extra field contains the number of digits for each
occurrence of the variable. These fields are needed because some variables in the longitudinal
research file occur x times, depending on the number of waves, or y times, depending on the
number of months in the panel.

HH-ADDID in Figure 12-1 is a monthly variable containing two digits (monthly because it
occurs 36 times). PP-MIS is also a monthly variable, but its length is one digit. PP-INTVW
appears once per wave (because it occurs nine times), and PP-ENTRY, PP-PNUM, SU-TOTPP,
and PP-RCSEQ occur once for the entire panel.
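
The reasoning in the preceding paragraph can be expressed as a short Python sketch. The function names are hypothetical, and the defaults of nine waves and 36 months apply to the 1993 Panel only; other panels need their own values.

```python
def classify_frequency(occurrences, n_waves=9, n_months=36):
    """Infer how often a variable is recorded from its number of occurrences."""
    if occurrences == 1:
        return "panel"   # recorded once for the entire panel
    if occurrences == n_waves:
        return "wave"    # recorded once per wave
    if occurrences == n_months:
        return "month"   # recorded once per month
    return "other"


def occurrence_columns(start, occurrences, length):
    """1-based (first, last) column positions of each occurrence."""
    return [
        (start + i * length, start + (i + 1) * length - 1)
        for i in range(occurrences)
    ]
```

For HH-ADDID (start 26, 36 occurrences of 2 digits), the first occurrence occupies columns 26-27 and the last occupies columns 96-97, which is consistent with PP-INTVW starting in column 98.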

Figure 12-2 shows sample SAS and FORTRAN syntax for reading the data described by the
codebook fragment in Figure 12-1. Additional SAS program code could be used to associate
variable labels and value labels (SAS “formats”) with the PP-MIS and PP-INTVW variables.

3
  The data dictionaries for the longitudinal research files use a different format from that used for the core wave and
topical module files. Users who have worked with the core wave and topical module files should take care to note
those differences. In addition, both the data dictionary formats and the variable names changed for the 1996 Panel
core wave and topical module files. This chapter uses variable
names from the 1990–1993 SIPP Panels. When longitudinal research files are released from the 1996 Panel, a
revised version of this chapter will be available with updated information. Users will be able to download that
version from the SIPP Web site at http://www.sipp.census.gov/sipp/.
4
  The data dictionary for the 1992 longitudinal research file used a different format from that used in the other pre-
1996 longitudinal research files. In the 1992 data dictionary, the first line for each new variable, labeled with a “D”
in column 1, has the following fields: variable name, total size (number of characters), start location, the length of a
single occurrence of the variable, the number of occurrences of the variable, and the number of implied decimals.
5
  The universe definitions included in the data dictionaries prior to the 1996 Panel were often inaccurate. Users of
pre-1996 SIPP Panels should check the skip patterns in the actual survey questionnaire to determine which subset of
respondents was asked each question.
6
  The data dictionary for the 1992 longitudinal research file also has a line labeled with an “R” in column 1. This
line provides the range of values for the variable.



     Figure 12-1. Excerpt from the 1993 Longitudinal Research File Data Dictionary


D PP-ENTRY     2     17     1     2
    Range = (11:99)
    Edited entry address ID
    Address ID of the household that this person belonged to at the time this
       person first became part of the sample

D PP-PNUM     3     19       1      3
    Range = (101:999)
    Edited person number

D SU-TOTPP     2     22     1     2
    Range = (1:60)
    Total number of person records for this sample unit

D PP-RCSEQ     2     24     1     2
    Range = (1:60)
    Sequence number of person record within sample unit

D HH-ADDID     72     26     36     2
    Range = (0:99)
    Address ID. —— This field identifies the household this person lived in
       this month

D PP-INTVW     9     98     9    1
    Range = (0:4)
    Person’s interview status for the relevant interview
V          0 .Not applicable (children under 15), not in sample, nonmatch
V          1 .Interview (self)
V          2 .Interview (proxy)
V          3 .Noninterview - Type Z refusal
V          4 .Noninterview - Type Z other

D PP-MIS     36     107     36     1
    Range = (0:2)
    Person’s interview status for this month
V          0 .Not matched or not in sample
V          1 .Interview
V          2 .Non-interview



            Figure 12-2. Corresponding SAS and FORTRAN Syntax to Read in Data
                   from the 1993 Longitudinal Research File Data Dictionary

                                                SAS

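     Input
        @17         PP_ENTRY   2.                 /* entry address ID, columns 17-18 */
                    PP_PNUM    3.                 /* person number, columns 19-21 */
                    SU_TOTPP   2.                 /* person records in the sample unit */
                    PP_RCSEQ   2.                 /* record sequence within sample unit */
                    (ADDID1-ADDID36)   (2.)       /* HH-ADDID, one value per month */
                    (INTVW1-INTVW9)    (1.)       /* PP-INTVW, one value per wave */
                    (PP_MIS1-PP_MIS36) (1.)       /* PP-MIS, one value per month */
                    ;
                                             FORTRAN

                  INTEGER*2   PP_ENTRY
                  INTEGER*2   PP_PNUM
                  INTEGER*1   SU_TOTPP
                  INTEGER*1   PP_RCSEQ
                  INTEGER*1   HH_ADDID(36)
                  INTEGER*1   PP_INTVW(9)
                  INTEGER*1   PP_MIS(36)

                  READ(infile,1000) PP_ENTRY, PP_PNUM, SU_TOTPP,
              $            PP_RCSEQ, HH_ADDID, PP_INTVW, PP_MIS
     1000         FORMAT(T17, I2, I3, I2, I2, 36I2, 9I1, 36I1)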
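
For readers working outside SAS and FORTRAN, the same layout can be read with a few lines of Python. This is a sketch rather than Census Bureau code. The layout table is transcribed from Figure 12-1, the function name is hypothetical, and a field that is entirely blank is returned as None, which is an assumption about how blanks should be treated.

```python
# (name, starting column, occurrences, digits per occurrence), from Figure 12-1
LAYOUT = [
    ("PP_ENTRY", 17, 1, 2),
    ("PP_PNUM", 19, 1, 3),
    ("SU_TOTPP", 22, 1, 2),
    ("PP_RCSEQ", 24, 1, 2),
    ("HH_ADDID", 26, 36, 2),
    ("PP_INTVW", 98, 9, 1),
    ("PP_MIS", 107, 36, 1),
]


def read_record(line):
    """Convert one fixed-width record into a dict of integers or lists of integers."""
    record = {}
    for name, start, occurs, length in LAYOUT:
        fields = []
        for i in range(occurs):
            first = start - 1 + i * length  # convert 1-based column to 0-based index
            text = line[first:first + length]
            fields.append(int(text) if text.strip() else None)
        record[name] = fields[0] if occurs == 1 else fields
    return record
```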


Relationship of the Longitudinal Research Data
Files to the SIPP Survey Instrument
The data dictionaries for the longitudinal research files do not replicate the survey instruments.
Analysts should keep a few things in mind when using the data:

•   The variables on the longitudinal research files do not correspond one-to-one with the
    questionnaire items. The variables are listed in a different order, some are not included in the
    longitudinal research file at all, and some are created from a combination of other variables;
•   The range of possible values of the variables does not always correspond one-to-one with the
    response categories shown on the survey instrument or in the data dictionary;
•   The variable name may not readily indicate its meaning; and
•   The complexity of the skip patterns may not be apparent just by looking at the data
    dictionary.7

To avoid potential problems and confusion, users should become familiar with the survey
instrument before using the data. When working with the data, analysts should refer to both the
survey instrument and the data dictionary.


Structure of the Longitudinal Research Files
The longitudinal research files contain one record for each person who was ever in the SIPP
sample for that panel. Even if the person was in the sample for just 1 month, there will be a
record for that person. There are records for children as well as for adults, and there are records
for people who entered the sample after the first wave.

Within each record, the variables correspond to the information that was collected in the core
interviews. While most of the core items are included in the longitudinal research files, some
items are not, and not all of the constructed variables found on the core wave files are included
on the longitudinal research files. In addition, no items from any of the topical modules are
included on the longitudinal research files. When items from the core wave or topical module
files are needed, those variables must be merged with data from the longitudinal research files.
Chapter 13 provides a detailed discussion of merging SIPP files.

The longitudinal research file structure differs from that of the core wave files. The longitudinal
research files contain just one record per person, while the core wave files contain one record per
person per month. Because some attributes do not change over the course of the panel, those
variables appear once on each record (e.g., rotation group, sample unit ID, person number, sex,
race, and ethnic origin). Some questions were asked once during each wave, so they appear x
times on each record, where x equals the number of waves for that panel (e.g., highest grade
attended, and participation in school breakfast and lunch programs). Most of the core questions
were asked for each month of the panel. They appear y times on each record, where y equals the
number of months for that panel (e.g., current address ID, monthly interview status, relationship
to the reference person, income, and program participation).

Table 12-1 shows that the 1992 Panel has 10 waves (or 40 months) of data. The 1993 Panel has
nine waves (or 36 months) of data. Thus, the interview status variable (PP-MIS) appears 40
times in the 1992 longitudinal research file, and it appears 36 times in the 1993 longitudinal
research file.
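
Analysts who prefer the one-record-per-person-per-month layout of the core wave files can reshape a longitudinal record. The Python sketch below assumes the record has already been read into a dictionary in which panel-level variables are single values and wave-level and monthly variables are lists. It also assumes four reference months per wave, which is what 10 waves covering 40 months implies. The function name and the MONTH and WAVE keys are hypothetical.

```python
def to_person_months(record, id_vars, wave_vars, monthly_vars, months_per_wave=4):
    """Expand one person record into one row per reference month."""
    n_months = len(record[monthly_vars[0]])
    rows = []
    for m in range(n_months):
        wave_index = m // months_per_wave
        row = {v: record[v] for v in id_vars}  # panel-level values repeat
        row["MONTH"] = m + 1                   # reference month, not calendar month
        row["WAVE"] = wave_index + 1
        for v in wave_vars:
            row[v] = record[v][wave_index]     # one value per wave
        for v in monthly_vars:
            row[v] = record[v][m]              # one value per month
        rows.append(row)
    return rows
```

The MONTH value produced here is a reference month within the panel. It still has to be converted to a calendar month for each rotation group.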


7
    See footnote 5.



         Table 12-1. Summary of Panels, Waves, Reference Months, and Sample Sizes

                                                                                   Wave 1
Panel                                Number             Number of                  Eligible
Year           Reference Months      of Waves           Months                     Households
1984           Jun. 83 – Jun. 86        9               36                          20,897
1985           Oct. 84 – Jul. 87        8               32                          14,306
1986           Oct. 85 – Mar. 88        7               28                          12,425
1987           Oct. 86 – Apr. 89        7               28                          12,527
1988           Oct. 87 – Dec. 89        6               24                          12,725
1989           Oct. 88 – Dec. 89        3               There is no longitudinal research file for the 1989 SIPP.
1990           Oct. 89 – Aug. 92        8               32                          23,627
1991           Oct. 90 – Aug. 93        8               32                          15,626
1992           Oct. 91 – Mar. 95       10               40                          21,577
1993           Oct. 92 – Dec. 95        9               36                          21,823
Source: SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a).


Table 12-2 illustrates the longitudinal research file structure. In this example, there are 12
person records. Sample unit ID (PP-ID), person number (PP-PNUM), and entry address ID (PP-ENTRY)
appear once on each record because they are permanent characteristics of those people. Monthly
interview status (PP-MIS), a monthly variable, appears 40 times because the 1992 Panel had 10
waves and each wave collected information about the 4 months prior to the interview month.

People who were not interviewed (in person or by proxy) for 1 or more months over the course
of the panel either have their data imputed8 or are identified as not in the sample (PP-MIS equal
to either 0 or 2) for the months when they were not in the sample. The discussion of the PP-MIS
variable later in this chapter provides additional information.
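
A quick way to see how a person's status changes over the panel is to tabulate the PP-MIS values. The Python sketch below uses the codes shown in Figure 12-1 (0 = not matched or not in sample, 1 = interview, 2 = non-interview). The function and key names are hypothetical.

```python
def summarize_pp_mis(pp_mis):
    """Count months by interview status and find the first month with a nonzero status."""
    counts = {0: 0, 1: 0, 2: 0}
    for status in pp_mis:
        counts[status] += 1
    first_nonzero = next(
        (month for month, status in enumerate(pp_mis, start=1) if status != 0),
        None,
    )
    return {
        "interview_months": counts[1],
        "noninterview_months": counts[2],
        "zero_months": counts[0],
        "first_nonzero_month": first_nonzero,
    }
```

Applied to the first 20 months of the second record for sample unit 123912879 in Table 12-2 (person number 201), the function reports 12 interview months, 2 non-interview months, 6 months coded zero, and a first nonzero month of 6.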


How to Align Data by Calendar Month
It is frequently useful to realign the SIPP data by calendar month instead of reference month. For
example, researchers often want to analyze data for a specific calendar year (January through
December) or federal fiscal year (October through September).9 To do this, the analyst must


8
  Imputation would be by Type Z and missing-wave imputations. Chapter 4 discusses imputation methods.
9
  The longitudinal research files do not contain calendar month weights. Those weights would be needed for some
types of longitudinal analyses, such as analyses of the dynamics of program participation, where the unit of analysis
is a spell of program participation (Chapter 8 provides a discussion of this example). Data from the longitudinal
research files can also be used for cross-sectional estimation, and they are often preferable to the data from the core
wave files because the edit and imputation procedures used for the longitudinal research files are believed to result
in less imputation error than the procedures used for the core wave files. The format of the file is sometimes easier
to work with, even for cross-sectional applications. In those instances, the calendar month weights must be merged
from the core wave files. Chapter 8 provides a detailed discussion of weighting procedures in the SIPP. Chapter 13
provides a detailed discussion of linking SIPP files.


                                  Table 12-2. Example of the Longitudinal Research File Structure


                                                                                    PP-MIS
                                             Wave 1             Wave 2              Wave 3             Wave 4              Wave 5
                                             Month              Month                Month             Month               Month
                   PP-     PP-    PP-
       PP-ID       ENTRY   PNUM   ROT   1    2   3    4    5    6   7     8    9    10 11    12   13   14   15   16   17   18   19   20
       112612345   11       101   2     1    1   1    1    1    1   1     1    1    1   1    1    1    1    1    1    1    1    1    1
       112987122   11       101   2     1    1   1    1    1    1   1     1    1    1   1    0    0    0    0    0    0    0    0    0
       987913389   11       101   3     1    1   1    1    1    1   1     1    1    1   1    1    1    1    1    1    1    1    1    1
       123912879   11       101   3     1    1   1    1    1    1   1     1    1    1   1    1    1    1    1    1    1    1    1    2
       123912879   11       201   3     0    0   0    0    0    1   1     1    1    1   1    1    2    2    1    1    1    1    1    0
       874943283   11       101   4     1    1   1    1    1    1   1     1    1    1   1    1    1    1    1    1    1    1    1    1
       788723892   11       101   4     1    1   1    0    0    1   1     1    1    1   1    1    0    0    1    1    1    1    1    1
       788723892   11       102   4     1    1   1    1    1    1   1     1    1    1   1    1    2    2    2    2    0    0    0    0
       788723892   11       301   4     0    0   0    0    1    1   1     1    1    1   1    1    1    1    1    1    1    1    1    1
       788723892   11      1001   4     0    0   0    0    0    0   0     0    0    0   0    0    0    0    0    0    0    0    0    0
       763483873   11       101   1     1    1   1    1    1    1   1     1    1    1   1    1    1    1    1    1    1    1    1    1
       890987123   11       101   1     1    1   1    1    1    1   1     1    1    2   2    2    1    1    1    1    1    1    1    2
                                                                                    PP-MIS
                                             Wave 6              Wave 7             Wave 8             Wave 9              Wave 10
                   PP-     PP-    PP-         Month              Month               Month              Month              Month
       PP-ID       ENTRY   PNUM   ROT   21   22 23    24   25   26 27     28   29   30 31    32   33   34 35     36   37   38 39     40
       112612345   11       101   2     1    1   1    1    1    1   1     1    1    1   1    1    1    1   1     1    1    1   1     1
       112987122   11       101   2     0    0   0    0    0    0   0     0    0    0   0    0    0    0   0     0    0    0   0     0
       987913389   11       101   3     1    1   1    1    1    1   1     1    1    1   1    1    1    1   1     1    1    1   1     1
       123912879   11       101   3     2    1   1    1    0    0   2     2    2    0   0    0    0    0   0     0    0    0   0     0
       123912879   11       201   3     0    0   0    0    0    0   0     0    0    0   0    0    0    0   0     0    0    0   0     0
       874943283   11       101   4     1    1   1    1    1    1   1     1    1    1   1    1    1    1   1     1    1    1   1     1
       788723892   11       101   4     1    1   1    1    2    2   2     2    0    0   0    0    0    0   0     0    0    0   0     0
       788723892   11       102   4     0    0   0    0    0    0   0     0    0    0   0    0    0    0   0     0    0    0   0     0
       788723892   11       301   4     1    1   1    1    1    0   0     0    0    0   0    0    0    0   0     0    0    0   0     0
       788723892   11      1001   4     0    0   0    0    0    0   0     0    0    0   0    0    0    0   0     0    0    1   1     1
       763483873   11       101   1     1    1   1    1    1    1   1     1    1    1   1    1    1    1   1     1    0    0   0     0
       890987123   11       101   1     2    2   1    1    1    1   1     1    1    1   2    2    2    1   1     1    0    0   0     0
               USING THE 1990–1993 FULL PANEL LONGITUDINAL RESEARCH FILES

know the reference period for each rotation group of the panel. That information is included with
the technical documentation that accompanies the longitudinal research files.

Table 12-3 shows the reference period for each rotation group of the 1992 Panel. It shows that
the reference period for rotation group 2 is October 1991–January 1995. The reference period for
rotation group 3 is November 1991–February 1995. The reference period for rotation group 4 is
December 1991–March 1995. The reference period for rotation group 1 is January 1992–
December 1994 (interviews were not conducted in Wave 10 for this rotation group).

                       Table 12-3. Reference Periods for Each Rotation Group
                                          of the 1992 Panel

                         Rotation
                         Group
                         (ROT)               Reference Period
                         2                   October 1991–January 1995
                         3                   November 1991–February 1995
                         4                   December 1991–March 1995
                         1                   January 1992–December 1994
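
The reference periods in Table 12-3 follow from each rotation group's first reference month and the number of waves in which it was interviewed. The Python sketch below reproduces them under the assumption, consistent with Table 12-2, that every wave covers four reference months. The names FIRST_REFERENCE_MONTH, WAVES_INTERVIEWED, add_months, and reference_period are illustrative only and do not appear in the SIPP files.

```python
# First reference month of each rotation group in the 1992 Panel (Table 12-3).
FIRST_REFERENCE_MONTH = {2: (1991, 10), 3: (1991, 11), 4: (1991, 12), 1: (1992, 1)}

# Assumption: four reference months per wave. Rotation group 1 was not
# interviewed in Wave 10, so it has 9 waves instead of 10.
WAVES_INTERVIEWED = {2: 10, 3: 10, 4: 10, 1: 9}


def add_months(year, month, offset):
    """Return the (year, month) that lies `offset` months after (year, month)."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def reference_period(rot):
    """Return ((first_year, first_month), (last_year, last_month)) for a rotation group."""
    first = FIRST_REFERENCE_MONTH[rot]
    n_months = 4 * WAVES_INTERVIEWED[rot]
    last = add_months(first[0], first[1], n_months - 1)
    return first, last
```

For example, reference_period(1) spans 36 months and ends in December 1994, as shown in the last row of Table 12-3.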


The following algorithm (Figure 12-3), written for the 1992 Panel, illustrates one approach to
realigning the SIPP reference months to common calendar months. The mapping depends on the
panel and rotation group and must be applied to each person. The first step establishes the
displacement or realignment of the months. The second step initializes each monthly variable to
–9 to distinguish the calendar months in which the variable is not relevant.10 The loop goes from
1 to 42 because in the 1992 Panel the first reference month was October 1991 and the last
reference month was March 1995, which means that there were 42 calendar months covered by
the panel. The third step realigns the input data so that they are indexed by calendar month.
Table 12-4 displays the data after the realignment.


Using the Monthly Interview Status (PP-MIS) Variables
The monthly interview status variable helps to determine whether the data for a person in a given
month should be used. In the longitudinal research files, this variable is labeled PP-MIS, and it
has one occurrence for each reference month of the SIPP panel. Some people refer to it as the in-
sample variable to distinguish it from the interview status variable (PP-INTVW). The PP-MIS
variables have three possible values: 0, 1, and 2.


10
   If –9 is a possible value for the variables being realigned (e.g., self-employment income can be negative), a
different starting value must be used.



         Figure 12-3. Algorithm for Realigning SIPP Panel Month to Calendar Months
                                      in the 1992 Panel

     /*
     Create a variable that identifies the number of months each
     rotation group differs from the baseline
     */
     If ROT = 2
        DISPLACEMENT = 0
     Else if ROT = 3
        DISPLACEMENT = 1
     Else if ROT = 4
        DISPLACEMENT = 2
     Else if ROT = 1
        DISPLACEMENT = 3
     End if

     /*
     Initialize the new, re-aligned variable. This is not needed in SAS.
     When this step is used, an initial value should be chosen that
     is not a legal value for the variable in the actual data.
     */
     For each calendar month (for CALMM = 1 to 42):
        NEW-PP-MIS(CALMM) = -9
     End loop

     /*
     Create the newly re-aligned variable
     */
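     For each reference month (for MONTH = 1 to 40):
        CALMM = MONTH + DISPLACEMENT
        /*
        Rotation group 1 was not interviewed in Wave 10, so its
        reference month 40 falls after calendar month 42 and is skipped
        */
        If CALMM <= 42
           NEW-PP-MIS(CALMM) = PP-MIS(MONTH)
        End if
     End loop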
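
The realignment in Figure 12-3 can also be sketched in Python. This is one possible rendering, not part of the SIPP documentation: the function name realign, the constants, and the representation of a person's PP-MIS values as a 40-element list are assumptions made for illustration. The bounds check is an addition in this sketch; without it, reference month 40 of rotation group 1 would map to calendar month 43, which lies outside the 42 calendar months covered by the panel.

```python
N_CALENDAR_MONTHS = 42    # October 1991 through March 1995
N_REFERENCE_MONTHS = 40   # 10 waves of 4 reference months
NOT_APPLICABLE = -9       # must not be a legal value of the variable

# Months by which each rotation group lags rotation group 2 (the baseline).
DISPLACEMENT = {2: 0, 3: 1, 4: 2, 1: 3}


def realign(rot, pp_mis):
    """Map 40 reference-month values onto 42 calendar-month slots.

    `pp_mis[0]` holds reference month 1. The returned list holds calendar
    month 1 (October 1991) at index 0.
    """
    if len(pp_mis) != N_REFERENCE_MONTHS:
        raise ValueError("expected one value per reference month")
    shift = DISPLACEMENT[rot]
    realigned = [NOT_APPLICABLE] * N_CALENDAR_MONTHS
    for month in range(1, N_REFERENCE_MONTHS + 1):
        calmm = month + shift
        if calmm <= N_CALENDAR_MONTHS:  # guard added in this sketch
            realigned[calmm - 1] = pp_mis[month - 1]
    return realigned
```

Applied to a rotation group 3 person who was in sample in every reference month, the result is -9 for October 1991, 1 for November 1991 through February 1995, and -9 for March 1995.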


The monthly interview status is the only reliable guide to whether the data for a given person
should be used in a given month. Analysts should use only data for those months in which a
person’s interview status (PP-MIS) is equal to 1.11

Any data present for months in which a person’s interview status is coded either 0 or 2 should be
ignored. A code of 0 indicates that the person was not in the sample that month, and a code of 2
indicates a noninterview for that month.12
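
As a minimal illustration of this rule, the Python sketch below blanks out a monthly variable wherever PP-MIS is not equal to 1. The function name mask_unusable_months, the use of None as the missing value, and the income figures in the example are all hypothetical.

```python
def mask_unusable_months(values, pp_mis, missing=None):
    """Return a copy of `values` with `missing` wherever PP-MIS is not 1.

    `values` and `pp_mis` are parallel sequences with one entry per month.
    """
    if len(values) != len(pp_mis):
        raise ValueError("values and pp_mis must cover the same months")
    return [v if status == 1 else missing for v, status in zip(values, pp_mis)]


# Hypothetical monthly income for one person over four months.
income = [500, 520, 480, 610]
status = [1, 1, 2, 0]          # in sample, in sample, noninterview, not in sample
usable = mask_unusable_months(income, status)
```

Here usable keeps the first two months and replaces the last two with None, so data present for months coded 0 or 2 cannot enter later computations by accident.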

11
   As a safeguard against inadvertently using data for months when PP-MIS is not equal to 1, all monthly variables
in the user’s data extract should be set to a missing value for months when PP-MIS is not equal to 1. Most statistical
packages allow certain values to be flagged as “missing.” Once flagged, those values are excluded from
computations.
12
   Beginning with the 1991 Panel, new “missing wave” imputation procedures were instituted for the longitudinal
research files. Whenever data for a wave are imputed (the WAVFLG variable), PP-MIS is recoded to 1 on the
longitudinal research files, indicating that the data for those months should be used. In some cases, these people will
have records in the core wave files that were created during the Type Z imputation processing (see Chapter 4 for
details). In some of these instances, however, the longitudinal research file will have data for people who are not
present on the associated core wave data files.


                            Table 12-4. Monthly Data from the 1992 Panel, Realigned by Calendar Month

                                                                                       NEW-PP-MIS
                                                1991                                                  1992
                    PP-     PP-    PP-
        PP-ID       ENTRY   PNUM   ROT    Oct   Nov    Dec    Jan   Feb    Mar    Apr     May     Jun    Jul   Aug   Sep    Oct   Nov     Dec
        112612345   11       101   2      1     1      1      1     1      1      1       1       1      1     1     1      1     1       1
        112987122   11       101   2      1     1      1      1     1      1      1       1       1      1     1     0      0     0       0
        987913389   11       101   3      -9    1      1      1     1      1      1       1       1      1     1     1      1     1       1
        123912879   11       101   3      -9    1      1      1     1      1      1       1       1      1     1     1      1     1       1
        123912879   11       201   3      -9    0      0      0     0      0      1       1       1      1     1     1      1     2       2
        874943283   11       101   4      -9    -9     1      1     1      1      1       1       1      1     1     1      1     1       1
        788723892   11       101   4      -9    -9     1      1     1      0      0       1       1      1     1     1      1     1       0
        788723892   11       102   4      -9    -9     1      1     1      1      1       1       1      1     1     1      1     1       2
        788723892   11       301   4      -9    -9     0      0     0      0      1       1       1      1     1     1      1     1       1
        788723892   11      1001   4      -9    -9     0      0     0      0      0       0       0      0     0     0      0     0       0
        763483873   11       101   1      -9    -9     -9     1     1      1      1       1       1      1     1     1      1     1       1
        890987123   11       101   1      -9    -9     -9     1     1      1      1       1       1      1     1     1      2     2       2

                                                                                       NEW-PP-MIS
                                                                                          1993
                        PP-      PP-     PP-
            PP-ID       ENTRY    PNUM    ROT     Jan    Feb     Mar       Apr    May    Jun     Jul     Aug    Sep   Oct      Nov    Dec
            112612345   11       101     2       1      1       1         1      1      1       1       1      1     1       1       1
            112987122   11       101     2       0      0       0         0      0      0       0       0      0     0       0       0
            987913389   11       101     3       1      1       1         1      1      1       1       1      1     1       1       1
            123912879   11       101     3       1      1       1         1      1      2       2       1      1     1       0       0
            123912879   11       201     3       1      1       1         1      1      0       0       0      0     0       0       0
            874943283   11       101     4       1      1       1         1      1      1       1       1      1     1       1       1
            788723892   11       101     4       0      1       1         1      1      1       1       1      1     1       1       2
            788723892   11       102     4       2      2       2         0      0      0       0       0      0     0       0       0
            788723892   11       301     4       1      1       1         1      1      1       1       1      1     1       1       1
            788723892   11      1001     4       0      0       0         0      0      0       0       0      0     0       0       0
            763483873   11       101     1       1      1       1         1      1      1       1       1      1     1       1       1
            890987123   11       101     1       1      1       1         1      1      1       1       2      2     2       1       1
                                                                                                                           (table continues)
                      Table 12-4. Monthly Data from the 1992 Panel, Realigned by Calendar Month (continued)

                                                                                 NEW-PP-MIS
                                                                          1994                                           1995
                    PP-     PP-     PP-
        PP-ID       ENTRY   PNUM    ROT    Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec   Jan   Feb    Mar
        112612345   11       101    2      1     1     1     1     1     1     1     1     1     1     1     1     1     -9     -9
        112987122   11       101    2      0     0     0     0     0     0     0     0     0     0     0     0     0     -9     -9
        987913389   11       101    3      1     1     1     1     1     1     1     1     1     1     1     1     1      1     -9
        123912879   11       101    3      2     2     2     0     0     0     0     0     0     0     0     0     0      0     -9
        123912879   11       201    3      0     0     0     0     0     0     0     0     0     0     0     0     0      0     -9
        874943283   11       101    4      1     1     1     1     1     1     1     1     1     1     1     1     1      1      1
        788723892   11       101    4      2     2     2     0     0     0     0     0     0     0     0     0     0      0      0
        788723892   11       102    4      0     0     0     0     0     0     0     0     0     0     0     0     0      0      0
        788723892   11       301    4      0     0     0     0     0     0     0     0     0     0     0     0     0      0      0
        788723892   11      1001    4      0     0     0     0     0     0     0     0     0     0     0     0     1      1      1
        763483873   11       101    1      1     1     1     1     1     1     1     1     1     1     1     1     0      0      0
        890987123   11       101    1      1     1     1     1     1     1     2     2     2     1     1     1     0      0      0

The presence of data in analysis fields for any given month is not a reliable guide to whether the
person should be included in the planned analyses. Data are collected for all months of the
reference period for a given wave, even if the interviewed person was in the sample for only part
of the reference period. Data are also present even if the person was not interviewed: information
from the questionnaire is imputed when the person was in sample for at least 1 month of the
reference period but was not actually interviewed. That includes people who moved out of scope
(as defined in Chapter 2), people who died, and people who refused to be interviewed. The entire
questionnaire is imputed for Type Z noninterviews (people who refused to be interviewed but
lived in households where other members were successfully interviewed). Chapter 4 examines
imputation procedures; Chapter 8 provides information on weighting.

The presence of a positive weight is also not a reliable guide to whether a person should be
included in the planned analysis. Although people with zero weights will not enter into any
weighted tabulations, they may provide important contextual information about people who do
enter into those (weighted) tabulations. For example, a zero-weight person who is a member of
the same household as a positive-weight person for only 3 months provides information about
the positive-weighted person’s household (including, for example, household size, composition,
income, and program participation) for that 3-month period. That is why records for these zero-
weighted people are retained in the SIPP full panel data files.13


Identifying Persons
There are many occasions when a user may need to identify which records belong to each
individual in the SIPP data files. That need arises, for example, during the following procedures:

!    Merging data from topical module or full panel files to core wave files;
!    Combining data from two or more core wave files;
!    Linking husbands and wives;
!    Linking parents and children; and
!    Identifying which person received government transfer income on behalf of the family.

To uniquely identify a person in the longitudinal research files, analysts should use the three
variables shown in Table 12-5.14

13
   Using the PP-MIS variable shown in Table 12-2, one can see that the first person within each rotation group was
in sample every month of the panel. The second person shown in the table left the sample before the third interview
(information was probably collected by proxy interview for that wave) and did not return to the sample. The eighth
person left the sample in month 13. The tenth person entered the sample in month 38 (the last wave).
14
   Beginning with the 1996 Panel, the entry address ID will no longer be needed: person numbers will be unique
within sample units. Continued use of the entry address ID will not create any problems. It is simply redundant
information.



                  Table 12-5. Variables Used to Uniquely Identify a Person in the
                                   Longitudinal Research Files

              Variable Name                                   Description
              PP-ID                                           Sample unit ID
              PP-ENTRY                                        Entry address ID
              PP-PNUM                                         Person number


!    PP-ID uniquely identifies each initially sampled dwelling unit.15 Every person in the
     longitudinal research file was either a member of one of those units (an original sample
     member) or lived with someone during the life of the panel who was a member of an initially
     sampled dwelling unit. A person’s connection to that unit is an attribute of that person and
     does not change over time.16 This means that as people move from address to address, their
     PP-ID stays the same. As new people join the homes of original sample members, they
     receive the PP-ID of the original sample members.
!    PP-ENTRY identifies the address where the person lived at the time he or she was first
     interviewed. It does not change even if the person moves.17 It is used in conjunction with the
     person number and the sample unit ID to uniquely identify persons within the sampling unit.
     Values for this variable are unique only within sample units. The entry address ID has two
     components. The first part of the ID number (two digits in the 1992 Panel, and one digit in all
     others) identifies the wave in which SIPP interviews were first conducted at the address. The
     second part of the number (one digit in all panels) sequentially numbers addresses within a
     sample unit (PP-ID) that enter the sample in the same wave.
!    PP-PNUM uniquely identifies a person within the sample unit ID and entry address ID. PP-
     PNUM does not change even if the person moves.18 The first part of PP-PNUM (two digits in
     the 1992 Panel, and one digit in all others) indicates the wave in which the person was first
     interviewed.19 The remaining two digits are sequentially assigned within the household.
     Thus, original sample members are assigned person numbers ranging from 101 to 199.
     Individuals who enter the SIPP sample in Wave 2 are assigned a person number ranging from
     201 to 299. Those who enter in Wave 10 are assigned person numbers ranging from 1001 to
     1099.

Table 12-6 illustrates how the combination of PP-ID, PP-ENTRY, and PP-PNUM uniquely
identifies people and provides information about when they first entered the SIPP sample. In this
example, there are eight individuals: five are original sample members; one person joined the
SIPP sample in Wave 4, one person joined in Wave 7, and one person joined in Wave 10 (of the
1992 Panel).

15
   The PP-ID is a random recode of three other variables in the Census Bureau’s internal (not public use) files: the
respondent’s sampling area (PSU), the cluster of housing units within that area (called the “segment”), and a
sequentially assigned serial number. Those three variables are omitted from the public use files to protect the
confidentiality of the respondents.
16
   There is one rare exception to this rule, which is described in the section entitled “Identifying Movers” later in this
chapter.
17
   See footnote 16.
18
   See footnote 16.
19
   For Wave 10 of the 1992 Panel and for the 1996 Panel, the first two digits of PNUM instead of the first digit
identify the wave in which the person entered the sample.

      Table 12-6. How to Uniquely Identify a Person in the Longitudinal Research Files

 Sample              Entry              Person
 Unit ID             Address ID         Number
 (PP-ID)             (PP-ENTRY)         (PP-PNUM)          Notes
 123456789             11                101               Original sample member
 123456789             11                102               Original sample member
 123456789             11                401               Enters SIPP sample in Wave 4
 123456789             71                701               Enters SIPP sample in Wave 7
 321456789             11                101               Original sample member
 321456789             11                102               Original sample member
 321456789             11                103               Original sample member
 456789123            101               1001               Enters SIPP sample in Wave 10 of the 1992 Panel
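
The identification scheme in Table 12-6 can be sketched in Python as follows. The function names person_key and entry_wave are illustrative, and the sketch assumes, as described above, that the last two digits of PP-PNUM are the sequence number and the leading digit or digits are the wave of entry.

```python
def person_key(pp_id, pp_entry, pp_pnum):
    """Combine the three identifiers into one key that is unique per person."""
    return (str(pp_id), int(pp_entry), int(pp_pnum))


def entry_wave(pp_pnum):
    """Return the wave of first interview encoded in the leading digits of PP-PNUM.

    Assumes the final two digits are the within-household sequence number.
    """
    return int(pp_pnum) // 100


# Rows of Table 12-6: (PP-ID, PP-ENTRY, PP-PNUM).
TABLE_12_6 = [
    ("123456789", 11, 101),
    ("123456789", 11, 102),
    ("123456789", 11, 401),
    ("123456789", 71, 701),
    ("321456789", 11, 101),
    ("321456789", 11, 102),
    ("321456789", 11, 103),
    ("456789123", 101, 1001),
]

keys = [person_key(*row) for row in TABLE_12_6]
waves = [entry_wave(row[2]) for row in TABLE_12_6]
```

Each of the eight keys is distinct, and the decoded waves match the Notes column of the table: five people enter in Wave 1, and one each in Waves 4, 7, and 10.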


Identifying Households
The term household, as used in Census Bureau publications, refers to a group of people who
occupy a housing unit. A house, an apartment or other group of rooms, or a single room is
regarded as a housing unit if it is occupied or intended for occupancy as separate living quarters.
That is, the occupants do not live and eat with any other people in the structure and there is direct
access from the outside or through a common hall. A group of friends sharing an apartment
constitutes a household. Rooming and boarding houses, college dormitories, convents, and
monasteries are classified as group quarters rather than households.

To uniquely identify a household or group quarters in the longitudinal research files in a given
month, analysts should use the variables shown in Table 12-7.20

               Table 12-7. Variables Used to Uniquely Identify a Household in the
                                  Longitudinal Research Files

           Variable Name                                Description
           PP-ID                                        Sample unit ID
           HH-ADDIDi                                    Current address ID in the ith month
           PP-MISi                                      Person’s interview status in the ith month


20
   Since household composition changes from one month to the next, it is generally not possible to construct
“longitudinal households.” Users should not infer commonality across months based solely on place of residence in
one month. The characteristics of the household to which a given person belongs (such as household size and
household income) should be evaluated separately for each month, based on just those people who reside together in
each specific month. Similar caution should be exercised when dealing with the characteristics of the family and,
when applicable, the subfamily to which a person belongs.



People with the same PP-ID and HH-ADDIDi values and with a PP-MIS value of 1 live in the
same household (or group quarters) in the ith month of the reference period. The eight
individuals shown in Table 12-8 make up four households. The first household contains the first
four individuals. The second household contains one person. The third household contains one
person. The fourth household contains two people.

This example depicts the households in the ith month. These people could belong to different
households in other months. (Users may find it helpful when reading the following pages to refer
to Figure 2-1 [pp. 2-10–2-14], which illustrates changes in household composition.)

      Table 12-8. How to Uniquely Identify a Household or Group Quarters in a Given
                         Month of the Longitudinal Research Files

                Entry                       Person’s
  Sample        Address      Person         Interview        Current
  Unit ID       ID (PP-      Number         Status           Address ID
  (PP-ID)       ENTRY)       (PP-PNUM)      (PP-MIS)         (HH-ADDIDi)       Notes
  123456789      11            101          1                 71               Four people in this household
  123456789      11            102          1                 71
  123456789      11            401          1                 71
  123456789      71            701          1                 71
  321456789      11            101          1                 31               One person in this household
  321456789      11            102          1                 32               One person in this household
  321456789      11            103          1                101               Two people in this household a
  321456789     101           1001          1                101
a
  Because this example includes a person with an entry address of 101, we know that the example refers to a month
from Wave 10 of the 1992 Panel (the only panel prior to 1996 with 10 or more waves).
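
A minimal Python sketch of this grouping rule, applied to the rows of Table 12-8, is shown below. The function name households_in_month and the lowercase dictionary keys are assumptions made for illustration; they stand in for PP-ID, PP-ENTRY, PP-PNUM, PP-MISi, and HH-ADDIDi for a single month i.

```python
from collections import defaultdict


def households_in_month(records):
    """Group one month's person records into households.

    People share a household when they have the same sample unit ID and
    current address ID and their interview status for the month is 1.
    """
    households = defaultdict(list)
    for rec in records:
        if rec["pp_mis"] == 1:
            key = (rec["pp_id"], rec["hh_addid"])
            households[key].append((rec["pp_id"], rec["pp_entry"], rec["pp_pnum"]))
    return dict(households)


# Rows of Table 12-8: PP-ID, PP-ENTRY, PP-PNUM, PP-MIS, HH-ADDID for month i.
rows = [
    ("123456789", 11, 101, 1, 71),
    ("123456789", 11, 102, 1, 71),
    ("123456789", 11, 401, 1, 71),
    ("123456789", 71, 701, 1, 71),
    ("321456789", 11, 101, 1, 31),
    ("321456789", 11, 102, 1, 32),
    ("321456789", 11, 103, 1, 101),
    ("321456789", 101, 1001, 1, 101),
]
fields = ("pp_id", "pp_entry", "pp_pnum", "pp_mis", "hh_addid")
records = [dict(zip(fields, row)) for row in rows]
households = households_in_month(records)
household_sizes = {key: len(members) for key, members in households.items()}
```

The result contains four households with four, one, one, and two members. Because household composition can change from month to month, the grouping would need to be repeated separately for each month of interest.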


Identifying Families
The term family, as used in Census Bureau publications, refers to a group of two or more people
related by birth, marriage, or adoption who reside together; all such individuals are considered
members of one family.21

!    A primary family is a family containing the household reference person and all of his or her
     relatives. This means that a household composed of a husband and wife, their son, and their
     son’s wife (i.e., the daughter-in-law) is classified as a primary family containing four people.


21
   As with households (see footnote 20), because family composition changes from one month to the next, it
generally is not possible to construct longitudinal families. Users should not infer commonality across months based
solely on family membership in one month. The characteristics of the family to which a person belongs (such as
family size and family income) should be evaluated separately for each month, and should be based on just those
people who reside together and are members of the same family in each specific month. Similar caution should be
exercised when dealing with the characteristics of the household and, when applicable, the subfamily (related or
unrelated) to which a person belongs.



!    A related subfamily is a nuclear family that is related to but does not include the household
     reference person. For example, the son and his wife (i.e., the daughter-in-law) in the
     preceding example are a related subfamily.
!    An unrelated subfamily (sometimes called a secondary family) is a nuclear family that is not
     related to the household reference person. Thus, a husband and wife who live in a friend’s
     house are classified as an unrelated subfamily. A mother and daughter who live in the
     mother’s boyfriend’s apartment are classified as an unrelated subfamily.
!    A primary individual is a household reference person who lives alone or lives with only
     nonrelatives. Primary individuals are sometimes treated by the Census Bureau as families
     with only one person and are referred to as pseudo-families.
!    A secondary individual is not a household reference person and is not related to any other
     people in the household. Secondary individuals are sometimes treated by the Census Bureau
     as families with only one person and are referred to as pseudo-families.

Unlike the core wave files, the longitudinal research files do not contain family identification
variables (e.g., FID, FID2, and SID). Analysts needing family identification variables must either
merge them from the core wave files (Chapters 10 and 13) or create them.22 Because family
composition can change over time, these are monthly variables. The algorithm in Figure 12-4
shows one approach to creating functional equivalents of the variables contained on the core
wave files.23

The variables created by this algorithm are functionally equivalent to the variables with the same
names on the core wave files: they will group people into the same family and subfamily groups.
However, the actual values assigned by this algorithm to these variables generally will not equal
the values found in the variables from the core wave files.
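
One way to check this kind of functional equivalence is to confirm that two sets of ID values split the same people into the same groups, whatever the values themselves are. The Python sketch below does that for the members of one household in one month. The function name same_grouping and the ID values in the example are hypothetical.

```python
def same_grouping(ids_a, ids_b):
    """Return True when two ID assignments define the same groups of people.

    `ids_a` and `ids_b` are parallel sequences, one entry per person. The
    assignments agree when there is a one-to-one correspondence between
    the values used in `ids_a` and the values used in `ids_b`.
    """
    if len(ids_a) != len(ids_b):
        raise ValueError("both assignments must cover the same people")
    a_to_b = {}
    b_to_a = {}
    for a, b in zip(ids_a, ids_b):
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical family IDs for four people in one household-month.
constructed = [1, 1, 101, 10004]   # values built by an algorithm
core_wave = [1, 1, 2, 3]           # values merged from a core wave file
equivalent = same_grouping(constructed, core_wave)
```

In this example the two assignments use different values but place the first two people in one family and each of the others in a family of their own, so the comparison succeeds.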

With these monthly variables (FIDi, FID2i, and SIDi), users can identify common family
membership in each month.24 The Census Bureau has two principal methods for distinguishing
families that are based on the variables and numbering schemes shown in Table 12-9. Analysts
must remember to choose which type of family classification they want and then use the
appropriate method.

!    The first method defines a family as all persons who are related and living together. The
     family ID variable FIDi is used with this definition. FIDi groups the household reference
     person with all related household members by assigning them the same ID number.


22
   In most cases, it is also possible to merge these variables from the core wave files. However, beginning with the
1991 Panel, a missing wave imputation procedure was applied to the longitudinal research files: data were imputed
for people with missing data for a wave but with valid data for the two adjacent waves. Although these people have
data in the longitudinal research file for imputed waves, some have no data in the core wave files (some of these
people are subject to Type Z imputation procedures that create records in the core wave files). For these people,
merging the family ID variables from the core wave files is not an option.
23
   This algorithm uses the following (monthly) variables found on the longitudinal research files: FAMTYP and
FAMNUM. These variables are discussed in greater detail in the next section.
24
   See footnotes 20 and 21.




     Figure 12-4. Constructing Family and Subfamily ID Variables in the Longitudinal
                                      Research Files

For each person (index = ip):
      For each month (index = mo):
            If PP-MIS(mo, ip)= 1 then do:        <i.e., interview status>
                  If FAMTYP(mo, ip) = 0          <i.e., primary family>
                     then FID(mo, ip) = 1
                         FID2(mo, ip) = 1
                         SID(mo, ip) = 0
                  Else if FAMTYP(mo, ip) = 1     <i.e., secondary individual>
                     then FID(mo, ip) = 10000 + ip
                         FID2(mo, ip) = 10000 + ip
                         SID(mo, ip) = 0
                  Else if FAMTYP(mo, ip) = 2     <i.e., unrelated subfamily>
                     then FID(mo, ip) = 100 + FAMNUM(mo, ip)
                         FID2(mo, ip) = 100 + FAMNUM(mo, ip)
                         SID(mo, ip) = 0
                  Else if FAMTYP(mo, ip) = 3     <i.e., related subfamily>
                     then FID(mo, ip) = 1
                         FID2(mo, ip) = 0
                         SID(mo, ip) = FAMNUM(mo, ip)
                  Else if FAMTYP(mo, ip) = 4       <i.e., primary individual>
                     then FID(mo, ip) = 10000 + ip
                         FID2(mo, ip) = 10000 + ip
                         SID(mo, ip) = 0
                  End if
            End “PP-MIS = 1” Block
      End month loop
End person loop
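
The algorithm in Figure 12-4 can be sketched in Python as follows. This is only an illustration: the record layout (one dictionary per person holding month-by-month lists named pp_mis, famtyp, and famnum) and the function names are assumptions made for the example, not part of the longitudinal research files. The person index ip is taken to be a running index over the people being processed.

```python
def family_ids(famtyp, famnum, ip):
    """Return (FID, FID2, SID) for one person-month, following Figure 12-4."""
    if famtyp == 0:                      # primary family
        return 1, 1, 0
    if famtyp in (1, 4):                 # secondary individual or primary individual
        return 10000 + ip, 10000 + ip, 0
    if famtyp == 2:                      # unrelated subfamily
        return 100 + famnum, 100 + famnum, 0
    if famtyp == 3:                      # related subfamily
        return 1, 0, famnum
    raise ValueError(f"unexpected FAMTYP value: {famtyp}")


def build_family_ids(people):
    """Apply family_ids to every interviewed person-month.

    `people` is a list of dicts (hypothetical layout) with equal-length
    lists under the keys "pp_mis", "famtyp", and "famnum", one entry per month.
    Months in which PP-MIS is not 1 are left as None.
    """
    results = []
    for ip, person in enumerate(people, start=1):
        months = []
        for mo, status in enumerate(person["pp_mis"]):
            if status == 1:
                months.append(family_ids(person["famtyp"][mo],
                                         person["famnum"][mo], ip))
            else:
                months.append(None)
        results.append(months)
    return results
```

As with the figure, the values produced here group people correctly but generally will not equal the values stored on the core wave files.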


     Table 12-9. Variables Used to Identify Families in the Longitudinal Research Files

Variable Name                         Description
PP-ID                                 Sample unit ID
HH-ADDIDi                             Address ID in the ith month
PP-MISi                               Person’s interview status in the ith month
And one of the following created variables:
FIDi                                  Family ID in the ith month
FID2i                                 Family ID in the ith month, excluding related subfamily members (FID2i
                                      equals zero for related subfamily members)
SIDi                                  Family ID in the ith month for related subfamily members (SIDi assigns
                                      nonzero values only to members of related subfamilies)
FID2i and SIDi                        Family ID in the ith month, separating related subfamilies from the primary
                                      family
Note: Variables FIDi, FID2i, and SIDi are not included on the longitudinal research files. They can be created by
using the algorithm shown in Figure 12-4 or merged from the core wave files.



The family group that FIDi forms around the household reference person corresponds to the
Census Bureau’s definition of a primary family. FIDi groups the members of each unrelated
subfamily (and primary and secondary individuals) separately.

!    The second method is similar to the first in defining a family, but the family excludes related
     subfamilies. The family ID variable FID2i is used with this definition. FID2i equals zero for
     related subfamilies.
Analysts who want to analyze multigenerational families would use FID2i and the variable SIDi.
SIDi treats related subfamilies as distinct family units by assigning them nonzero values.
Analysts can easily distinguish unrelated subfamilies from other family units when they use
these variables and numbering schemes.

Table 12-10 illustrates the difference between FIDi, FID2i, and SIDi for a single month. In the
month shown, the first household contains a primary family of five people. The primary family
contains two related subfamilies. FIDi and FID2i mask the fact that there are two related
subfamilies; only SIDi provides that information. SIDi has nonzero values only for members of
related subfamilies. The second household contains a primary family and two unrelated
subfamilies. The third household contains a primary individual and an unrelated subfamily. The
fourth household contains only a primary individual. The fifth household is group quarters
containing two people. This example depicts those families in the ith month. These people could
belong to different families in other months.25

The specific analysis being planned will inform the choice of which family classification to use.
To group people into families in the same way that the Census Bureau does, analysts should use
PP-ID, PP-MISi, HH-ADDIDi, and FIDi. To analyze primary families excluding related
subfamily members, analysts should include only those records with FID2i greater than zero. To
analyze related subfamilies as distinct family units, analysts should use only those records with
SIDi greater than zero. To uniquely identify (1) primary families excluding related subfamilies
and (2) related subfamilies treated as distinct family groups, analysts should use PP-ID, PP-MISi,
HH-ADDIDi, FID2i, and SIDi. In those analyses, it is easy to distinguish unrelated subfamilies
from other families.
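
The two grouping choices described above can be sketched as follows, using the first two households of Table 12-10 for one month. The tuple layout and the function name are assumptions made for this illustration; the grouping keys are the ones named in the text.

```python
from collections import defaultdict

# One month of data from the first two households in Table 12-10:
# (PP-ID, HH-ADDID, PP-MIS, FID, FID2, SID, PP-PNUM)
ROWS = [
    ("110011111", 11, 1, 1, 1, 0, 101),
    ("110011111", 11, 1, 1, 0, 2, 102),
    ("110011111", 11, 1, 1, 0, 2, 103),
    ("110011111", 11, 1, 1, 0, 3, 104),
    ("110011111", 11, 1, 1, 0, 3, 105),
    ("122210000", 33, 1, 1, 1, 0, 101),
    ("122210000", 33, 1, 1, 1, 0, 104),
    ("122210000", 33, 1, 101, 101, 0, 305),
    ("122210000", 33, 1, 101, 101, 0, 306),
    ("122210000", 33, 1, 102, 102, 0, 307),
    ("122210000", 33, 1, 102, 102, 0, 308),
]


def group_families(rows, split_related_subfamilies=False):
    """Group person numbers into families for a single month.

    By default the key is (PP-ID, HH-ADDID, FID). With
    split_related_subfamilies=True the key is (PP-ID, HH-ADDID, FID2, SID),
    so each related subfamily becomes its own unit.
    """
    groups = defaultdict(list)
    for pp_id, addid, mis, fid, fid2, sid, pnum in rows:
        if mis != 1:          # keep only interviewed person-months
            continue
        if split_related_subfamilies:
            key = (pp_id, addid, fid2, sid)
        else:
            key = (pp_id, addid, fid)
        groups[key].append(pnum)
    return dict(groups)
```

With the default key, the five people in the first household form one family. With the second key, the same household yields three units: the reference person alone and the two related subfamilies.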


Variables Describing Household and Family
Composition
Table 12-11 shows the variables contained on the longitudinal research files summarizing
household and family composition.26


25
  See footnote 18.
26
  More detailed information about the relationships between members is collected in the Household Relationships
topical module. Those data provide extensive information about household composition at the time of the topical
module interview.


               Table 12-10. How to Uniquely Identify a Family in a Given Month of the Longitudinal Research Files

                        Current      Person’s      Family ID,     Family ID,                                  Person
         Sample         Address      Interview     Including      Excluding                   Family          Number
         Unit ID        ID (HH-      Status        Subfamily      Subfamily     Subfamily     Type            (PP-
         (PP-ID)        ADDIDi)      (PP-MISi)     (FIDi)         (FID2i)       ID (SIDi)     (FAMTYPi)       PNUM)       Notes
         110011111      11           1                1              1          0             0               101         This household contains a
         110011111      11           1                1              0          2             3               102         primary family of five
         110011111      11           1                1              0          2             3               103         people. The primary
         110011111      11           1                1              0          3             3               104         family contains two
         110011111      11           1                1              0          3             3               105         related subfamilies.

         122210000      33           1                1              1          0             0               101         This household contains a
         122210000      33           1                1              1          0             0               104         primary family and two
         122210000      33           1              101            101          0             2               305         unrelated subfamilies.
         122210000      33           1              101            101          0             2               306
         122210000      33           1              102            102          0             2               307
         122210000      33           1              102            102          0             2               308

         555555555      21           1             1001          1001           0             4               101         This household contains a
         555555555      21           1              101           101           0             2               201         primary individual and an
         555555555      21           1              101           101           0             2               202         unrelated subfamily.
         555555555      21           1              101           101           0             2               203

         610000000      11           1             1001          1001           0             4               101         Primary individual.

         897454644      11           1             1001          1001           0             1               101         Group quarters with two
         897454644      11           1             1002          1002           0             1               102         secondary individuals.
        Notes: Variables FIDi, FID2i, and SIDi are not part of the longitudinal research files. They can be merged from the core wave files or created
        using the algorithm shown in Figure 12-4. FAMTYP = 0 means the person belongs to a primary family. FAMTYP = 1 means the person is a
        secondary individual. FAMTYP = 2 means the person belongs to an unrelated subfamily. FAMTYP = 3 means the person belongs to a related
        subfamily. FAMTYP = 4 means the person is a primary individual.

            Table 12-11. Variables Used to Describe Household Composition in the
                                 Longitudinal Research Files

 Variable Name        Description
 FAMTYPi              Type of family in the ith month (e.g., primary family, related subfamily)
 FAMRELi              Family relationship in the ith month (e.g., reference person, spouse of family reference
                      person, child of family reference person)
 RRPi                 Recoded relationship to the household reference person in the ith month (e.g., household
                      reference person living with relatives, child of household reference person)
 ENTID-SPi            Entry address ID of spouse in the ith month
 PNSPi                Person number of spouse in the ith month
 ENTID-PTi            Entry address ID of parent in the ith month
 PNPTi                Person number of parent in the ith month
 U-PNGj               Person number of guardian in the jth wave
 ENTID-GDj            Entry address ID of guardian in the jth wave


As Table 12-12 shows, RRPi summarizes the relationship of each person to the household
reference person in month i.

        Table 12-12. Relationship to the Household Reference Person in a Given Month

   Edited Relationship to
   the Household Reference
   Person (RRPi)                  Description
   1                              Household reference person, living with relatives
   2                              Household reference person, living alone or with nonrelatives
   3                              Spouse of household reference person
   4                              Child of household reference person
   5                              Other relative of household reference person
   6                              Nonrelative of household reference person, but related to other members of
                                  the household
   7                              Nonrelative of all members of the household


The household description depends on the identity of the reference person. For example, in Table
12-13, the household contains a mother, her daughter, and her daughter’s son. If the mother is the
household reference person (RRPi = 1), her daughter is listed as a child of the household
reference person (RRPi = 4) and the daughter’s son is listed as other relative of the household
reference person (RRPi = 5). If the daughter is the reference person, her son is listed as a child of
the household reference person (RRPi = 4) and her mother is listed as other relative of the
household reference person (RRPi = 5). Users should note that the household reference person
can change from one month to the next; thus, the household description could also change.



        Table 12-13. Using RRP to Identify Households Containing Three Generations in the
                                   Longitudinal Research Files

                                   Relationship to the Household
      Household Reference Person Reference Person (RRPi)                  Notes
      Mother as Household Reference Person
      Mother                       1                                      Reference person
      Daughter                     4                                      Child of reference person
      Daughter’s son               5                                      Other relative of reference person
      Daughter as Household Reference Person
      Daughter                     1                                      Reference person
      Daughter’s son               4                                      Child of reference person
      Mother                       5                                      Other relative of reference person


 Six other variables in the longitudinal research file can be used to describe household and family
 composition: PNSPi, ENTID-SPi, PNPTi, ENTID-PTi, U-PNGj, and ENTID-GDj. These six
 variables identify the person number and entry address ID of the spouse, parent, or guardian
 living at the same address as the person in the ith month or jth wave (in the last two cases).27 By
 building from these variables, the analyst can identify a variety of family configurations. For
 example, these variables can be used to identify households containing three generations.
 Table 12-14 displays one household containing a mother and her two children. One child (PP-
 PNUM = 102) has a son, and the other child (PP-PNUM = 104) has a spouse.

               Table 12-14. Using PNSP and PNPT to Identify Households Containing
                       Three Generations in the Longitudinal Research Files

                        Entry        Person    Relationship    Entry                    Entry
                        Address ID   Number    to Household    Address ID               Address ID
Household               (PP-         (PP-      Reference       of Spouse     Spouse     of Parent      Parent
Member                  ENTRY)       PNUM)     Person (RRPi)   (ENTID-SPi)   (PNSPi)    (ENTID-PTi)    (PNPTi)   Notes
Mother                  11           101       1               11            999        11             999       Mother
Daughter #1             11           102       4               11            999        11             101       Child
Daughter #1’s son       11           103       5               11            999        11             102       Grandchild
Daughter #2             11           104       4               11            105        11             101       Child
Spouse of Daughter #2   11           105       5               11            104        11             999       Spouse of child
  Note: Value of 999 means not applicable.
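
As a sketch of how the parent pointers in Table 12-14 can reveal a three-generation household, the following follows PNPTi from each person to a parent and then to that parent's parent. It assumes that everyone shares the same sample unit ID and entry address ID, as in the table, so that the person number alone identifies the parent; in general the pointer would combine ENTID-PTi with PNPTi. The names used are hypothetical.

```python
NOT_APPLICABLE = 999

# PP-PNUM -> PNPT (person number of parent) for the household in Table 12-14
PARENT_OF = {101: 999, 102: 101, 103: 102, 104: 101, 105: 999}


def three_generation_chains(parent_of):
    """Return (grandparent, parent, child) person-number triples."""
    chains = []
    for child, parent in parent_of.items():
        if parent == NOT_APPLICABLE:
            continue
        # Look up the parent's own parent, if that parent is in the household.
        grandparent = parent_of.get(parent, NOT_APPLICABLE)
        if grandparent != NOT_APPLICABLE:
            chains.append((grandparent, parent, child))
    return chains
```

For the household shown, the only chain found is mother (101), daughter #1 (102), and daughter #1's son (103).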


 27
   Parents and spouses always share the same sample unit ID (PP-ID) as the respondent. The variables are assigned
 values only in the months that people are living together. For example, a couple living together in Wave 1 would
 have values in the PNSP and ENTID-SP variables that pointed to each other. However, if they separate (and remain
 married) in Wave 2, the PNSP and ENTID-SP variables will be assigned values of 999 (indicating that the variables
 are not applicable).




  Using Family-Level Income Variables
  The longitudinal research files contain a number of family-level income variables. The family
  income variables on the longitudinal research files include the income of all related subfamily
  members. In other words, primary family members and related subfamily members are treated
  as one family by the Census Bureau when calculating family-level income amounts. The
  longitudinal research files do not contain any subfamily income variables. If family income
  variables are needed that do not pool related subfamilies with primary families, those income
  variables must be created. That is done by looping over persons with PP-MISi of 1 and with
  common PP-ID, HH-ADDIDi, FID2i, and SIDi for each month.28

  Table 12-15 illustrates how the family income variables on the longitudinal research files include
  the income of related subfamily members. From the previous example of a primary family of
  five people, the primary family contains two related subfamilies. Total family income (FF-INCi)
  is $3,100. The incomes of all subfamily members are included in that amount.

                   Table 12-15. Family Income in the Longitudinal Research Files

           Entry         Person       Person        Current         Family ID,    Sub-        Total         Person-
Sample     Address       Number       Interview     Address         Including     family      Family        Level
Unit ID    ID (PP-       (PP-         Status        ID (HH-         Subfamily     ID          Income        Income
(PP-ID)    ENTRY)        PNUM)        (PP-MISi)     ADDIDi)         (FIDi)        (SIDi)      (FF-INCi)     (PP-INCi)
 110011111 11             101          1             11              1             0           $3,100        $ 100
 110011111 11             102          1             11              1             2           $3,100        $ 500
 110011111 11             103          1             11              1             2           $3,100        $ 500
 110011111 11             104          1             11              1             3           $3,100        $ 1,000
 110011111 11             105          1             11              1             3           $3,100        $ 1,000
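
The looping approach described in this section can be sketched as below for the household in Table 12-15. The FID2i values are taken from the same household in Table 12-10; the tuple layout and function name are assumptions made for the example.

```python
from collections import defaultdict

# (PP-ID, HH-ADDID, PP-MIS, FID2, SID, PP-INC) for one month
ROWS = [
    ("110011111", 11, 1, 1, 0, 100),
    ("110011111", 11, 1, 0, 2, 500),
    ("110011111", 11, 1, 0, 2, 500),
    ("110011111", 11, 1, 0, 3, 1000),
    ("110011111", 11, 1, 0, 3, 1000),
]


def income_by_family_unit(rows):
    """Sum person-level income within (PP-ID, HH-ADDID, FID2, SID) groups.

    Only person-months with PP-MIS equal to 1 contribute.
    """
    totals = defaultdict(int)
    for pp_id, addid, mis, fid2, sid, income in rows:
        if mis == 1:
            totals[(pp_id, addid, fid2, sid)] += income
    return dict(totals)
```

Here the primary family excluding related subfamilies has income of $100, and the two related subfamilies have $1,000 and $2,000. Together these equal the $3,100 reported in FF-INCi.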


  More About Using the SIPP ID Variables:
  Identifying Movers
  When a person moves, the current address field (HH-ADDIDi) changes. The PP-ID, PP-ENTRY,
  and PP-PNUM values remain the same. The first digit (or first two digits in the 1992 Panel) of
  HH-ADDIDi indicate(s) the wave in which a household is first interviewed at that new address.
  The remaining digits sequentially number the households formed when a household splits into
  two or more households because original sample members move to a different location. Thus, new addresses in
  Wave 2 are numbered 21, 22, and so on. New addresses in Wave 3 are numbered 31, 32, and so
  on. New addresses in Wave 10 are numbered 101, 102, and so on. (Readers may wish to refer to
  Figure 2-1 [pp. 2-10–2-14], which illustrates movement into and out of households.)
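
A minimal sketch of reading the wave out of an address ID follows. It assumes, based only on the examples given above (21, 22, 31, 32, 101, 102), that the sequence number is always a single trailing digit; the function name is hypothetical.

```python
def split_address_id(addid):
    """Split HH-ADDID into (wave first interviewed at the address, sequence number).

    Assumption: the final digit is the sequence number and all leading
    digits give the wave.
    """
    wave, sequence = divmod(addid, 10)
    return wave, sequence
```

Under that assumption, address 21 is the first new address of Wave 2 and address 102 is the second new address of Wave 10.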


  28
    FIDi and SIDi are not included on the longitudinal research files. They can be merged from the core wave files or
  created by using the algorithm shown in Figure 12-4.



 Table 12-16 shows that persons 101 and 102 in the first household are original sample members.
 Person 401 moved into the home of persons 101 and 102 in Wave 4. In Wave 7, all three moved
 to a new location and were joined by person 701. In the second household, person 101 is an
 original sample member who moved to a new location in Wave 3. In the third household, person
 102 is an original sample member who used to live with persons 101 and 103 of the same sample
 unit ID (PP-ID), but moved to a new location in Wave 3 (to a different location from person
 101). In the fourth household, person number 103 is an original sample member who used to live
 with persons 101 and 102 of the same sample unit ID number. Person 103 moved to a new
 location in Wave 10 and was joined by person 1001, who just entered the SIPP sample. All but
 two people moved from their original location (i.e., only two people have HH-ADDIDi equal to
 PP-ENTRY).

             Table 12-16. How to Identify Movers in the Longitudinal Research Files

                      Entry      Person     Person       Current
         Sample       Address    Number     Interview    Address
         Unit ID      ID (PP-    (PP-       Status       ID (HH-
Wave     (PP-ID)      ENTRY)     PNUM)      (PP-MISi)    ADDIDi)   Notes
 1       123456789      11        101         1            11      Persons 101 and 102 are the original
         123456789      11        102         1            11      sample members
 4       123456789      11        101         1            11      Person 401 begins to live with them in
         123456789      11        102         1            11      Wave 4.
         123456789      11        401                      11
 7       123456789      11        101        1             71      All three people move in Wave 7 and
         123456789      11        102        1             71      person 701 joins them
         123456789      11        401        1             71
         123456789      71        701                      71
 1       321456789      11        101        1             11      Person 101, person 102, and person 103
         321456789      11        102        1             11      are original sample members.
         321456789      11        103        1             11
 3       321456789      11        101        1             31      Person 101 moved in Wave 3. Person 102
         321456789      11        102        1             32      moved in Wave 3 to a different location
         321456789      11        103        1             31      from person 101. Person 103 remained
                                                                   with person 101.
10       321456789     11         101        1            31       Person 103 is an original sample member
         321456789     11         102        1            32       who used to live with persons 101 and 102
         321456789     11         103        1           101       of the same ID. In Wave 10, person 103
         321456789    101        1001        1           101       lives in a new location with person 1001,
                                                                   who just entered the SIPP sample.
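
One way to flag movers is to compare each person's current address ID across successive observations, as sketched below for sample unit 321456789 in Table 12-16. The tuple layout and function name are assumptions. Because the table lists only selected waves, the wave reported for a move is the first listed wave in which the new address appears.

```python
# (wave, PP-PNUM, HH-ADDID) for sample unit 321456789 in Table 12-16
OBSERVATIONS = [
    (1, 101, 11), (1, 102, 11), (1, 103, 11),
    (3, 101, 31), (3, 102, 32), (3, 103, 31),
    (10, 101, 31), (10, 102, 32), (10, 103, 101), (10, 1001, 101),
]


def find_moves(observations):
    """Return (person, wave, old address, new address) for each address change."""
    last_address = {}
    moves = []
    for wave, pnum, addid in sorted(observations):
        previous = last_address.get(pnum)
        if previous is not None and previous != addid:
            moves.append((pnum, wave, previous, addid))
        last_address[pnum] = addid
    return moves
```

Person 1001 is not reported as a mover because address 101 is that person's first observed address.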


 The next example (Table 12-17) further illustrates how the ID system works as people move to
 new addresses, additional people move in with them, and households split. A review of Figure
 2-1 (pp. 2-10–2-14) may help in understanding the various household changes.

        Table 12-17. Another Example of Household Changes and Their Effects on the
                       ID Variables in the Longitudinal Research Files

                           Sample               Current                  Entry              Person
     Household             Unit ID              Address ID               Address ID         Number
     Member                (PP-ID)              (HH-ADDIDi)              (PP-ENTRY)         (PP-PNUM)
     Wave 1
     Father                101111103            11                       11                 101
     Mother                101111103            11                       11                 102
     Daughter              101111103            11                       11                 103
     Son                   101111103            11                       11                 104
     Cousin                101111103            11                       11                 105
     Wave 2
     Father                101111103            11                       11                 101
     Mother                101111103            11                       11                 102
     Daughter              101111103            11                       11                 103
     Son                   101111103            11                       11                 104
     Cousin                101111103            11                       11                 105
     Wave 3
     Father                101111103            11                       11                 101
     Mother                101111103            11                       11                 102
     Daughter              101111103            11                       11                 103
     Son-in-Law            101111103            11                       11                 301
     Cousin                101111103            11                       11                 105
     Wave 4                Parent’s Household
     Father                101111103            11                       11                 101
     Mother                101111103            11                       11                 102
                           Daughter’s Household
     Daughter              101111103            41                       11                 103
     Son-in-Law            101111103            41                       11                 301
                           Cousin’s Household
     Cousin                101111103            42                       11                 105
     Uncle                 101111103            42                       42                 401
     Wave 10               Parent’s Household
     Father                101111103            11                       11                 101
     Mother                101111103            11                       11                 102
                           Daughter’s Household
     Daughter              101111103            41                       11                 103
     Son-in-Law            101111103            41                       11                 301
     Newborn               101111103            41                       41                 1001

!    In Wave 1, there is a five-person household consisting of a husband, a wife, a daughter, a
     son, and a cousin. Because this is the first wave, the current address number is 11, indicating
     address 1 of Wave 1, and the entry address number for each member of the household is the
     same as the current address number. Because they are assigned in Wave 1, the person
     numbers are in the 100 series and are numbered sequentially, beginning with 101.
!    During Wave 2, the son joins the Army, moves into military barracks, and therefore leaves
     the SIPP sample.29 The son’s record, person number 104, will contain information (either
     imputed or provided by proxy) on his characteristics for the time in Wave 2 that he was still
     in the sample. If he does not return to the sample during the remainder of the panel, there will
     be no records for him beyond Wave 2.

29
  Members of the armed forces are included in the SIPP sample only if they are living state-side in private housing.
Those living overseas or in military barracks are not included in the SIPP sample universe.

!    During Wave 3, the daughter marries and her husband moves into the household. The current
     address number where the mother, father, cousin, daughter, and son-in-law live remains the
     same because it is the same address. The son-in-law’s entry address number is 11 because he
     first enters the SIPP sample at an address coded 11. The person number for the son-in-law is
     in the 300 series (301) because he joins the SIPP sample in Wave 3.
!    During Wave 4, the daughter and son-in-law move into a new house. Their current address
     number changes to 41 to indicate that a new address has been established in Wave 4.
     Meanwhile, the cousin, who is over age 15, moves in with an uncle.30 The cousin’s current
     address number changes to 42 (i.e., the second household added into the SIPP sample in the
     fourth wave). The assignment of address number 41 to the daughter and 42 to the cousin is
     random. It could be the other way around. The uncle enters the SIPP sample and receives an
     address number of 42 and an entry address number of 42. The uncle’s person number is in
     the 400 series (401) since he joins the survey in Wave 4.
!    No changes in household composition are observed during Waves 5–9.
!    During Wave 10, the daughter and son-in-law have a baby. This new sample member is
     assigned the sample unit ID of the daughter and son-in-law. The newborn’s entry address is
     41, since that is the current address ID of the daughter and son-in-law at the time of birth.
     The newborn’s person number is 1001, reflecting the fact that the newborn came into the
     SIPP sample in Wave 10. Meanwhile, the cousin moves to Europe and therefore leaves the
     SIPP sample. The uncle, even though he did not move to Europe with the cousin, also leaves
     the SIPP sample because he no longer resides with an original SIPP sample member. Their
     records are no longer listed.
Table 12-18 displays this example again, but this table depicts how the HH-ADDIDi variable
changes over time to reflect the household composition changes. The table also illustrates the
structure of the full panel data files.
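
The person-number convention used in this example (100 series for Wave 1, 300 series for Wave 3, 1001 for Wave 10) can be sketched as a small helper. The function name is hypothetical, and the helper returns None for person numbers from 180 through 199, because this section notes that such values are assigned when sampling units are merged and so do not indicate the wave of entry.

```python
def entry_wave_from_pnum(pp_pnum):
    """Infer the wave in which a person entered the sample from PP-PNUM.

    Returns None for 180-199, the range used for renumbered members of
    merged sampling units.
    """
    series, within = divmod(pp_pnum, 100)
    if series == 1 and 80 <= within <= 99:
        return None
    return series
```

For the household in Table 12-17, this gives Wave 1 for the father (101), Wave 3 for the son-in-law (301), Wave 4 for the uncle (401), and Wave 10 for the newborn (1001).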

There are two extremely rare occasions in which the original PP-ID, PP-ENTRY, and PP-PNUM
values are modified:

1. The first occasion is when two separate sampling units, each containing original sample
   members, are merged, perhaps because of a marriage. In this situation, one of the original set
   of PP-ID and PP-ENTRY values is retained and the other set is changed to agree with the
   retained set. The person number values (PP-PNUM) of the changed set are modified further
   to be between 180 and 199, inclusive.


30
  In the 1993 Panel, all original sample members were followed, no matter what their ages. In all other panels, only
people 15 years of age or older were followed when they moved to new addresses.


                    Table 12-18. Household Changes and Their Effects on the Household ID (HH-ADDIDi) Variable in the
                                                       Longitudinal Research File

                                                                                                  HH-ADDIDi
                                                        Wave 1                Wave 2               Wave 3            Wave 4             Wave 5
                      PP-      PP-                      Month                 Month                 Month            Month              Month
        PP-ID         ENTRY    PNUM    Notes      1     2   3     4     5     6   7     8     9    10 11 12     13   14 15    16   17   18 19     20
        101111103     11        101    Father     11    11 11     11    11    11 11     11    11 11 11 11       11   11 11    11   11   11 11     11
        101111103     11        102    Mother     11    11 11     11    11    11 11     11    11 11 11 11       11   11 11    11   11   11 11     11
        101111103     11        103    Daughter   11    11 11     11    11    11 11     11    11 11 11 11       41   41 41    41   41   41 41     41
        101111103     11        104    Son        11    11 11     11    11      0   0     0     0   0   0   0    0    0   0    0    0    0   0     0
        101111103     11        105    Cousin     11    11 11     11    11    11 11     11    11 11 11 11       11   42 42    42   42   42 42     42
        101111103     11        301    Son/law      0     0   0     0     0     0   0     0     0 11 11 11      41   41 41    41   41   41 41     41
        101111103     42        401    Uncle        0     0   0     0     0     0   0     0     0   0   0   0   42   42 42    42   42   42 42     42
        101111103     41       1001    Newborn     0     0   0     0     0     0   0     0     0    0   0   0    0    0   0    0    0    0   0     0

                                                                                                 HH-ADDIDi
                                                        Wave 6                Wave 7              Wave 8             Wave 9             Wave 10
                      PP-      PP-                      Month                 Month                Month             Month              Month
        PP-ID         ENTRY    PNUM    Notes      21    22 23     24    25    26 27     28    29 30 31 32       33   34 35    36   37   38 39     40
        101111103     11        101    Father     11    11 11     11    11    11 11     11    11 11 11 11       11   11 11    11   11   11 11     11
        101111103     11        102    Mother     11    11 11     11    11    11 11     11    11 11 11 11       11   11 11    11   11   11 11     11
        101111103     11        103    Daughter   41    41 41     41    41    41 41     41    41 41 41 41       41   41 41    41   41   41 41     41
        101111103     11        104    Son         0     0   0     0     0     0   0     0     0   0   0   0     0    0   0    0    0    0   0     0
        101111103     11        105    Cousin     42    42 42     42    42    42 42     42    42 42 42 42       42   42 42    42    0    0   0     0
        101111103     11        301    Son/law    41    41 41     41    41    41 41     41    41 41 41 41       41   41 41    41   41   41 41     41
        101111103     42        401    Uncle      42    42 42     42    42    42 42     42    42 42 42 42       42   42 42     0    0    0   0     0
        101111103     41       1001    Newborn     0     0   0     0     0     0   0     0     0   0   0   0     0    0   0    0   41   41 41     41

2. The second occasion is when a household splits into two new households (in which each new
   household gains a new sample person) and later the households recombine. For example,
   assume that a married couple separate in Wave 3, each moving in with a sibling. Both
   siblings are assigned a person number of 301, because they entered the sample in Wave 3 at
   different addresses (thus, HH-ADDIDi = 31 and 32). If the husband and wife reunite in
   Wave 6, and bring the siblings with them, one sibling’s person number would be changed. In
   this case, one of the siblings would have a person number of 301 and the other would have a
   person number of 680 (or some number between 680 and 699, inclusive).
Because a record in the longitudinal research file describes the person throughout the entire panel
and because the sample unit ID (PP-ID) cannot change on this record, each person in a merged
household whose ID values were changed is assigned two full panel records. The first record
contains the original ID information of the person before the merge and identifies the person as
having exited the sample at the time of the merge. The second record contains the new ID
information and identifies the person as having entered the sample at the time of the merge.
There is no way to link the two records in the longitudinal research files.31
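
The person number conventions used in these examples can be sketched in Python. The sketch assumes, from the examples in this section (101 in Wave 1, 301 in Wave 3, 1001 in Wave 10), that the leading digits of PP-PNUM give a wave number, and that a value ending in 80 through 99 marks a number that was reassigned after a merge or recombination. Both function names are hypothetical, and the 80 through 99 rule is generalized from only the two ranges cited above (180 to 199 and 680 to 699).

```python
def entry_wave(pp_pnum: int) -> int:
    """Wave encoded in the leading digits of a person number (assumed convention)."""
    return pp_pnum // 100

def looks_reassigned(pp_pnum: int) -> bool:
    """True if the last two digits fall in 80..99, the range cited for changed numbers."""
    return 80 <= pp_pnum % 100 <= 99

for pnum in (101, 301, 1001, 180, 680):
    print(pnum, entry_wave(pnum), looks_reassigned(pnum))
```

For a reassigned number, the leading digits need not equal the wave in which the person first entered the sample, so the two checks are best used together.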


Identifying Program Units
Besides household and family composition data, the longitudinal research files contain detailed
information about participation in health insurance and various government transfer programs.
For most programs, three characteristics are recorded (Table 12-19):

1. Whether the person is covered;
2. Who received the income or benefit; and
3. The amount of the income or benefit.
The coverage variables identify whether the income or benefit covers that person in month i. In
other words, when a person is flagged as covered by food stamps (FOODSTMPi = 1), the person
either received the benefits directly (because he or she was the authorized food stamp recipient)
or indirectly (because he or she was in the same program unit as the authorized recipient). The
coverage variables also allow users to determine each person’s membership in each program
unit. That is useful because program units often exclude some members of the family or
household.32 Also, as with households and families, membership in program units can change
from one month to the next. For that reason, program unit membership and characteristics of the
unit should be evaluated for each month.


31
  If needed, this information can be merged from the core wave files. Chapters 10 and 13 provide details.
32
   In the 1984 and 1985 Panels, coverage for the Women, Infants, and Children (WIC) nutrition program was
imputed to children under 6 years old if their mother reported participation in the WIC program. Beginning with the
1986 Panel, WIC coverage has been assessed directly for all sample members.



     Table 12-19. Variables Describing Participation in Government Transfer Programs and
           Health Insurance Programs in the 1990–1993 Longitudinal Research Files

                                                  Authorized         G1 Source
Program                       Coverage            Recipient          Code         Amount
Social Security               SOC-SEC             SS-PIDX             1           (see note)
Railroad Retirement           RAILROAD            RR-PIDX             2           (see note)
Federal Supplemental
  Security Income             —                   —                   3           (see note)
Veteran’s Benefits            VETS                VA-PIDX             8           (see note)
Aid to Families with
  Dependent Children          AFDC                AFDCPIDX           20           (see note)
General Assistance            GEN-ASST            GA-PIDX            21           (see note)
Foster Child Care             FOST-KID            FOSTPIDX           23           (see note)
Other Welfare                 OTH-WELF            OTH-PIDX           24           (see note)
WIC Benefits                  WICCOV              WIC-PIDX           25           (see note)
Food Stamps                   FOODSTMP            FS-PIDX            27           (see note)
Medicare                      CARECOV             —                  —
Medicaid                      CAIDCOV             —                  —
CHAMPUS                       CHAMP               —                  —

Note: Locate one of the amount variables (G1AMT1–G1AMT10) by using the corresponding source
variables (G1SRC1–G1SRC10).


The authorized recipient variables identify the people who actually received the income or
benefit for the people in their program units. In the longitudinal research files, those variables do
not use the entry address and person number values. Instead, they use the sequence number of
the person within the sample unit (PP-RCSEQ) to identify authorized recipients. In other words,
the authorized food stamp recipient is the person for whom FS-PIDXi in month i equals
PP-RCSEQ.

Individuals who are members of a common program unit in a given month (i) can be identified
by using the sample unit ID (PP-ID), the person’s interview status in month i (PP-MISi), and the
authorized recipient variable in month i. For example, members of a common food stamp unit in
month i are those with PP-MISi of 1 and common values of PP-ID (a value that does not change
from month to month) and FS-PIDXi (a value that does change from one month to the next). The
SIPP longitudinal research files do not include authorized recipient variables for Medicare and
SSI programs.33
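
The grouping rule described above can be sketched in Python for a single month. The records follow the food stamp rows of Table 12-20. The PP-ID value is hypothetical, PP-MIS is assumed to be 1 for everyone (neither appears in that table), and an FS-PIDX of 0 is treated as "not in any food stamp unit," which is an inference from the table rather than a documented rule. The function name food_stamp_units is illustrative.

```python
from collections import defaultdict

# One month of records for a single sample unit, following Table 12-20.
# pp_id is a made-up value; pp_mis = 1 (in sample) is assumed for everyone.
people = [
    {"pp_id": "000000001", "rcseq": 1, "pp_mis": 1, "fs_pidx": 0},  # Mother
    {"pp_id": "000000001", "rcseq": 2, "pp_mis": 1, "fs_pidx": 2},  # Daughter #1
    {"pp_id": "000000001", "rcseq": 3, "pp_mis": 1, "fs_pidx": 2},  # Daughter #1's son
    {"pp_id": "000000001", "rcseq": 4, "pp_mis": 1, "fs_pidx": 4},  # Daughter #2
    {"pp_id": "000000001", "rcseq": 5, "pp_mis": 1, "fs_pidx": 4},  # Spouse of Daughter #2
]

def food_stamp_units(records):
    """Map (PP-ID, FS-PIDX) to the PP-RCSEQ values of the unit's members."""
    units = defaultdict(list)
    for r in records:
        # Keep only people in sample this month who point at some recipient.
        if r["pp_mis"] == 1 and r["fs_pidx"] != 0:
            units[(r["pp_id"], r["fs_pidx"])].append(r["rcseq"])
    return dict(units)

units = food_stamp_units(people)
for (pp_id, recipient_seq), members in units.items():
    print(pp_id, "recipient PP-RCSEQ:", recipient_seq, "members:", members)
```

The second element of each key is the PP-RCSEQ of the authorized recipient, so the two food stamp units in the example are headed by Daughter #1 and Daughter #2. Because FS-PIDXi can change from month to month, the grouping would need to be repeated for each month of interest.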

There are some exceptions to the rules:

•     Social Security, Railroad Retirement, WIC, and AFDC can offer benefits solely to children.
      When that happens, an adult will receive the income on behalf of the children. The adult,
      therefore, is flagged as the authorized recipient and the income amounts appear on the record
      of the adult. The adult authorized recipient, however, is not flagged as being covered by the
      program. The children are flagged as covered.

33
  In effect, each person covered by these two programs is an authorized recipient, and the program units are the
people themselves.



•    Most SSI recipients are elderly and disabled adults, but they can also be children with
     disabilities.34 Even so, the SSI amount is recorded on an adult’s record, not on the child’s
     record. Unlike the core wave files, the longitudinal research files have no coverage variable
     indicating whether or not the child, adult, or both, were covered. If needed, this information
     can be merged from the core wave files. Chapter 13 provides a detailed discussion of
     merging SIPP files.
•    The medical insurance variables simply reflect who is enrolled in which type of program.
     There are no associated amount variables.

These rules and exceptions are illustrated in Table 12-20. The household contains one AFDC
unit and two food stamp units. The mother is covered by Social Security and SSI. Daughter #1,
the mother of the (disabled) child, receives SSI on behalf of her son. The grandchild also
receives WIC. Everyone in the household is enrolled in Medicaid. The coverage variables are set
to 2 whenever the person is not covered by the particular program. The indicators for the
authorized recipients do not use the PP-ENTRY and PP-PNUM values. Instead, they are based on
the “line number” of the authorized recipient on the household roster (PP-RCSEQ). That is very
different from the indicators used on the core wave files.


Using the Unearned Income Variables
To save space, the Census Bureau organizes the unearned income variables differently in the
longitudinal research files than in the core wave files. As shown in Table 12-21, 10 variables on
each person’s record identify up to 10 different sources of unearned income
(G1SRC1–G1SRC10). For each source identified, there is a corresponding amount variable
(G1AMT1i–G1AMT10i). Income amounts are recorded with monthly resolution. The person in
Table 12-21 periodically receives $500 in federal SSI and $125 in food stamps. The person does
not receive any other source of unearned income.

When using these fields, analysts often find it helpful to realign the unearned income into new
income-specific variables.35


34
   In the 1990s, the definition of qualifying disabling conditions was expanded. That change in definition resulted in
a rapid expansion of the child SSI caseload.
35
   For example, Table 12-22 includes monthly variables for SSI and food stamps that were created by using the
algorithm in Figure 12-5.



              Table 12-20. Example of Program Units, Coverage, and Benefit Amounts
                               in the Longitudinal Research Files

                                                          Daughter #1’s                      Spouse of
       Variable           Mother        Daughter #1       Son               Daughter #2      Daughter #2
       PP-PNUM            101           102               103               104              105
       PP-RCSEQ              1            2                   3               4                5
       AGEi                70            21                   4              25               26
       AFDC
       AFDCi                 2            1                   1               2                2
       AFDCPIDXi             0            2                   2               0                0
       Food Stamps
       FOODSTMPi             2            1                   1               1                1
       FS-PIDXi              0            2                   2               4                4
       SSI
       This only appears in the General Amounts (G1) section.
       WIC
       WICCOVi               2            2                   1               2                2
       WIC-PIDXi             0            2                   2               0                0
       Medicaid
       CAIDCOVi              1            1                   1               1                1
       Social Security
       SOC-SECi              1            2                   2               2                2
       General (G1) Sources and Amounts
       G1SRC1                  3          20                  0               27               0
       G1AMT1i ($)          188          123                  0              130               0
       G1SRC2                  1          27                  0                0               0
       G1AMT2i ($)          470          160                  0                0               0
       G1SRC3                  0           3                  0                0               0
       G1AMT3i ($)             0         122                  0                0               0
       G1SRC4                  0          25                  0                0               0
       G1AMT4i ($)             0          30.12               0                0               0
     a
       These codes are explained in the next section of text.


Income Topcoding
The Census Bureau topcodes each income variable to protect against the possibility that a user
might identify a SIPP respondent with very high income.36 Although the data dictionary lists a
topcode of $33,332 for monthly income, that figure is really the topcode for income over the
4-month wave (4 × $8,333 = $33,332), so it is rarely reached in a single month. In most cases,
monthly income is topcoded at $8,333, which represents $8,333 or more. Individual amounts above
$8,333 may occasionally be shown if the respondent’s income varied considerably from month to month


36
  New topcoding procedures are being implemented with the 1996 Panel. When a longitudinal research file for the
1996 Panel is available, this discussion will be revised to describe those new procedures. At present, users should
note that this description does not pertain to the core wave files from the 1996 Panel.


                                         Table 12-21. Unearned Income in the Longitudinal Research Files

                                                                                           PP-MIS
                                     Wave 1                      Wave 2                    Wave 3                 Wave 4              Wave 5
                                     Month                       Month                     Month                  Month               Month
        Variable           1       2    3      4       5       6    7     8       9       10   11     12    13   14  15    16   17   18  19    20
        PP-ID       7887
        PP-PNUM      102
        PP-MIS                 1     1     1       1       1    1     1       1       1     1     1     1    2    2    2    2    0    0    0    0
        G1SRC1         3
        G1AMT1 ($)         500     500   500   500         0    0     0   500     500     500   500   500    0    0    0    0    0    0    0    0
        G1SRC2        27
        G1AMT2 ($)             0     0     0       0       0    0     0   125     125     125   125     0    0    0    0    0    0    0    0    0
        G1SRC3         0
        G1AMT3 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC4         0
        G1AMT4 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC5         0
        G1AMT5 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC6         0
        G1AMT6 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC7         0
        G1AMT7 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC8         0
        G1AMT8 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC9         0
        G1AMT9 ($)             0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
        G1SRC10        0
        G1AMT10 ($)            0     0     0       0       0    0     0       0       0     0     0     0    0    0    0    0    0    0    0    0
                                    Table 12-21. Unearned Income in the Longitudinal Research Files (continued)


                                                                            PP-MIS
                              Wave 6           Wave 7             Wave 8           Wave 9                   Wave 10
                              Month            Month              Month            Month                    Month
        Variable           21  22    23     24  25    26    27   28  29    30   31  32    33   34   35   36  37     38   39   40
        PP-ID       7887
        PP-PNUM      102
        PP-MIS             0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC1         3
        G1AMT1 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC2        27
        G1AMT2 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC3         0
        G1AMT3 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC4         0
        G1AMT4 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC5         0
        G1AMT5 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC6         0
        G1AMT6 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC7         0
        G1AMT7 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC8         0
        G1AMT8 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC9         0
        G1AMT9 ($)         0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
        G1SRC10        0
        G1AMT10 ($)        0    0      0    0    0    0     0    0    0    0    0    0    0    0    0    0    0    0     0    0
                                       Table 12-22. User-Created SSI and FSP Variables Using the Unearned
                                               Income Variables in the Longitudinal Research Files

                                                                                            PP-MIS
                                          Wave 1                   Wave 2                   Wave 3                   Wave 4                  Wave 5
                                          Month                    Month                     Month                    Month                   Month
        Variable                     1    2    3       4     5     6    7       8     9     10   11      12    13    14   15     16    17    18   19     20
        PP-ID           7887
        PP-PNUM          102
        PP-MIS                       1    1     1      1 1       1     1        1     1      1     1     1     2     2     2     2     0     0     0     0
        G1SRC1              3
        G1AMT1 ($)                500 500 500 500 0              0     0     500 500 500          500   500    0     0     0     0     0     0     0     0
        G1SRC2             27
        G1AMT2 ($)                   0    0     0      0 0       0     0     125 125 125          125    0     0     0     0     0     0     0     0     0
        G1SRC3              0
        G1AMT3 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC4              0
        G1AMT4 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC5              0
        G1AMT5 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC6              0
        G1AMT6 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC7              0
        G1AMT7 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC8              0
        G1AMT8 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC9              0
        G1AMT9 ($)                   0    0     0      0 0       0     0        0     0      0     0     0     0     0     0     0     0     0     0     0
        G1SRC10             0
        G1AMT10 ($)                  0    0     0      0 0       0     0        0     0      0      0     0     0     0     0     0     0     0     0     0
        SSI         ($)           500 500 500 500 0 a 0                0     500 500 500          500   500   –99   –99   –99   –99   –99   –99   –99   –99
        FSP         ($)              0    0     0      0 0       0     0     125 125 125          125     0   –99   –99   –99   –99   –99   –99   –99   –99
        a
          In SAS, the unassigned values would have a “system missing” value displayed as a “.”.
                                  Table 12-22. User-Created SSI and FSP Variables Using the Unearned
                                     Income Variables in the Longitudinal Research File (continued)

                                                                                   PP-MIS
                                    Wave 6                  Wave 7                  Wave 8                  Wave 9                 Wave 10
                                    Month                   Month                   Month                   Month                    Month
        Variable            21    22   23     24    25    26   27     28    29    30   31     32    33    34   35     36    37    38   39  40
        PP-ID        7887
        PP-PNUM       102
        PP-MIS               0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC1          3
        G1AMT1 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC2         27
        G1AMT2 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC3          0
        G1AMT3 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC4          0
        G1AMT4 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC5          0
        G1AMT5 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC6          0
        G1AMT6 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC7          0
        G1AMT7 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC8          0
        G1AMT8 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC9          0
        G1AMT9 ($)           0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        G1SRC10         0
        G1AMT10 ($)          0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
        SSI      ($)        –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99
        FSP      ($)        –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99   –99

Figure 12-5. Creating Monthly Food Stamp and SSI Income Variables from the Unearned
                  Income Variables in the Longitudinal Research Files

For each person:
      /*
            This step is not needed in SAS
      */
      For each month (index = mo):
            If PP-MIS (mo) = 1 Then do
                  SSI(mo) = 0
                  FSP(mo) = 0
            End If PP-MIS (mo) = 1
            Else do
                  SSI(mo) = -99
                  FSP(mo) = -99
            End Else
      End month loop
      /*
            Begin here for SAS
      */
      For each G1SRC (index=i):
            If G1SRC(i)=3 Then do
                  For each month (index=mo)
                        If PP-MIS (mo) = 1 Then do SSI(mo)=G1AMT(i,mo)
                        End If PP-MIS (mo) = 1
                  End month loop
            End If G1SRC(i)=3
            Else if G1SRC(i)=27 Then do
                  For each month (index=mo)
                        If PP-MIS (mo) = 1 Then do FSP(mo)=G1AMT(i,mo)
                        End If PP-MIS (mo) = 1
                  End month loop
            End if G1SRC(i)=27
      End G1SRC loop
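
The algorithm in Figure 12-5 can also be written as a short Python sketch. This is one possible rendering, not Census Bureau code: the function name realign and the list-based layout (one list of monthly values per variable) are assumptions made for illustration. The input values are those of person 102 in Table 12-21, with only the two occupied G1 slots included.

```python
NOT_IN_SAMPLE = -99            # placeholder used in Table 12-22
SSI_CODE, FSP_CODE = 3, 27     # G1 source codes listed in Table 12-19

def realign(pp_mis, g1src, g1amt):
    """Build monthly SSI and food stamp series from the G1 source/amount slots.

    pp_mis : monthly interview-status codes (1 = in sample)
    g1src  : source code for each G1 slot
    g1amt  : for each G1 slot, a list of monthly amounts
    """
    # Initialization step of Figure 12-5: 0 when in sample, -99 otherwise.
    ssi = [0 if status == 1 else NOT_IN_SAMPLE for status in pp_mis]
    fsp = list(ssi)
    for src, amounts in zip(g1src, g1amt):
        if src == SSI_CODE:
            target = ssi
        elif src == FSP_CODE:
            target = fsp
        else:
            continue           # some other income source; ignored here
        for mo, status in enumerate(pp_mis):
            if status == 1:
                target[mo] = amounts[mo]
    return ssi, fsp

# Person 102 from Table 12-21 (40 months).
pp_mis = [1] * 12 + [2] * 4 + [0] * 24
g1src = [3, 27]
g1amt = [
    [500] * 4 + [0] * 3 + [500] * 5 + [0] * 28,   # G1AMT1 (federal SSI)
    [0] * 7 + [125] * 4 + [0] * 29,               # G1AMT2 (food stamps)
]
ssi, fsp = realign(pp_mis, g1src, g1amt)
print(ssi[:16])
print(fsp[:16])
```

The first 16 months of output correspond to the SSI and FSP rows for Waves 1 through 4 in Table 12-22. As in the figure, an amount is copied rather than summed, so if the same source code appeared in two slots the later slot would overwrite the earlier one.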


within a wave. For example, if a respondent’s income from a single job was concentrated in only
one of the four reference months, a figure as high as $33,332 could be shown.

Summary income variables on the person, family, and household records are simply the sums of
the component variables after they have been topcoded. The summary variables are not
independently topcoded. Thus, a person with high income from several sources (multiple jobs,
businesses, property) could have aggregate monthly income well over the topcode for each
source, and yet the data could still be greatly understating the person’s true income.

As shown in Table 12-23, person 101 has wages topcoded. The person received considerably
more money in December than in the other months. Also, total family income and total
household income are the sum of the income amounts (in this case, WS-ERN-AMT1i +
G1AMT1i) after they have been topcoded.



            Table 12-23. Example of Topcoding in the Longitudinal Research Files

Person                                Household          Family Total       Wages              Child Support
Number             Calendar           Total Income       Income             (WS-ERN-           Payments
(PP-PNUM)          Month              (HH-INCi)          (FF-INCi)          AMT1i)             (G1AMT1i)
101                10                 $ 9,333            $ 9,333            $ 8,333            $1,000
101                11                 $ 9,333            $ 9,333            $ 8,333            $1,000
101                12                 $13,123            $13,123            $12,123 a          $1,000
101                01                 $ 5,793            $ 5,793            $ 4,543            $1,250
a
  This figure can exceed the nominal monthly topcode of $8,333 because the person’s total earnings for the wave
were below $33,332.
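
The arithmetic behind Table 12-23 can be checked with a small Python sketch. It recomputes each summary amount as the sum of the already topcoded components, and it marks months in which a component equals the nominal monthly topcode and may therefore stand for a larger true value. The marking rule and the function name summarize are illustrative assumptions; the files themselves do not carry such a flag in this form.

```python
NOMINAL_MONTHLY_TOPCODE = 8333   # dollars; represents "$8,333 or more"

# Rows of Table 12-23: (calendar month, wages WS-ERN-AMT1, child support G1AMT1)
rows = [(10, 8333, 1000), (11, 8333, 1000), (12, 12123, 1000), (1, 4543, 1250)]

def summarize(month, wages, child_support):
    """Return (month, summary income, component-at-topcode marker)."""
    total = wages + child_support   # summary = sum of topcoded components
    at_topcode = NOMINAL_MONTHLY_TOPCODE in (wages, child_support)
    return month, total, at_topcode

summaries = [summarize(*row) for row in rows]
for month, total, at_topcode in summaries:
    print(month, total, at_topcode)
```

The totals reproduce the HH-INCi and FF-INCi columns of the table. In months marked True, the summary amount is a lower bound on true income, because the summary variables are not independently topcoded.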


Using Allocation (Imputation) Flags
As described in Chapter 4, the Census Bureau often imputes information when a person does not
respond to the survey or to a particular question. Two sources identify whether information has
been imputed:

1. Beginning with the 1991 Panel, all data for a wave are imputed if a person was not
   successfully interviewed in one wave but had complete information (from either a successful
   interview or a proxy interview) in the two adjacent waves. In those cases, the value of
   WAVFLG will be greater than zero and INTVW will be 3 or 4.
2. A variable of interest may be imputed. In the longitudinal research files, allocation
   (imputation) flags are included for the earned income, asset income, and unearned (transfer)
   income variables.
Other variables are also subject to editing and imputation. The edit and imputation procedures
used for the longitudinal research files differ from those used for the core wave files. The
procedures used for the longitudinal research files make use of the full set of longitudinal data
for a person. Because the core wave files are processed individually, the edit and imputation
procedures applied to those files have, at most, 4 months of observations for a person. The
procedures applied to the core wave files make greater use of cross-observation imputation
methods than do those applied to the longitudinal research files.37


Using Weights
The full panel longitudinal research files include the calendar year weights (FNLWGTs) and the
full panel weight (PNLWGT). The number of calendar year weights depends on the duration of

37
   The edit and imputation procedures applied to the core wave files from the 1996 Panel make greater use of
retrospective information than procedures used in earlier panels. See Chapters 4 and 10 for details.


                                                   12-37
SIPP USERS’ GUIDE

the panel; the number varies from one calendar year weight for the 1989 Panel to three calendar
year weights for the 1993 Panel. When the 1996 full panel file is available, it will have four
calendar year weights.

The source and accuracy statements that accompany all SIPP full panel files ordered from the
Census Bureau provide suggestions on how to use the weight variables in those files. Also,
Chapter 8 of this Guide contains a full discussion of how to use weights in full panel files.


Identifying States
The longitudinal research file contains a variable (GEO-STE) that identifies 41 individual states
and the District of Columbia; the nine other states are suppressed into three groups:

1. Maine, Vermont;
2. Iowa, North Dakota, South Dakota; and
3. Alaska, Idaho, Montana, Wyoming.
Even though it is possible to identify most states, the SIPP sample was not designed to be
representative at the state level and should not be used to produce direct state-level estimates.
The state variable is included on the public use files to allow examination of how state-level
characteristics affect national estimates. For example, a user could apply the state-specific
eligibility criteria for a means-tested program in order to arrive at a national estimate of the
number of people eligible for the program. Because some states are not uniquely identified, some
method of allocating the state-specific eligibility rules to sample persons in those states would
need to be devised.


Identifying Metropolitan Areas
The longitudinal research files do not contain any variables identifying metropolitan areas.
Analysts who need this information should merge it from the core wave files. Chapter 11
provides details about how to use the variables identifying metropolitan areas. Chapter 13
provides instructions for merging data from multiple SIPP public use files.


13. Linking Core Wave, Topical
    Module, and Longitudinal
    Research Files
In many situations, a single Survey of Income and Program Participation (SIPP) data file will not
contain the information needed for a project. Because only limited core information is included
on the topical module files, analysts often need to merge data from the core wave or longitudinal
research files with topical module information. Also, they may need to link two or more topical
module files, each containing data on a different topic and collected in different waves. And
there are situations in which it is necessary to merge data from the core wave files with data from
the longitudinal research files. Those situations arise because not all of the core wave content is
included on the longitudinal research files (e.g., calendar month weights are only on the core
wave files).1 This chapter describes procedures for linking core wave, topical module, and full
panel data files.

This chapter assumes a working knowledge of the files that will be linked.2 Analysts who are not
familiar with those files should read the following before proceeding with this chapter:

•   Chapter 9 for an overview of the SIPP data files;
•   Chapter 10 for a discussion of the core wave files;
•   Chapter 11 for a discussion of the topical module files; and
•   Chapter 12 for a discussion of the longitudinal research files.
In all cases, this chapter describes procedures for linking person records across files. It does not
discuss procedures for linking households or families because those procedures become
problematic when working with longitudinal data.3
1
  Even when the same variables are on both the core wave and longitudinal research files, the data may not be the
same. Different edit and imputation procedures are used for these two types of files. Prior to the 1996 Panel, all edit
and imputation procedures applied to the core wave files worked entirely within the given file. Information from
previous waves or later waves was not used. Beginning with the 1996 Panel, edit and imputation procedures applied
to the core wave files make greater use of information from previous waves. However, because the core wave files
are processed as the data become available, it is not possible to make use of information from future waves. The edit
and imputation procedures applied to the longitudinal research files, however, make use of each person’s full
longitudinal record. There are many times when the preferred data for a study will be on the longitudinal research
files but the weights will be on the core wave files.
2
  This chapter does not discuss the longitudinal research file from the 1996 Panel because, as of this writing, it is not
available. That information will be added to an updated version of this chapter once the file becomes available. In
the interim, the only information included in this chapter on the 1996 longitudinal research file is the new variable
names being used in the 1996 Panel data files.
3
  Difficulties arise when unit composition changes over time. In those situations, there is no unambiguous way to
define longitudinal households and families, and many ad hoc procedures run the risk of introducing biases into

When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in
parentheses following 1996 variable names.

This chapter begins with a discussion of the mechanics involved in linking SIPP data files. The
procedures are straightforward and easily implemented. In each case there are three basic steps:

1. Create data extracts from each of the files to be linked;
2. Sort the files in common order by using the variables identified as match keys; and
3. Merge the files.
There are two general formats that the final files can take. This chapter refers to these as person-
month format (the format of the current core wave files) and person-record format (the format of
the longitudinal research files).4 The choice of format will be a function of the planned analysis
and the software that will be used for that analysis. Where appropriate, procedures for generating
each type of data file are described.

After discussing the mechanics of linking SIPP files, this chapter discusses why nonmatches
occur and suggests ways to deal with them.

For the 1996 Panel, most variable names changed from those of previous panels. To aid users
working with pre-1996 panel files, this chapter presents both the old and the new variable names
when the text applies to both. In the main body of the text, the old names are presented in
parentheses following the new names. For example, the sample unit ID variable name, which is
SSUID in the 1996 Panel, was SUID in previous panels; it is written in this chapter as SSUID
(SUID). In tables, a variety of methods are used to present both the old and the new names.


Procedures for Linking Files
There are six types of merges that SIPP users commonly need to perform:

1. Person-month records within a core wave file can be linked, creating a single wide record for
   each person rather than a record for each person for each month;5
2. Two or more core wave files can be linked together;
3. Core wave files can be linked to longitudinal research files;
4. Two or more topical module files can be linked to each other;
5. Topical module files can be linked to core wave files; and
6. Topical module files can be linked to longitudinal research files.

3 (continued)
  analyses of those units. The alternative approach that has gained acceptance in the research community involves
assigning to people the characteristics of the households or families to which they belong at each point in time.
Subjects can then be followed over time, as can the characteristics of the households or families to which they
belong. One exception to the longitudinal household problem is with program units (e.g., food stamp units), where
program rules can be used to define when changing composition constitutes the formation of a new unit (as opposed
to changed composition of an existing unit). For discussions of the issues involved in studying longitudinal
households and families, see McMillen and Herriot (1985), Duncan and Hill (1985), Citro et al. (1986), and Kalton
et al. (1987).
4
  Some software (e.g., Stata) refers to this as “wide” format, while the person-month format is referred to as “long.”
5
  This procedure transforms the current format of the core wave files into a format similar to that used prior to the
1990 Panel, a format analogous to that used for the longitudinal research files.

This chapter addresses each of these merges in turn.


Linking Within a Core Wave File—Transforming the
Person-Month Format into the Person-Record Format

This procedure transforms the person-month-format core wave files (with one record per person
per month) into a single wide record per person (the format used for the core wave files before
the 1990 Panel). As well as being useful in its own right, reformatting is often a necessary first
step when merging core wave files with data from either the topical module files or from the
longitudinal research files.

Two approaches for this link are described. Programmers using third-generation languages, such
as FORTRAN and PL/1, typically use the first approach. Programmers using fourth-generation
languages, such as SAS and SPSS, typically use the second approach.

The first approach (using FORTRAN) contains four steps:

1. Sort the file by person and reference month, using the following variables: sample unit ID
   [SSUID (SUID)], entry address ID [EENTAID (ENTRY)], person number [EPPPNUM
   (PNUM)], and reference month [SREFMON (REFMTH)].6 This is the sort order the Census
   Bureau uses for the core wave files. If the file being used is in its original sort order, this step
   can be skipped.
2. Define and initialize monthly variable arrays to some “missing data” code. Users should be
   careful to choose initial values outside the range of legal values for the variables of interest.
   For example, the variable TAGE (AGE) would be defined as an array of four elements, and
   each element could be initialized to –9 (an age that no one can have); the variable
   TPTOTINC (TOTINC) would be defined as an array of four elements and each element
   could be initialized to –999999 (a negative value outside the range of the variable), and so
   on.
3. Read each person’s corresponding person-month record and put the information into the
   appropriate element of the array.
4. Write the person-based record from the information stored in the arrays.
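
The four steps of the first approach can be sketched in Python. This is an illustration under stated assumptions, not the Census Bureau's or the Guide's own program: the function name to_person_records and the tuple layout of the input records are hypothetical, and the missing-data codes are the ones suggested in step 2.

```python
from itertools import groupby

AGE_MISSING = -9          # an age that no one can have
INCOME_MISSING = -999999  # outside the legal range of total income

def to_person_records(person_months):
    """Rows are (ssuid, eentaid, epppnum, srefmon, tage, tptotinc) tuples."""
    # Step 1: sort by person and reference month.
    ordered = sorted(person_months, key=lambda r: r[:4])
    records = []
    for person, months in groupby(ordered, key=lambda r: r[:3]):
        # Step 2: initialize the monthly arrays to missing-data codes.
        ages = [AGE_MISSING] * 4
        incomes = [INCOME_MISSING] * 4
        # Step 3: put each month's values into the matching array element.
        for *_, srefmon, tage, tptotinc in months:
            ages[srefmon - 1] = tage
            incomes[srefmon - 1] = tptotinc
        # Step 4: write one wide record per person.
        records.append((*person, *ages, *incomes))
    return records

sample = [
    ("123456781000", "011", "0201", 3, 18, 200),
    ("123456781000", "011", "0201", 2, 18, 200),
    ("123456781000", "011", "0201", 4, 18, 200),
    ("123456781000", "011", "0101", 1, 42, 2000),
]
wide = to_person_records(sample)
```

Months with no person-month record keep the initial codes, so they remain distinguishable from legal values.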
The second approach (using SAS) also contains four steps:7

1. Sort the file by person and reference month, using the following variables: sample unit ID
   [SSUID (SUID)], entry address ID [EENTAID (ENTRY)], person number [EPPPNUM
   (PNUM)], and reference month [SREFMON (REFMTH)]. This is the sort order used by the
   Census Bureau for the core wave files. If the file being used is in its original sort order, this
   step can be skipped.
2. Write out four files, each one containing the person ID variables and the variables for 1 of the
   4 months. For example, file1 would have the person ID variables [SSUID (SUID),
   EENTAID (ENTRY), and EPPPNUM (PNUM)] and the variables for month one, file2
   would have the person ID variables and the variables for month two, and so on.
3. Rename the (monthly) variables in each of the four files to unique names. For example, the
   variable names in file1 might be TAGE1 (AGE1) and PTOTINC1 (TOTINC1);8 in file2 the
   variable names might be TAGE2 (AGE2) and PTOTINC2 (TOTINC2).
4. Merge the four files together, using SSUID (SUID), EENTAID (ENTRY), and EPPPNUM
   (PNUM) as the match keys.
The SAS code in Figure 13-1 performs the above steps.

6
  In the 1996 Panel, the entry address is no longer needed to uniquely identify people. Its continued use will not
create any problems; it is simply redundant information for purposes of identifying SIPP sample members.
7
   An alternative procedure that may be useful in many cases uses SAS Proc Transpose. Stata also has a
procedure (reshape) that can accomplish this task.

The person-month format of the core wave files (before reformatting) is illustrated in Table 13-1.
Person numbers 101 and 102 are in the sample all 4 months; person numbers 201 and 202 are in
the sample for 3 months (reference months 2 through 4). The person-record format (after
reformatting) is illustrated in Table 13-2.
Missing data are indicated by a single period, the default missing data code in SAS. For the
FORTRAN example, the missing data would have codes of –9 and –999999.
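
The second approach (split by month, rename, merge) can also be sketched in Python for readers who do not use SAS. This is a minimal illustration, not a translation of Figure 13-1: the function name split_rename_merge is hypothetical, rows are plain dictionaries, and None stands in for the SAS missing value. Python has no eight-character limit on names, so the sketch keeps the full stem (for example, tptotinc1 rather than ptotinc1).

```python
def split_rename_merge(person_months, id_vars, month_var, value_vars):
    """Return one dict per person with month-suffixed variable names."""
    # Steps 2 and 3: one "file" per month, with month-specific names.
    monthly_files = {m: {} for m in (1, 2, 3, 4)}
    for row in person_months:
        m = row[month_var]
        key = tuple(row[v] for v in id_vars)
        monthly_files[m][key] = {f"{v}{m}": row[v] for v in value_vars}
    # Step 4: merge the four files on the person ID variables.
    merged = {}
    for m in (1, 2, 3, 4):
        for key, values in monthly_files[m].items():
            merged.setdefault(key, dict(zip(id_vars, key))).update(values)
    # Months with no record are filled with None.
    all_names = [f"{v}{m}" for v in value_vars for m in (1, 2, 3, 4)]
    for rec in merged.values():
        for name in all_names:
            rec.setdefault(name, None)
    # Step 1 (sorting) is applied to the output: records come back in key order.
    return [merged[k] for k in sorted(merged)]

ids = {"ssuid": "123456781000", "eentaid": "011"}
table_13_1 = (
    [dict(ids, epppnum="0101", srefmon=m, tage=a, tptotinc=t)
     for m, a, t in [(1, 42, 2000), (2, 42, 2100), (3, 42, 2000), (4, 43, 2000)]]
    + [dict(ids, epppnum="0201", srefmon=m, tage=18, tptotinc=200) for m in (2, 3, 4)]
)
wide = split_rename_merge(
    table_13_1, ("ssuid", "eentaid", "epppnum"), "srefmon", ("tage", "tptotinc")
)
```

The sample rows reproduce persons 101 and 201 from Table 13-1; person 201 has no month-one record, so the month-one values stay missing.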


Linking Two or More Core Wave Files

There are three reasons to link two or more core wave files:

1. To create an analysis file for one or more calendar months containing data from all four
   rotation groups. For example, data for March 1994 are contained in the Wave 7 file (of the
   1992 Panel) for rotation groups 4 and 1, and in the Wave 8 file for rotation groups 2 and 3.
   (Data for the same calendar month are also in Waves 4 and 5 of the 1993 Panel.)
2. To create an analysis file containing more than 4 months of information for each person. This
   linkage is of primary interest to users of the 1996 Panel, because longitudinal research files for
   all other panels are available from the Census Bureau.
3. As preparation for merging core wave data with data from either the topical module files or
   the longitudinal research files.


8
 Because variable names in SAS are limited to eight characters, the monthly variable name is shortened from
TPTOTINC1 (nine characters) to PTOTINC1 (eight characters).


    Figure 13-1. Sample SAS Code to Change the Core Wave Files from Person-Month
             Format to Person-Record Format from Wave 2 of the 1996 Panel

  /*
     this creates the initial extract from the full core wave file
     (srefmon is the 1996 Panel reference month variable)
  */
  data allmnths;
     set corewv962
        (keep =
           ssuid
           eentaid
           epppnum
           srefmon
           tage
           tptotinc
        );
  run;

  /*
     sort the data - if the master file was in its original order, this
     step is not needed
  */
  proc sort data = allmnths;
     by ssuid eentaid epppnum srefmon;
  run;

  /*
     write out 1 file for each of the four months, renaming variables in
     the process
  */
  data
     file1
        (rename =
           (tage = tage1
            tptotinc = ptotinc1
            srefmon = srefmth1
           )
        )
     file2
        (rename =
           (tage = tage2
            tptotinc = ptotinc2
            srefmon = srefmth2
           )
        )
     file3
        (rename =
           (tage = tage3
            tptotinc = ptotinc3
            srefmon = srefmth3
           )
        )
     file4
        (rename =
           (tage = tage4
            tptotinc = ptotinc4
            srefmon = srefmth4
           )
        )
     ;

     set allmnths;

     select (srefmon);
        when (1) output file1;
        when (2) output file2;
        when (3) output file3;
        when (4) output file4;
     end;
  run;

  /*
     merge the 4 "monthly" files together, forming the final file
  */
  data newfile;
     merge
        file1
        file2
        file3
        file4
        ;
     by ssuid eentaid epppnum;
  run;

Creating linked files in the person-month format is straightforward. The extracts from each
of the contributing core wave files simply need to be sorted and interleaved to create the final
analysis file. The final sort order would likely be based on SSUID (SUID), EENTAID
(ENTRY), EPPPNUM (PNUM), SWAVE (WAVE), and SREFMON (REFMTH).
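
Sorting and interleaving can be sketched with the Python standard library. The example below is a minimal illustration that assumes each extract is a list of dictionaries keyed by lowercase 1996 Panel variable names; the function names sort_key and interleave are hypothetical. heapq.merge requires that each input already be sorted by the same key, which is why each extract is sorted first.

```python
import heapq

SORT_KEYS = ("ssuid", "eentaid", "epppnum", "swave", "srefmon")

def sort_key(row):
    """Build the composite sort key for one person-month record."""
    return tuple(row[k] for k in SORT_KEYS)

def interleave(*extracts):
    """Sort each extract, then interleave them into one person-month file."""
    sorted_extracts = [sorted(e, key=sort_key) for e in extracts]
    return list(heapq.merge(*sorted_extracts, key=sort_key))

wave1 = [
    {"ssuid": "A", "eentaid": "011", "epppnum": "0102", "swave": 1, "srefmon": 1},
    {"ssuid": "A", "eentaid": "011", "epppnum": "0101", "swave": 1, "srefmon": 4},
]
wave2 = [
    {"ssuid": "A", "eentaid": "011", "epppnum": "0101", "swave": 2, "srefmon": 1},
]
combined = interleave(wave1, wave2)
order = [(r["epppnum"], r["swave"], r["srefmon"]) for r in combined]
```

In the combined file, each person's records from successive waves sit together in wave and reference month order.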

If a person-record format (with just one record per person) is desired, the first step is interleaving
the files to create the person-month-format file. Then, using that as the input file, analysts can
apply the procedures described in the preceding section to generate a file with a single wide
record for each person. There will be up to 4 months of data for each wave used. In the example
from Tables 13-1 and 13-2, if three waves of data are being combined, the final file will have 12
values for SREFMON (REFMTH), TAGE (AGE), and TPTOTINC (TOTINC). In the SAS
program code, the names would likely be REFMTH1–REFMTH12, TAGE1–TAGE12, and
TOTINC1–TOTINC12.
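
The numbering of the wide variables can be expressed as a small formula. The Python sketch below assumes that the waves being combined are consecutive and that each contributes four reference months; the function names wide_position and wide_name are hypothetical.

```python
def wide_position(wave, refmth, first_wave):
    """1-based position of (wave, reference month) among the stacked months."""
    return (wave - first_wave) * 4 + refmth

def wide_name(stem, wave, refmth, first_wave):
    """Variable name for one month, e.g. stem 'TAGE' and position 5 -> 'TAGE5'."""
    return f"{stem}{wide_position(wave, refmth, first_wave)}"

# Three waves (2, 3, and 4) combined: 12 monthly values per variable.
names = [wide_name("TAGE", w, m, first_wave=2) for w in (2, 3, 4) for m in (1, 2, 3, 4)]
```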

                Table 13-1. Example of the Core Wave Person-Month File Structure

      Sample Unit ID     Entry Address ID    Person Number       Reference Month      Age             Total Income
      [SSUID (SUID)]     [EENTAID (ENTRY)]   [EPPPNUM (PNUM)]    [SREFMON (REFMTH)]   [TAGE (AGE)]    [TPTOTINC (TOTINC)]
      123456781000       011 (11)            0101 (101)          1                    42              $2000
      123456781000       011 (11)            0101 (101)          2                    42              $2100
      123456781000       011 (11)            0101 (101)          3                    42              $2000
      123456781000       011 (11)            0101 (101)          4                    43              $2000
      123456781000       011 (11)            0102 (102)          1                    41              $ 500
      123456781000       011 (11)            0102 (102)          2                    41              $ 500
      123456781000       011 (11)            0102 (102)          3                    41              $    0
      123456781000       011 (11)            0102 (102)          4                    41              $    0
      123456781000       011 (11)            0201 (201)          2                    18              $ 200
      123456781000       011 (11)            0201 (201)          3                    18              $ 200
      123456781000       011 (11)            0201 (201)          4                    18              $ 200
      123456781000       011 (11)            0202 (202)          2                     2              $    0
      123456781000       011 (11)            0202 (202)          3                     2              $    0
      123456781000       011 (11)            0202 (202)          4                     2              $    0

            Table 13-2. Example of the Core-Wave Wide-Record/Person File Structure
              (After Applying the Program in Figure 13-1 to the Data in Table 13-1)

Sample Unit ID   Entry Address ID    Person Number      Reference Month      Age                  Total Income
[SSUID (SUID)]   [EENTAID (ENTRY)]   [EPPPNUM (PNUM)]   (SREFMTH)a           (TAGE)b              (PTOTINC)c
                                                        1    2    3    4     1    2    3    4     1        2        3        4
123456781000     011 (11)            0101 (101)         1    2    3    4     42   42   42   43    $ 2000   $ 2100   $ 2000   $ 2000
123456781000     011 (11)            0102 (102)         1    2    3    4     41   41   41   41    $  500   $  500   $    0   $    0
123456781000     011 (11)            0201 (201)         .    2    3    4     .    18   18   18    .        $  200   $  200   $  200
123456781000     011 (11)            0202 (202)         .    2    3    4     .    2    2    2     .        $    0   $    0   $    0
Note: . = missing.
a
  1 = SREFMTH1, 2 = SREFMTH2, 3 = SREFMTH3, 4 = SREFMTH4.
b
  1 = TAGE1, 2 = TAGE2, 3 = TAGE3, 4 = TAGE4.
c
  1 = PTOTINC1, 2 = PTOTINC2, 3 = PTOTINC3, 4 = PTOTINC4.

Users attempting to create their own longitudinal databases from the core wave files should
proceed cautiously. The edit and imputation procedures applied to the core wave files for the
SIPP panels prior to the 1996 Panel were all “within wave” procedures. This means that the edits
and imputations applied to a person’s records in one wave were independent of those in other
waves. Imputation procedures for most of the core wave files from the 1996 Panel are different.
The new procedures do make use of information from the preceding wave. When linking data
across waves, apparent changes in income, program participation, labor force behavior, or most
other outcomes could be due to real changes reported by the respondent, or they could be an
artifact of the data editing and imputation performed by the Census Bureau. Although this
problem arises primarily with the core wave files from panels prior to 1996, it is also true of the
1996 Panel.9
  9
    The new imputation procedures for the 1996 Panel are expected to introduce less error than procedures used for
  earlier panels. Thus, the number and magnitude of spurious changes (as well as falsely imputed stability) should be
  reduced. Even so, imputation errors will occur, and caution is advised when using the core wave files for
  longitudinal research.


There are two ways to identify cases with edited or imputed data. In panels prior to 1996, the
entire record was imputed if (1) MIS5 = 2 and MISj = 1 for j = 1, 2, 3, or 4 or (2) INTVW = 3 or
4. The record was imputed in the 1996 Panel if EPPINTVW = 3 or 4. In the 1996 Panel, persons
with Type Z noninterviews with prior wave information have their items imputed with
procedures that use their prior wave responses. The relatively few cases with no prior wave
information (those in Wave 1 and those in Waves 2–12 who are new to the sample) have their
records imputed with the Type Z procedure used in the pre-1996 files. For all panels, if the
record was not imputed, it is necessary to check the allocation (imputation) flags associated with
the variables of interest. Once identified, users might need to implement some form of
longitudinal editing and imputation or distinguish in their analyses between “real” changes and
those that may result from the core wave data processing procedures.
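
The whole-record rules above can be written as two small predicates. The Python sketch below is an illustration only: the function names are hypothetical, and it reads "MISj = 1 for j = 1, 2, 3, or 4" as "at least one of MIS1 through MIS4 equals 1", which is an assumption about the intended meaning. Records that fail these tests still need their item-level allocation flags checked.

```python
def record_imputed_pre1996(mis, intvw):
    """mis holds the five values MIS1..MIS5 for the record; intvw is INTVW."""
    mis5 = mis[4]
    any_mis_is_1 = any(value == 1 for value in mis[:4])
    # Condition (1): MIS5 = 2 and some MISj = 1; condition (2): INTVW = 3 or 4.
    return (mis5 == 2 and any_mis_is_1) or intvw in (3, 4)

def record_imputed_1996(eppintvw):
    """1996 Panel rule: the record was imputed if EPPINTVW is 3 or 4."""
    return eppintvw in (3, 4)

checks = [
    record_imputed_pre1996([1, 1, 1, 1, 2], intvw=1),
    record_imputed_pre1996([1, 1, 1, 1, 1], intvw=1),
    record_imputed_pre1996([0, 0, 0, 0, 1], intvw=3),
    record_imputed_1996(4),
    record_imputed_1996(1),
]
```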

Basic demographic information, such as age, race, and sex, can also appear to change from one
wave to the next. In these instances, changes reflect corrections made in later interviews to
information collected in earlier interviews; it is generally safe to assume the most recent data are
correct.

When using the core wave files for longitudinal research, analysts should also note that the
sample weights included on the core wave files are calendar month specific. These weights may
not be appropriate for the planned longitudinal analyses. Chapter 8 has a detailed discussion of
how to use the sample weights provided with the SIPP files.


Linking Core Wave Files to Longitudinal Research Files

There are relatively few circumstances in which the core wave and full panel files need to be
linked because, for the most part, they contain the same information.10 In general, if the same
information is available from both the core wave and longitudinal research files, the information
from the longitudinal research files is preferable because the edit and imputation procedures used
for the longitudinal research files are believed to introduce less error than the procedures used for
the core wave files.11 However, some core information is contained only on the core wave files,
and, therefore, at times it will be necessary to merge the core wave and longitudinal research
files.

10
   Because the 1996 longitudinal research file is not complete yet, the discussion in this section pertains only to files
for earlier panels. A revised version of this chapter will be available on the Census Bureau SIPP Web site
(http://www.sipp.census.gov/sipp/) when the 1996 longitudinal research file is completed.
11
   See footnote 1.

The following steps are necessary to link data from the core wave files with data from the full
panel files:

1. Create data extracts from the core wave and longitudinal research files;
2. Put the two extracts into the same format (either person-month format or person-record
   format);
3. Sort the extracts into the same order; and
4. Merge the extracts, creating the final file.
The variables that uniquely identify people in the core wave and longitudinal research files have
different names. Table 13-3 shows the names for the three variables needed to match people
across those files for panels prior to 1996.12

            Table 13-3. Variables Identifying People in the Core Wave and Longitudinal
                              Research Files for Panels Prior to 1996

                                                                                           Longitudinal
             Variable                      Core Wave Files                                 Research Files
             Sample Unit ID                SUID                    is matched to           PP-ID
             Entry Address ID              ENTRY                   is matched to           PP-ENTRY
             Person Number                 PNUM                    is matched to           PP-PNUM


If the final file will be in person-record format, these are the only variables needed for the sort
and merge operations (steps 3 and 4, above). If the final file will be in person-month format, then
WAVE and REFMTH are also needed.

Figure 13-2 shows the SAS code to transform data from the longitudinal research files in wide-
record format into the person-month format used in the core wave files. The program creates a
person-month format file from the 1993 longitudinal research file.

Because SAS does not allow variable names with embedded dashes, the “-” characters in the
variable names have been replaced with underscore (“_”) characters. The 1993 Panel had 10
waves, so the output file will have up to 40 monthly records for each person: no records are
written for any months when pp_mis is not equal to 1. The program creates a data set with seven
variables: SUID (renamed from PP_ID), ENTRY (renamed from PP_ENTRY), PNUM (renamed
from PP_PNUM), REFMTH (which ranges from 1 to 4), WAVE (which ranges from 1 to 10),
AGE, and TOTINC.

The REFMTH variable is computed as mod(i, 4) if that quantity is not equal to 0, or 4 if it is
equal to 0. The modulus is the remainder from the division of i by 4, so in month six of the panel
the quantity is mod(6, 4) = 2, in month seven it is mod(7, 4) = 3, and in month eight REFMTH is 4
(since the remainder from the division of 8 by 4 is 0).

The wave is computed as the first integer greater than or equal to i/4. For month one, i/4 = 0.25,
so wave = 1. For month four, i/4 = 1, so wave = 1. For month 17, 17/4 = 4.25, so wave = 5.
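
The two calculations can be checked with a short Python sketch. It mirrors the arithmetic described above rather than the SAS program itself: the function names refmth_and_wave and to_person_months are hypothetical, as is the dictionary layout used for one person's wide record.

```python
import math

def refmth_and_wave(i):
    """Map panel month i (1-based) to (reference month, wave)."""
    remainder = i % 4
    refmth = 4 if remainder == 0 else remainder
    wave = math.ceil(i / 4)  # first integer greater than or equal to i/4
    return refmth, wave

def to_person_months(person, months=40):
    """person has 'id' plus lists 'pp_mis', 'age', 'totinc' indexed by month - 1."""
    rows = []
    for i in range(1, months + 1):
        if person["pp_mis"][i - 1] == 1:  # write a record only when pp_mis is 1
            refmth, wave = refmth_and_wave(i)
            rows.append({
                "id": person["id"], "wave": wave, "refmth": refmth,
                "age": person["age"][i - 1], "totinc": person["totinc"][i - 1],
            })
    return rows

person = {
    "id": ("S1", "11", "101"),
    "pp_mis": [1, 1, 1, 1, 0, 0, 1, 1],
    "age": [30] * 8,
    "totinc": [100 * k for k in range(1, 9)],
}
rows = to_person_months(person, months=8)
```

In this made-up record, months five and six have pp_mis values other than 1, so only six person-month rows are written.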

The file created by the program in Figure 13-2 could be merged with an extract from the core
wave files from the 1993 Panel, using SUID, ENTRY, PNUM, WAVE, and REFMTH as the
match keys. If the longitudinal research file was in its original sort order, the file created by the
program in Figure 13-2 will already be sorted by this set of match keys.
12
     Current plans call for using consistent variable names across all files from the 1996 Panel.


      Figure 13-2. Sample SAS Code to Change the Longitudinal Research Files from
         Person-Record Format to Person-Month Format for Panels Prior to 1996

 data pmonth
    (keep =
       pp_id
       pp_entry
       pp_pnum
       refmth
       wave
       age
       totinc
     rename =
       (pp_id = suid
        pp_entry = entry
        pp_pnum = pnum
       )
    );

     /*
        this example works with the 1993 SIPP panel - 10 waves
     */
     set sipp93fp
        (keep =
           pp_id
           pp_entry
           pp_pnum
           pp_mis1-pp_mis40
           age1-age40
           totinc1-totinc40
        );

     /*
        define arrays to ease the programming burden
     */
     array ages {40} age1-age40;
     array totincs {40} totinc1-totinc40;
     array pp_mis {40} pp_mis1-pp_mis40;

     do i = 1 to 40;                            /* for each month */
        if (pp_mis{i} eq 1) then do;            /* if pp_mis is 1, use the data */
           age = ages{i};                       /* the age in this month */
           totinc = totincs{i};                 /* total income this month */

           j = mod(i,4);
           if (j eq 0) then refmth = 4;         /* the reference month */
           else refmth = j;

           wave = ceil(i/4);                    /* the wave */
           output;                              /* write out the record */
        end;
     end;
 run;



Values for AGE and TOTINC from the core wave and longitudinal research files will not match
for all people in all months because the core wave files and the longitudinal research files are
subjected to different edit and imputation procedures.

In addition, beginning with the 1991 Panel, a missing wave imputation procedure has been
applied to the longitudinal research files: people who had missing data from one wave but
complete data from the two adjacent waves had data imputed for the missing wave in the
longitudinal research files.13 This means that some people (those who were not Type Z
nonrespondents) will have data in the longitudinal research files for months in which they have
no records in the associated core wave files.


Linking Two or More Topical Module Files

At times it will be necessary to merge data from two or more topical module files. Any project
that studies the relationship between subject areas covered by different topical modules will
require such a merge. One example might be a study of the relationship between the use of health
care services (collected in Wave 3 of the 1993 Panel) and medical expenses (collected in Wave 4
of the 1993 Panel).

The mechanical process of linking topical module files is relatively straightforward. The topical
module files all have the same format (one record per person), and the names of the ID
variables are consistent across the topical module files: individuals are uniquely identified by the
combination of SSUID (ID), EENTAID (ENTRY), and EPPPNUM (PNUM).

However, a number of cautions should be noted:

1. Prior to the 1996 Panel, there were instances in which the same variable name was used in
   different topical module files for different variables. For example, in the 1990 Panel,
   TM8400 was used in the Wave 2 topical module for a variable that indicates whether the
   respondent completed 12th grade. The same variable name was used in the Wave 6 topical
   module to indicate whether the respondent was a parent of children under 21 years of age
   living in his or her household.
2. Not all people with records in one topical module file will have records in another topical
   module file. In the topical module files from the 1996 Panel, there will generally be a record
   for each person who was a responding SIPP household member in the fourth month of the
   wave’s core reference period. Prior to the 1996 Panel, all household members in the interview
   month have topical module records for a given wave. However, household composition
   changes from one wave to the next: some people leave SIPP households and others join SIPP

13
  Many of these situations arise with Type Z nonrespondents: nonresponding people who live in households with
other responding sample members. Type Z nonrespondents in the pre-1996 core wave files and those in the 1996
Panel files with no prior wave information were subjected to a whole-record imputation procedure, described in
Chapter 10. These people would have records in the core wave files, but different information—because it was
imputed using different procedures—in the longitudinal research files.


     households, and this changing composition is reflected in the topical module files. Also, in
     the 1996 Panel, some people who were nonrespondents in month four of one wave may have
     been respondents in month four of another wave. Thus, when topical module files are
     merged, there will be a nontrivial number of nonmatches: people with data from only one of
     the topical modules. Nonmatches are addressed in greater detail later in this chapter.
3. Choosing appropriate weights is complicated by the fact that there are a substantial number
   of nonmatches across topical modules. One solution is to use one of the weights from the
   longitudinal research files. Chapter 8 gives a detailed discussion of the SIPP weights.

Often it will be necessary to merge additional information (such as sample weights) from the
core wave or longitudinal research files when working with multiple topical modules.
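
The mechanics of such a merge can be sketched in SAS as follows. This is a minimal illustration under stated assumptions, not Census Bureau code: the data set names tm93w3 and tm93w4 and the analysis variables util_item and exp_item are hypothetical placeholders, and the ID variable names are those used in the pre-1996 topical module files. The IN= flags record whether each person was found in one file or in both, so that nonmatches can be counted before any records are dropped.

```sas
/* Sketch only: tm93w3 and tm93w4 are hypothetical names for extracts  */
/* of the Wave 3 and Wave 4 topical module files from the 1993 Panel.  */
/* util_item and exp_item are placeholder analysis variables.          */

proc sort data=tm93w3 (keep=id entry pnum util_item) out=w3;
   by id entry pnum;
run;

proc sort data=tm93w4 (keep=id entry pnum exp_item) out=w4;
   by id entry pnum;
run;

data tm_linked;
   merge w3 (in=in_w3)
         w4 (in=in_w4);
   by id entry pnum;

   /* record where each person was found so nonmatches can be reviewed */
   length match $ 8;
   if in_w3 and in_w4 then match = 'both';
   else if in_w3      then match = 'w3 only';
   else                    match = 'w4 only';
run;

proc freq data=tm_linked;
   tables match;
run;
```

If both extracts carried a variable with the same name but different meanings (the first caution above), a RENAME= data set option on one of the inputs would keep the merge from overwriting one variable with the other.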

Users interested in measuring change with data from the topical module files (such as changes in
asset holdings, or changes in health or disability status) should proceed with caution. First, in
some instances measurement error is large relative to the actual changes that have taken place.
One example is found in the topical modules that measure levels of household assets and
liabilities.14 Although the topical modules can provide estimates of aggregate-level changes in
those instances, users should not attempt to measure those changes at the individual level. Also,
the edit and imputation procedures applied to the topical module files are all “within wave”
procedures. This means that the edits and imputations applied to a person’s records in one wave
are independent of those in other waves. When data are linked across waves, apparent changes
could be due to real changes reported by the respondent or they could be artifacts of the data
editing and imputation performed by the Census Bureau.

There are two ways to identify cases with edited or imputed data. In panels prior to 1996, the
entire record was imputed if (1) PP-MIS5 = 2 and PP-MISj = 1 for j = 1, 2, 3, or 4 or (2)
INTVW = 3 or 4. In the 1996 Panel, the record was imputed if (1) EPPMIS4 = 2 or (2)
EPPINTVW = 3 or 4. In the 1996 Panel, persons with Type Z noninterviews who have prior
wave information have their records imputed with procedures that use their prior wave
responses. For persons with no prior wave information (those in Wave 1 and those in Waves 2–12
who are new to the sample), the Type Z imputation procedure is used. In all panels, users
should check the imputation flags associated with the variables of interest.
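
For the 1996 Panel, the two whole-record conditions above might be checked with a short SAS step such as the one below. It is a sketch only: tm96 is a hypothetical name for a topical module extract, and the step assumes that EPPMIS4 and EPPINTVW were kept on that extract as numeric variables. Item-level imputation flags would still need to be examined separately for each variable of interest.

```sas
/* Sketch only: tm96 is a hypothetical name for a 1996 Panel topical   */
/* module extract that keeps EPPMIS4 and EPPINTVW.                     */
data tm96_flagged;
   set tm96;

   /* whole_imp = 1 when either condition described in the text holds, */
   /* and 0 otherwise                                                  */
   whole_imp = (eppmis4 = 2 or eppintvw in (3, 4));
run;

proc freq data=tm96_flagged;
   tables whole_imp;
run;
```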


Linking Topical Module Files to Core Wave Files

Because the topical module files contain only limited information from the SIPP core, there will
be many times when it is necessary to merge data from the topical module files with data from
the SIPP core. One source of these data is the core wave files.15


14
   See the SIPP Quality Profile, 3rd Ed. (U.S. Census Bureau, 1998a) and SIPP Working Paper series for discussions
of this issue as it relates to this and other SIPP topical modules.
15
   The next section describes procedures for merging topical module files with data from the longitudinal research
files.


The first decision that must be made is which core wave file to use. Special attention should be
paid to the reference periods for the topical module items of interest. In the 1996 Panel, topical
module questions refer to either month four of the wave’s core reference period, or to a longer
period in the past (such as the preceding 12 months or the prior calendar year). In those
instances, information would come from the month-four records of the core wave files from the
same wave (and possibly from earlier months and waves). Prior to the 1996 Panel, many topical
module items referred to conditions in the interview month. The interview month, however, is
not included as a separate record in the core wave file for the same wave as the topical module.16
Rather, core information for the interview month of one wave is found in the month-one
information from the following wave. For example, the interview month for Wave 3 is month 13
in the SIPP panel, and core data for month 13 are collected in Wave 4 as that wave’s first
reference month.17 Commonly used reference periods for topical module items are the current
(interview) month (month one of the next wave), the previous month (month four of the current
wave), the previous 4 months (the full reference period for the current wave), and the previous
year.

The topical module files have one record per person, while the core wave files have up to four
records for each person (one record per person for each month the person was a SIPP sample
member). There are at least three options available when merging topical modules with data
from the SIPP core wave files:18

1. Pick a single month from the core wave files. For example, if the topical module items use
   the interview month as their reference period, it may make sense to use records for month
   one from the core wave files from the next wave.
2. Spread the topical module data across all records from the core wave file. That results in a
   final file in person-month format.
3. Create a single record for each person from the appropriate core wave file and merge the
   topical module data to that record. This results in a final file in the person-record format with
   the same monthly detail as in the second option described above.

The steps involved are as follows:

1. Create an extract from the core wave file(s) of interest.
2. If a single record for each person is desired, apply the algorithm in Figure 13-1, which is
   described in the section entitled Linking Within a Core Wave File—Transforming the
   Person-Month Format into the Person-Record Format.

16
   Some of the interview month information is contained on the records for the four reference months of the wave.
But in the person-month-format file there is no separate record for the interview month itself.
17
   Information collected during the interview month of one wave may not match the information collected about the
same calendar month in the subsequent wave. In the 1996 Panel, dependent interviewing techniques and other
checks made possible with CAI are used to help resolve those inconsistencies.
18
   Yet another option is to create a single record from the core wave files containing aggregate measures for the
reference period of interest. For example, it might make sense to create a single record from the “current” core wave
file with total income received during all 4 months of the wave’s reference period. Or the average number of hours
worked per week during the previous 4 months might be appropriate. Once the aggregate record is created, the
merge step is similar to the others described in this section.


3. Sort the core wave extract using SSUID (SUID), EENTAID (ENTRY), and EPPPNUM
   (PNUM) as the sort keys. These three variables uniquely identify people in the core wave
   files. If the core wave extract is in the person-month format, include SREFMON (REFMTH)
   as the final sort key.
4. Create an extract from the topical module file of interest. Sort the topical module extract
   using SSUID (ID), EENTAID (ENTRY), and EPPPNUM (PNUM) as the sort keys.
5. For the 1996 Panel, merge the core wave extract with the topical module extract; use SSUID,
   EENTAID, and EPPPNUM as the match keys. For panels prior to 1996, merge the core wave
   extract with the topical module extract; use the match keys shown in Table 13-4.

               Table 13-4. Variables Identifying People in the Topical Module and
                            Core Wave Files for Panels Prior to 1996

          Variable                  Topical Module Files                                 Core Wave Files
          Sample Unit ID            ID                             is matched to         SUID
          Entry Address ID          ENTRY                          is matched to         ENTRY
          Person Number             PNUM                           is matched to         PNUM


When data from panels prior to 1996 are used, there will likely be a nontrivial number of
nonmatches between the core wave files and the topical module files. That will be true even
when a topical module is merged with core data from the same wave, because people who were
members of a SIPP household in the interview month but not during the previous 4 months will
have records in the topical module files but not in the core wave files.
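
For a pre-1996 panel, the steps above might look like the following SAS sketch, which uses the first option (a single month) and attaches month-one core data from Wave 4 to the Wave 3 topical module. The data set names core93w4 and tm93w3 and the variables core_item and tm_item are hypothetical, and the core wave extract is assumed to be in person-month format with REFMTH identifying the reference month. The sample unit ID is renamed on input so that both files share the match keys in Table 13-4.

```sas
/* Sketch only: core93w4 and tm93w3 are hypothetical extract names;    */
/* core_item and tm_item are placeholder analysis variables.           */

/* Steps 1-3: month-one records from the Wave 4 core file cover the    */
/* Wave 3 interview month                                              */
proc sort data=core93w4 (keep=suid entry pnum refmth core_item
                         where=(refmth = 1))
          out=core_m1;
   by suid entry pnum;
run;

/* Step 4: the topical module file names the sample unit ID "ID", so   */
/* rename it to SUID before sorting (Table 13-4)                       */
proc sort data=tm93w3 (keep=id entry pnum tm_item rename=(id=suid))
          out=tm_w3;
   by suid entry pnum;
run;

/* Step 5: merge and flag nonmatches */
data tm_core;
   merge tm_w3   (in=in_tm)
         core_m1 (in=in_core);
   by suid entry pnum;
   matched = (in_tm and in_core);   /* 1 if found in both files */
run;
```

Keeping the nonmatched records with a flag, rather than deleting them in the merge step, leaves the decision about how to treat them to a later stage of the analysis.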


Linking Topical Module Files to Longitudinal Research Files
from Pre-1996 Panels

While topical module files can be linked with data from the core wave files, there are many times
when it will be necessary or desirable to use the longitudinal research files instead.19 For
example, if the full panel weights20 are needed for the planned analysis, they must come from the
longitudinal research files. When the same core items are available from the core wave and the
longitudinal research files, analysts may prefer to use the longitudinal research files because the
edit and imputation procedures used for them are believed to introduce less error than the
procedures used for the core wave files.


19
   Because the full panel longitudinal research file for the 1996 SIPP was still under development at the time this
chapter was written, it is not yet possible to describe procedures for using that file. A revised version of this chapter
will be available once the longitudinal research file for the 1996 Panel is released to the public.
20
   Chapter 8 discusses the SIPP weights, their derivation, and use.


The steps involved are as follows:

1. Create an extract from the longitudinal research file.
2. If a file in the person-month format is desired, apply the algorithm described in the section
   above, Linking Core Wave Files to Longitudinal Research Files. The example in Figure 13-2
   can be adapted to that purpose, but the ID variables would need to be renamed to match those
   used in the topical module files rather than in the core wave files (Table 13-5).
3. Sort the full panel extract; use PP-ID, PP-ENTRY, and PP-PNUM as the sort keys. These
   three variables uniquely identify people in the longitudinal research files. If the full panel
   extract is in the person-month format, include WAVE and REFMTH as the final sort keys.
4. Create an extract from the topical module file of interest. Sort the extract; use ID (the
   variable name for the sample unit ID in the topical module files), ENTRY, and PNUM as the
   sort keys.
5. Merge the longitudinal research file extract with the topical module extract based on the sort
   keys described here and shown in Table 13-5.

               Table 13-5. Variables Identifying People in the Topical Module and
                      Longitudinal Research Files Prior to the 1996 Panel

                                                                                     Longitudinal
          Variable                 Topical Module Files                              Research Files
          Sample Unit ID           ID                          is matched to         PP-ID
          Entry Address ID         ENTRY                       is matched to         PP-ENTRY
          Person Number            PNUM                        is matched to         PP-PNUM


Because the longitudinal research files contain a record for every person who was ever a member
of a SIPP household, every person with a record in a topical module file should have a record in
the longitudinal research file. However, analysts working with a person-month-format file
containing records only for months when PP-MIS = 1 may find nonmatches.
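
A person-record version of this merge might be sketched in SAS as shown below. The longitudinal file name sipp93fp follows Figure 13-2; tm93w3, tm_item, and the weight variable name pnlwgt are hypothetical and should be replaced with the names in the user's own extracts. The ID variables from the longitudinal research file are renamed to the topical module names given in Table 13-5.

```sas
/* Sketch only: sipp93fp follows Figure 13-2; tm93w3, tm_item, and     */
/* pnlwgt are hypothetical names used for illustration.                */
proc sort data=sipp93fp (keep=pp_id pp_entry pp_pnum pnlwgt
                         rename=(pp_id=id pp_entry=entry pp_pnum=pnum))
          out=fp;
   by id entry pnum;
run;

proc sort data=tm93w3 (keep=id entry pnum tm_item) out=tm;
   by id entry pnum;
run;

data tm_fp;
   merge tm (in=in_tm)
         fp (in=in_fp);
   by id entry pnum;
   if in_tm;                 /* keep people with topical module records */
   no_fp = not in_fp;        /* 1 if no longitudinal record was found   */
run;
```

Given the statement above that every person in a topical module file should have a longitudinal record, a nonzero count of no_fp in a person-record merge would suggest a problem with the extracts or the match keys.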


Nonmatches When Merging Files

SIPP is designed to follow a group of people over an extended period of time. This group
includes only those who were interviewed in the first wave of the panel and the children
subsequently born to or adopted by them.21 Over the course of the panel, these original sample
members are followed and interviewed every 4 months. Secondary sample members, on the

21
  In the 1993 Panel all original sample members were followed no matter what their ages. In all other panels, only
original sample members aged 15 years or older are followed when they move to new addresses. In all cases,
however, the SIPP data files contain a record for all people, including children, who reside in a household with at
least one original panel member present.


other hand, are part of the SIPP sample only for as long as they continue to reside with at least
one original sample member. As long as they are part of the SIPP sample, the secondary sample
members are interviewed and included in the SIPP data files.

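Nonmatches arise mainly when files of any type are merged across waves. Files merged within
the same wave, whether of the same or different types, generally match; the exceptions are those
noted in the preceding sections, such as pre-1996 topical module records for people present only
in the interview month.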

As shown in Table 13-6, there are a variety of reasons why a person may be in one SIPP data file
but not in another. All but the last two of the reasons are associated with people entering and
leaving the SIPP sample:22

1. The original sample person may have left the SIPP sample universe (e.g., died, moved
   abroad, moved into military barracks, or moved into an institution);
2. The original sample person may have left the sample but is still in the sample universe
   (sample attrition);
3. The original sample person may have just reentered the SIPP sample universe (after living
   abroad, etc.);
4. The person is a newborn (a special case of a person joining the sample universe);
5. The secondary sample member has just begun living with an original sample person;
6. The secondary sample member no longer lives with an original sample member;
7. The person had data for a “missing wave” imputed in the longitudinal research file and has
   no records in the core wave or topical module files for that wave; and
8. Prior to the 1996 Panel, the Census Bureau may have intentionally altered the identification
   information of the person, thereby making it difficult to find a match for this person (in rare
   situations referred to as merged households).

A person’s reason for leaving the SIPP sample is identified in the core wave and longitudinal
research files. In the former, the variable name is ULFTMAIN (REALFT). In the longitudinal
research files, the name is REASLEFT, and it has a value for each wave rather than each month.
Figure 13-3 shows the variable values and corresponding descriptions.

Procedures for dealing with nonmatches vary, depending largely on the reasons the person
entered or left the SIPP sample. A number of common scenarios are presented below.


22
     The SIPP following rules are described in greater detail in Chapter 2.


                                   Table 13-6. Reasons for Nonmatches

                                                                                File #1           File #2
                                                                                (earlier time     (later time
Reasons                                                                         period)           period)

People Exiting the Sample
Original sample people left the SIPP sample universe (left the population of    Present           Not present
inference)
  Person died
  Moved abroad - left sample universe
  Moved into military barracks - left sample universe
  Moved into an institution - left sample universe
Original sample person exited from the sample (still in the sample universe     Present           Not present
but no longer in the sample)
  Refused to be interviewed
Secondary sample person no longer lives with an original sample member          Present           Not present

People Entering the Sample
Newborn                                                                         Not present       Present
Original sample person returns to SIPP sample universe (returns to the          Not present       Present
population of inference)
  Moved from abroad - entered sample universe
  Moved from military barracks - entered sample universe
  Moved from an institution - entered sample universe
Original sample member returns to sample                                        Not present       Present
  Original sample member agrees to be interviewed and returns to sample
Secondary sample person now lives with an original sample member                Not present       Present

Missing Wave Imputation in the Longitudinal Research File (Beginning with the 1991 Panel)
Person has data in the longitudinal research file but no data in the corresponding wave in the
core wave or topical module files.

Merged Households: Special Case
“Old” version of the ID information                                             Present           Not present
“New” version of the ID information                                             Not present       Present


Exiting or Entering the Population

There is a fundamental distinction between situations in which people leave the sample because
they leave the SIPP sample universe and situations in which they leave the sample despite the
fact that they are still part of that population. The SIPP sample universe (the population that the
SIPP sample represents) is the noninstitutionalized, resident population of the United States. It
includes both civilian and military people; it includes adults and children who reside in the
United States and outside of institutions.

People who leave this population because they die, move abroad, or move into institutions exit
the SIPP sample because they are no longer a part of the population that SIPP represents. In
general, when nonmatches occur because people have entered or exited the population
represented by the SIPP sample, data should not be imputed and weights should not be adjusted
for the period when these people are outside of that population. From the perspective of SIPP,
these people do not exist when they are outside of the population represented by the sample.


    Figure 13-3. Data Dictionary Entries for Variables Identifying the Reason a Person
                                  Left the SIPP Sample

                            Wave 2, 1996 Panel Core Wave File
D ULFTMAIN     2     606
T PE: UNEDITED VARIABLE - Main reason left Household
     What is the main reason ... left the household?
U Movers from households which contain sample persons at the time of
   interview, movers from a household which splits into multiple
   households. Note: This is an unedited field and the universe is not
   exact.
V           0 .Not answered
V           1 .Deceased
V           2 .Institutionalized
V           3 .On active duty in the Armed Forces
V           4 .Moved outside of U.S.
V           5 .Separation or divorce
V           6 .Marriage
V           7 .Became employed/unemployed
V           8 .Due to job change – other
V           9 .Listed in error in prior wave
V          10 .Other
V          11 .Moved to type C household

                                         1993 Full Panel Files

D REASLEFT    9     143 9   1
     Range = (0:9)
     Preedited reason for leaving the Household Control Card item 23
U Persons who left at any time during the reference period
  Subscript 1: not applicable for Observation 1
  Subscript 2 - 8: reason left in Observations 2 – 8
V   0 .Not applicable or not answered or nonmatch
V   1 .Left – deceased
V   2 .Left – institutionalized
V   3 .Left - living in armed forces barracks
V   4 .Left - moved outside of country
V   5 .Left - separation or divorce
V   6 .Left - person #201 or greater no longer living with sample person
V   7 .Left – other
V   8 .Entered merged household
V   9 .Interviewed in previous wave but not in sample


                                              1993 Core Wave Files

D REALFT      2    521
     Reason for leaving the household
     Applicable when previous wave address ID is not equal to control
     card address ID
     Range=(00:00,05:12,25:31,99:99)
U All persons, including children, no longer in the household
V   00 .Not applicable or not answered
V   05 .Left – deceased
V   06 .Left – institutionalized
V   07 .Left – living in Armed Forces barracks
V   08 .Left – moved outside of country
V   09 .Left – separation or divorce
V   10 .Left – person #201+ no longer living with sample person
V   11 .Left – other
V   12 .Left – entered merged household
* Should have been deleted in a previous wave:
V   25 .Left – deceased
V   26 .Left – institutionalized
V   27 .Left – living in Armed Forces barracks
V   28 .Left – moved outside of country
V   29 .Left – separation or divorce
V   30 .Left - 201+ person no longer living with sample person
V   31 .Left – other
V   99 .Listed in error
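
The codes in Figure 13-3 can be used to separate people who left the sample universe from those who left for other reasons. The SAS sketch below does this for REALFT in a 1993 core wave extract. The data set name core93, the variable exit_grp, and the format exitgrp are hypothetical, REALFT is assumed to have been read as a numeric variable, and the grouping of codes is one reading of the figure rather than an official classification.

```sas
/* Sketch only: core93 is a hypothetical 1993 core wave extract that   */
/* keeps REALFT; the grouping below is one reading of Figure 13-3.     */
proc format;
   value exitgrp
      0 = 'Not applicable'
      1 = 'Left sample universe'
      2 = 'Left sample, other reason';
run;

data core93_exits;
   set core93;

   /* codes 05-08 and 25-28: deceased, institutionalized, living in    */
   /* Armed Forces barracks, moved outside of country                  */
   if realft in (5, 6, 7, 8, 25, 26, 27, 28) then exit_grp = 1;
   else if realft in (0, .) then exit_grp = 0;
   else exit_grp = 2;

   format exit_grp exitgrp.;
run;

proc freq data=core93_exits;
   tables realft * exit_grp / list missing;
run;
```

The cross-tabulation at the end is a simple check that each REALFT code was assigned to the intended group.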


The following examples help explain why weighting adjustments and imputation are problematic
in these situations:

•    A person is in the SIPP sample at Time 1 but dies before Time 2. In this case, the person is
     not part of the population at Time 2. In computing the aggregate (total) income of the
     population at Time 1, this person’s income would be included. If analysts imputed income to
     this person for the Time 2 observation, they would compute an aggregate income that is too
     high: the person had no income at Time 2, and so none should be imputed.23 If this case is
     dropped from the analysis file and the weights are inflated for the remaining sample, the
     estimate of the total population at Time 2 would be too high. Because this person was not a
     part of the population at Time 2, the weights for the remaining sample members should not
     be inflated to represent this individual.


23
  If the person had been alive with income that she or he did not report to the Census Bureau, an estimate of his or
her unreported income would be imputed to the individual. Failing to impute that unreported income would mean
that the income received by a member of the population is not represented anywhere in the sample. That omission
would result in a sample estimate of aggregate income in the population that was lower than the actual value in the
population.


•    A person is overseas at Time 1 but at Time 2 is living with an original sample member in the
     United States. At Time 1, this person was not part of the population represented by the SIPP
     sample. Because this person was not a part of that population, the SIPP sample should not be
     adjusted in any way to represent this individual.

A number of strategies are possible for dealing with cases in which nonmatches result from
people entering or leaving the population represented by the SIPP sample. One approach is to
drop those people from the analysis sample entirely. No adjustment would be made to the
weights of the remaining cases. However, the definition of the population represented by the
remaining sample would change. The remaining sample represents the population that existed at
both Time 1 and Time 2. It does not represent anyone who either entered or left the population.

That approach has the advantage of being simple to implement. It also results in a clearly defined
population of inference. Caution is necessary, however, to the extent that people entering and
leaving the population are systematically different from those who are present throughout the
period being studied: the remaining sample cannot be used to draw inferences about this other
part of the population. People entering and leaving prisons and nursing homes, for example,
likely have very different income profiles than the population that remains outside of these
institutions over the period under study.

If event-history models are used to analyze the data, another approach is possible.24 With these
models, exits from the population can be treated as competing outcomes. For example, in a study
of unemployment dynamics, a competing risks model might allow for three possible outcomes:
spells of unemployment can end because (1) a person becomes employed, (2) a person exits the
labor force, or (3) a person exits the population.25


Exiting the Sample but Remaining in the Population
(Sample Attrition)

Sample attrition occurs when people leave the SIPP sample but remain a part of the population
represented by that sample. In these instances the remaining sample generally should be adjusted
to represent the full population, including the part of the population represented by those who
leave the sample.

There are several options for handling such cases:

•    Impute the missing data and proceed. This option is appropriate for researchers familiar with
     the statistical literature on imputation for missing data. A full discussion of this topic is well
     beyond the scope of this manual. Analysts are cautioned, however, against using the common
     practice of “substituting the mean” for missing data. That practice can yield biased estimates

24
  For a description of these methods, see, for example, Tuma and Hannan (1984).
25
  In actual applications, more than three outcomes would likely be modeled. The determinants of entering a nursing
home, for example, are likely quite different from the determinants of entering a prison.


     of multivariate statistics (such as regression coefficients) and generally leads to downward-
     biased estimates of standard errors.
•    Drop cases with missing data, adjust (poststratify) the weights for the retained cases, and
     proceed. This poststratification involves several steps.
     1. Tabulate the weighted number of cases by various socioeconomic categories before
        dropping any cases.
     2. Repeat the tabulation after dropping the nonmatches.
     3. Compute adjustment factors by dividing the weighted numbers from step 1 (before
        dropping any cases) by the weighted numbers from step 2 (after dropping cases).
     4. Create a new weight variable by multiplying the original weight variable by the
        appropriate poststratification factor computed in step 3.
     This option requires caution. Dropping records may introduce selection bias because people
     who remain in the sample may be more stable than those who leave. For example, a (former)
     sample member’s departure may be associated with other changes in that person’s life, such
     as giving birth, getting married, or getting a new job. Because the person left the sample,
     the available data cannot show which changes actually occurred in each case. Also, when
     records are dropped, the procedures for computing standard errors described in the source
     and accuracy statements provided with the data no longer apply. The procedures described in
     Chapter 7 for the direct estimation of standard errors should, however, work without
     modification. If the number of cases lacking complete information is small relative to the
     full analysis sample (the full sample with positive weights), the biases introduced by
     dropping those cases are also likely to be small, and this procedure may be a viable
     alternative.

!    If the longitudinal research file is available, use a subset of the cases with complete data for
     which Census Bureau–provided weights are available and proceed. At the extreme, this
     procedure entails retaining only cases with positive full panel weights and using those
     weights for any analyses performed.26 This is a conservative approach, but one that is
     relatively easy to implement because the weights already exist, they have already been
     adjusted for the observed sample attrition, and the population of inference is clearly defined.
!    Use other missing data methods to provide estimates and their standard errors. A full
     discussion of these methods is beyond the scope of this manual. The methods are designed to
     make use of all available information from the cases with complete data without (directly)
     imputing data to cases with incomplete information. Interested users can consult the literature
     on the E-M algorithm for one example of how this can be done.27 Also, Skinner et al. (1989)
     discuss model-based approaches to the analysis of complex surveys with missing data.
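The caution about “substituting the mean” in the first option can be illustrated with a small numerical sketch in Python. The income values below are invented for illustration, and the standard error shown is the naive simple-random-sample formula, not a SIPP design-based estimate. The sketch shows only the mechanism: filling gaps with the mean leaves the mean unchanged but shrinks the measured spread, so standard errors computed from the filled data come out too small.

```python
import statistics

def mean_substitute(values):
    """Replace None with the mean of the observed values.

    This is the practice the text cautions against; it is shown here
    only to demonstrate its effect on measures of spread.
    """
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical monthly income reports; None marks a missing report.
income = [1200.0, None, 800.0, 2500.0, None, 1500.0, None, 1000.0]

observed = [v for v in income if v is not None]
filled = mean_substitute(income)

sd_observed = statistics.stdev(observed)
sd_filled = statistics.stdev(filled)

# Naive standard error of the mean, treating every value as a real observation.
se_observed = sd_observed / len(observed) ** 0.5
se_filled = sd_filled / len(filled) ** 0.5

print(f"mean: {statistics.fmean(observed):.1f} observed, {statistics.fmean(filled):.1f} filled")
print(f"std dev: {sd_observed:.1f} observed, {sd_filled:.1f} filled")
print(f"naive SE: {se_observed:.1f} observed, {se_filled:.1f} filled")
```

The filled series has the same mean as the observed cases but a smaller standard deviation and a smaller naive standard error, even though no new information has been added.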

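The four poststratification steps listed under the second option can be sketched as follows. This is a minimal illustration in Python, not a Census Bureau procedure: the record layout, the field names (cell, weight, complete, ps_weight), and the sample values are all assumptions made for the example, and the choice of socioeconomic categories is left to the analyst.

```python
from collections import defaultdict

def weighted_totals(records, cell_key, weight_key):
    """Sum the weights within each poststratification cell."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[cell_key]] += rec[weight_key]
    return dict(totals)

def poststratify(records, keep, cell_key="cell", weight_key="weight",
                 new_key="ps_weight"):
    """Drop cases for which keep(rec) is false and reweight the rest."""
    # Step 1: weighted counts by cell before dropping any cases.
    before = weighted_totals(records, cell_key, weight_key)
    # Step 2: weighted counts by cell after dropping the nonmatches.
    retained = [dict(rec) for rec in records if keep(rec)]
    after = weighted_totals(retained, cell_key, weight_key)
    emptied = [cell for cell in before if after.get(cell, 0.0) <= 0.0]
    if emptied:
        raise ValueError(f"no retained weight in cells: {emptied}")
    # Step 3: adjustment factor = step 1 total / step 2 total.
    factors = {cell: before[cell] / after[cell] for cell in before}
    # Step 4: new weight = original weight * factor for the case's cell.
    for rec in retained:
        rec[new_key] = rec[weight_key] * factors[rec[cell_key]]
    return retained, factors

# Hypothetical analysis file: two cells, two cases lacking complete data.
sample = [
    {"id": 1, "cell": "under 65", "weight": 1000.0, "complete": True},
    {"id": 2, "cell": "under 65", "weight": 500.0, "complete": False},
    {"id": 3, "cell": "under 65", "weight": 1500.0, "complete": True},
    {"id": 4, "cell": "65 and over", "weight": 2000.0, "complete": True},
    {"id": 5, "cell": "65 and over", "weight": 2000.0, "complete": False},
]

retained, factors = poststratify(sample, keep=lambda rec: rec["complete"])
for rec in retained:
    print(rec["id"], rec["cell"], round(rec["ps_weight"], 1))
```

Within each cell the adjusted weights of the retained cases sum to the original weighted total for that cell. The sketch raises an error when a cell loses all of its cases, because no factor can be computed there; in practice such cells would need to be collapsed with neighboring cells before reweighting.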

26
   The calendar year weights on the longitudinal research files are also options worth exploring. Chapter 8 provides a
detailed discussion of the SIPP sample weights, their derivation, and use.
27
   For example, see Little and Rubin (1987). Users should also note that some statistical packages (e.g., SPSS) have
incorporated more sophisticated options for handling missing data than have generally been available in the past.



Missing Wave Imputation in the Longitudinal Research Files
Prior to 1996

Beginning with the 1991 Panel, a missing wave imputation procedure has been applied to the
longitudinal research files: persons who had missing data from one wave but complete data from
the two adjacent waves had data imputed for the missing wave in the longitudinal research
files.28 Some of those cases are Type Z nonrespondents and will have records with different data
in the core wave files.29 Other people will have data in the longitudinal research files for months
when they have no records in the associated core wave or topical module files.

The correct procedure for dealing with the resulting nonmatches depends on which weight
variables will be used. If the weights are coming from the core wave or topical module files,
observations from the longitudinal research files not present in the cross-sectional files should be
dropped. That is because the weights on the core wave and topical module files are computed for
the samples in those files, samples that do not include the people who have had that wave
imputed in the longitudinal research files.

If the weights are coming from the longitudinal research file, then other procedures must be used
to deal with the missing data from the core wave and topical module files. In those instances, the
procedures described for dealing with sample attrition should be considered.
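
When the weights come from the core wave or topical module files, the rule above amounts to keeping only those longitudinal records whose person identifiers also appear in the cross-sectional file. A minimal sketch in Python follows. It assumes each record is a dictionary and that SUID, ENTRY, and PNUM together identify a person; those field names, the helper names, and the sample values are illustrative assumptions rather than a description of any file layout.

```python
ID_FIELDS = ("SUID", "ENTRY", "PNUM")  # assumed person identifier fields

def person_key(record):
    """Build a hashable person identifier from the assumed ID fields."""
    return tuple(record[field] for field in ID_FIELDS)

def split_by_match(longitudinal, cross_section):
    """Separate longitudinal records into those present in the
    cross-sectional file (kept) and those absent from it (dropped)."""
    present = {person_key(rec) for rec in cross_section}
    matched, unmatched = [], []
    for rec in longitudinal:
        (matched if person_key(rec) in present else unmatched).append(rec)
    return matched, unmatched

# Hypothetical core wave records carrying the weights to be used.
core_wave = [
    {"SUID": 1001, "ENTRY": 11, "PNUM": 101, "weight": 2500.0},
    {"SUID": 1001, "ENTRY": 11, "PNUM": 102, "weight": 2400.0},
]
# Hypothetical longitudinal records for the same wave; the third person
# had the wave imputed and so has no core wave record.
longitudinal = [
    {"SUID": 1001, "ENTRY": 11, "PNUM": 101},
    {"SUID": 1001, "ENTRY": 11, "PNUM": 102},
    {"SUID": 1002, "ENTRY": 11, "PNUM": 101},
]

matched, unmatched = split_by_match(longitudinal, core_wave)
print(len(matched), "kept;", len(unmatched), "dropped")
```

Returning the unmatched records as well makes it easy to check how many cases are being dropped before committing to this approach.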


Merged Households in Panels Prior to 1996

Finally, nonmatches can occur when the Census Bureau changes the ID numbers for sample
members.30 Prior to the 1996 Panel, there were two very rare occasions when this happened. The
first occurred when two separate sampling units, each containing original sample members, were
merged together, perhaps because of a marriage. In this situation, the people in one of the
sampling units retained their identification information, while the people in the other sampling
unit had their identification information changed to agree with the retained set. The person
numbers of the changed set were modified to be between 180 and 199.

The second instance occurred when a SIPP household split into two new households (in which
each new household gained a new sample person), which later recombined. For example, a
married couple separated in Wave 3, each moving in with a sibling. Both siblings were assigned
a person number of 301, because they entered the sample in Wave 3 at different addresses. If the
husband and wife reunited in Wave 6, bringing the siblings with them, one sibling’s person
number was changed. In this case, one of the siblings would have a person number of 301 and
the other would have a person number of 680 (or some number between 680 and 699 because the
households recombined in Wave 6).

28
   Imputed waves can be identified on the longitudinal research files by using the WAVFLG variable.
29
   The data are different because different imputation procedures are used.
30
    Because the Census Bureau is using new procedures in the 1996 Panel, merged households will not be an
identifiable source of nonmatches when files from the 1996 Panel are merged. Rather, they will appear no different
from other situations where people enter and leave the SIPP sample, such as through marriages, divorces, deaths,
and sample attrition. For example, in the 1996 Panel, there will be no way to identify which (if any) of the people
who appear to have entered the sample in Wave 3 were also sample members who appear to have left the sample
following Wave 2. The “new” sample members will be given person numbers in the same range as others who enter
the sample in Wave 3, and no previous wave information will be attached to them. The new procedures greatly
simplify the handling of these rare cases for both the Census Bureau and outside data users.

Different file types (i.e., core wave, topical, and full panel) keep track of the changed ID values
differently. If the move occurred after the first month of a reference period, the core wave file
contains two records for the person whose identification information changed. The first record
contains the original identification information of the person before the move and identifies the
person as having exited the sample at the time of the move. The second record contains the new
identification information after the move and identifies the person as having entered the sample
at the time of the move. When the move occurs at the start of a reference period, only the second
record is retained in the core wave file. The topical module file, however, contains only the
second record, no matter when the move took place. The longitudinal research file contains both
records, no matter when the move took place.

The easiest way to find these people is to search the core wave file for people with a previous
wave identified as present, that is, PWSUID > 0 or PWENTRY > 0 or PWPNUM > 0. Users then
need to decide how they want to handle these special cases. There are several possibilities:

!   Change the identification information used in the waves before the move to the new values
    seen in the wave(s) after the move, and then merge the records using these ID values. This
    option is useful when working primarily with the person’s core wave data after the move.
!   Change the identification information in the waves after the move to the original values, and
    then use those ID values to merge records. This option is useful when working primarily with
    the person’s core wave data before the move.
!   Duplicate the person’s record, and use the initial identification information with one record
    and the new identification information with the other record; then merge those records. With
    this approach, the weights for the duplicated records will need to be adjusted so that the
    duplicated weights sum to the original (unduplicated) weights.
!   Treat this person as two people: once as someone who exits the sample at the time of the
    move and once as someone who enters the sample at the time of the move. That is how these
    cases are treated in the longitudinal research files. The weighting implications of this
    approach depend on the planned analysis.
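
The search for changed identifiers, followed by the first option in the list (relabeling earlier records with the new values), might be sketched in Python as shown below. Several things here are assumptions made for illustration only: records are dictionaries, identifiers are stored as integers, SUID, ENTRY, and PNUM hold the current identification values, and PWSUID, PWENTRY, and PWPNUM hold the previous wave's values whenever they are greater than zero. The sample values are invented.

```python
def changed_id_records(core_wave):
    """Records flagged by the test in the text:
    PWSUID > 0 or PWENTRY > 0 or PWPNUM > 0."""
    return [rec for rec in core_wave
            if rec["PWSUID"] > 0 or rec["PWENTRY"] > 0 or rec["PWPNUM"] > 0]

def old_to_new_ids(flagged):
    """Map each previous identifier triple to the current one
    (assumes the PW* fields carry the previous wave's ID values)."""
    return {
        (rec["PWSUID"], rec["PWENTRY"], rec["PWPNUM"]):
        (rec["SUID"], rec["ENTRY"], rec["PNUM"])
        for rec in flagged
    }

def relabel_to_new(records, mapping):
    """Return copies of earlier-wave records carrying the post-move IDs."""
    relabeled = []
    for rec in records:
        new = dict(rec)
        key = (rec["SUID"], rec["ENTRY"], rec["PNUM"])
        if key in mapping:
            new["SUID"], new["ENTRY"], new["PNUM"] = mapping[key]
        relabeled.append(new)
    return relabeled

# Hypothetical Wave 6 core records: the second person's ID was changed.
wave6 = [
    {"SUID": 200, "ENTRY": 11, "PNUM": 101,
     "PWSUID": 0, "PWENTRY": 0, "PWPNUM": 0},
    {"SUID": 200, "ENTRY": 11, "PNUM": 680,
     "PWSUID": 300, "PWENTRY": 11, "PWPNUM": 301},
]
# Hypothetical Wave 5 records still carrying the original IDs.
wave5 = [
    {"SUID": 200, "ENTRY": 11, "PNUM": 101},
    {"SUID": 300, "ENTRY": 11, "PNUM": 301},
]

mapping = old_to_new_ids(changed_id_records(wave6))
wave5_relabeled = relabel_to_new(wave5, mapping)
print(mapping)
```

Swapping the keys and values of the mapping and applying it to the later waves gives the second option. Working on copies keeps the original identifiers available in case the analyst later prefers one of the other options.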


Appendixes
A. SIPP Users’ Guide Variable
   Crosswalk: 1993 to 1996
This appendix contains four sections showing the correspondences between the core wave file
variables in 1993 and those in 1996. The sections differ by order as follows:

1. By 1993 Variable Name
2. By 1996 Variable Name
3. By 1993 File Position
4. By 1996 File Position
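
A crosswalk of this kind is straightforward to apply in code. The sketch below, in Python, hard-codes a handful of rows copied from the 1993-ordered listing in this appendix and inverts them to produce a 1996-ordered lookup. The dictionary layout and the function name are illustrative choices, and n/a entries are stored as empty lists.

```python
# A few rows from the 1993-ordered listing; an empty list stands for "n/a".
CROSSWALK_1993_TO_1996 = {
    "AGE": ["TAGE"],
    "EARN": ["TPEARN"],
    "AFDPCT": [],
    "ENROLD": ["RENROLL", "EENRLM", "RENRLMA"],
    "H5NP": ["EHHNUMPP"],
    "HNP": ["EHHNUMPP"],
}

def invert_crosswalk(crosswalk):
    """Turn a 1993 -> 1996 mapping into a 1996 -> 1993 mapping.

    One 1993 variable can map to several 1996 variables, and several
    1993 variables can map to the same 1996 variable, so both sides
    are kept as lists.
    """
    inverted = {}
    for old_name, new_names in crosswalk.items():
        for new_name in new_names:
            inverted.setdefault(new_name, []).append(old_name)
    return {new: sorted(olds) for new, olds in sorted(inverted.items())}

by_1996 = invert_crosswalk(CROSSWALK_1993_TO_1996)
no_1996_equivalent = sorted(
    old for old, new in CROSSWALK_1993_TO_1996.items() if not new
)

for name_1996, names_1993 in by_1996.items():
    print(f"{name_1996:<10} {', '.join(names_1993)}")
print("no 1996 equivalent:", ", ".join(no_1996_equivalent))
```

Keeping both directions as lists avoids silently losing information when a 1993 variable was split into several 1996 variables or when two 1993 variables were folded into one.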



          Ordered by 1993 Variable Name               Ordered by 1993 Variable Name
     1993                     1996              1993                     1996
   ADDID                   SHHADID             FKIND                    EFKIND
    AFDC                  RCUTYP20           FKPNUM                  RCUOWN23
AFDCPNUM                 RCUOWN20             FNKIDS                   RFNKIDS
  AFDPCT                       n/a           FNLWGT                  WPFINWGT
  AFDSAB                       n/a               FNP                     EFNP
  AFTIME                       n/a             FNSSR                    RFNSSR
     AGE                     TAGE            FOKLT18                  RFOKLT18
  BFFREE                  EFRERDBK          FOODSTMP                  RCUTYP27
   BFTOT                       n/a           FOSTKID                  RCUTYP23
  BREAKF                   EBRKFST            FOTHER                  TFOTHINC
 BRTHMN                    EBMNTH            FOWNKID                 RFOWNKID
  BRTHYR                   TBYEAR               FPOV                    TFPOV
 CAIDCOV                  RCUTYP57             FPROP                  THPRPINC
 CARECOV                   ECRMTH            FREFPER                  EFREFPER
  CHAMP                   RCHAMPM            FSOCSEC                  TFSOCSEC
 CHPNUM                        n/a            FSPNUM                 RCUOWN27
  CJ10003                 ASVJTINT           FSPOUSE                  EFSPOUSE
  CJ10407                 AMDJTINT             FSSHIP        EASST06, EASST08, EASST09
  CO10003                  ASVOINT               FSSI                    TFSSI
  CO10407                 AMDOINT            FTOTINC                  TFTOTINC
  CWORK                      ER55              FTRAN                  TFTRNINC
  DAYENT                       n/a             FTYPE                    EFTYPE
  DAYLFT                       n/a            FUNEMP                  TFUNEMP
DESGPNPT                  RDESGPNT             FVETS                    TFVETS
   DISAB                   EDISABL             FWGT                  WFFINWGT
  DISAGE                    TAGESS           GAPNUM                  RCUOW21A
    EARN                    TPEARN           GENASST                  RCUTYP21
 EASTAMT                  EEGYAMT              GIBILL                    ER40
  EDASST                  EEDFUND           GRDCMPL                       n/a
  EMPLED                       n/a           H5ADDID                      n/a
  EMPLYR                   EASST10             H5MIS                 EOUTCOME
  ENROLD       RENROLL, EENRLM, RENRLMA         H5NP                 EHHNUMPP
   ENTRY                   EENTAID             H5REF                  EHREFPER
     ESR                    RMESR             H5WGT                   WHFNWGT
 ETHNCTY                   EORIGIN           HACCESS                   EACCESS
    EWID                  UEVRWID              HAFDC                   THAFDC
   FAFDC                    TFAFDC             HCASH                   RHCBRF
  FAMREL                     ERRP           HCHANGE                  RHCHANGE
  FAMTYP                     ESFT              HEARN                   THEARN
 FCHANGE                 RFCHANGE             HENRGY      EEGYPMT1, EEGYPMT2, EEGYPMT3
   FEARN                    TFEARN            HFDSTP                   THFDSTP
  FFDSTP                   TFFDSTP              HHSC                  GHLFSAM
     FID                     RFID              HIFAM                      n/a
    FID2                     RFID2           HIGRADE                  EEDUCATE



         Ordered by 1993 Variable Name                   Ordered by 1993 Variable Name
    1993                     1996                  1993                     1996
   HIIND                 RCUTYP58             IDISAGE                     AAGESS
  HINONH                 EHIOWNER          IEASTAMT                      AEGYAMT
  HIOWN                  EHIOWNER             IEDASST                    AEDFUND
   HIPAY                  EHICOST            IEMPLYR                      AEDASST
  HIPNUM          RCUOW58A, RCUOW58B         IENROLD          ARENROLL, AENRLM, EENLEVEL
   HISRC                  EHEMPLY          IETHNCTY                       AORIGIN
 HITM36B                      n/a                IEWID                       n/a
  HITYPE                 EHIOWNER              IFSSHIP                    AEDASST
 HLORNT                   EGVTRNT              IGIBILL                      AR40
 HLVQTR                   ELIVQRT          IGRDCMPL                          n/a
 HMEANS                   RHMTRF             IHENRGY                     AEGYPMT
 HMETRO                   TMETRO            IHIGRADE                    AEDUCATE
   HMSA                     TMSA                IHIIND                       n/a
 HNCASH                    RHNBRF              IHIOWN                   AHIOWNER
    HNF                     RHNF                IHIPAY                    AHICOST
  HNFAM                   RHNFAM                IHISRC                   AHEMPLY
HNONCSH                 THNONCSH              IHITYPE                   AHIOWNER
    HNP                 EHHNUMPP                 IINAF                    AAFNOW
   HNSF                     RHNSF               IJ10003                  ASVJTINT
   HNSSR                   RHNSSR               IJ10407                  AMDJTINT
 HOTHER                  THOTHINC                  IJ110                 ASJNTDIV
   HPOV                    THPOV                IJ110RI                   AMJADIV
   HPROP                 THPRPINC              IJ120OT                    AJACLR2
  HPUBHS                  EPUBHSE                  IJ130                   AMIJNT
 HREFPER                 EHREFPER             IJGRENT                     AJARNT
 HSOCSEC                 THSOCSEC             IJNRENT                      AJACLR
    HSSI                    THSSI                IJO110                 AMOWNDIV
  HSTATE                   TFIPSST            IJO110RI                  AMOTHDIV
  HSTRAT                  GVARSTR          ILCHCOST                          n/a
 HTENURE                  ETENURE           ILCHFREE                     AFRERDLN
 HTOTINC                 THTOTINC              ILCHPT                    AFREELUN
  HTRAN                  THTRNINC            ILCHTOT                         n/a
   HTYPE                   RHTYPE              ILEVEL                    AENLEVEL
 HUNEMP                  THUNEMP               ILUNCH                   AHOTLUNC
  HUNITS                   EUNITS              IMCOPT                        n/a
   HVETS                   THVETS                 INAF                    EAFNOW
   HWGT                  WHFNWGT                 INDSL                    AEDASST
 IBFFREE                 AFRERDBK           INKIDSBF                         n/a
  IBFTOT                      n/a           INKIDSHL                         n/a
 IBREAKF                  ABRKFST            INONHHI                     AHIOTHER
ICAIDCOV                      n/a               INTVW                    EPPINTVW
ICARECOV                  ACRMTH               IO10003                    ASVOINT
 ICWORK                      AR55              IO10407                   AMDOINT
  IDISAB                  ADISABL                 IO110                 ASOWNDIV



         Ordered by 1993 Variable Name                  Ordered by 1993 Variable Name
    1993                    1996                   1993                    1996
  IO110RI              AMOWNADV                    IR32                    AR32
   IO130                  AMIOWN                   IR34                    AR34
  IO14050                ARNDUP1                   IR35                    AR35
 IOGRENT                  AOARNT                   IR36                    AR36
 IONRENT                  AOACLR                   IR37                    AR37
 IOTHAID                  AEDASST                  IR38                    AR38
 IOTHVET                  AEDASST                  IR40                    AR40
   IPELL                  AEDASST                  IR41                    AR41
 IPHRENT                 AGVTRNT                   IR50                    AR50
   IPLUS                  AEDASST                  IR51                    AR51
   IR01A                   AR01A                   IR52                    AR52
   IR01K                   AR01K                   IR53                    AR53
   IR02A                    AR02                   IR54                    AR54
    IR03               AR03A, AR03K                IR55                    AR55
    IR05                    AR05                   IR56                    AR56
    IR06                    AR06                 IRACE                    ARACE
    IR07                    AR07               IREASAB                    AABRE
    IR08                    AR08               IRETIRD                   AEVERET
    IR10                    AR10               IRHCDIS                      n/a
   IR100                  AAST2B                IRJ10003                  ASVJT
   IR101                   AAST2C               IRJ10407                    n/a
   IR102                  AAST2D                 IRJ120                  AJNTRNT
   IR103                  AAST2A               IRJ120OT                   AJRNT2
   IR104             AMDJT, AMDOAST              IRJ130                 AMRTJNT
   IR105                  AAST3D               IRO10003                 ASVOAST
   IR106                   AAST3C              IRO10407                     n/a
   IR107                   AAST4C                IRO120                 AOWNRNT
   IR110                AMANYCHK                 IRO130                 AMRTOWN
    IR12                    AR12                  IS01A                 A01AMTA
   IR120                  AAST4A                  IS01K                 A01AMTK
    IR13                    AR13                  IS02A                  A02AMT
   IR130                   AAST3E                 IS02K                     n/a
   IR140                  AAST4B                   IS03            A03AMTA, A03AMTK
   IR150                 EOTHPROP                  IS05                  A05AMT
    IR20                    AR20                   IS06                  A06AMT
    IR21                    AR21                   IS07                  A07AMT
    IR23                    AR23                   IS08                  A08AMT
    IR24                    AR24                   IS10                  A10AMT
    IR25                    AR25                   IS12                  A12AMT
    IR27                    AR27                   IS13                  A13AMT
    IR28                    AR28                   IS20                  A20AMT
    IR29                    AR29                   IS21                  A21AMT
    IR30                    AR30                   IS23                  A23AMT
    IR31                    AR31                   IS24                  A24AMT



        Ordered by 1993 Variable Name                Ordered by 1993 Variable Name
   1993                    1996                1993                     1996
   IS27                  A27AMT             ISE2OCC                  ABSOCC2
   IS28                  A28AMT                ISEX                    ASEX
   IS29                  A29AMT              ISPDAF                  AAFSRVDI
   IS30                  A30AMT             ISPINAF                      n/a
   IS31                  A31AMT            ISTLOAN                   AEDASST
   IS32                  A32AMT             ISUPPED                  AEDASST
   IS34                  A34AMT            ITAKJOB                       n/a
   IS35                  A35AMT           ITAKJOBN                       n/a
   IS36                  A36AMT            IUHOURS                    AJBHRS1
   IS37                  A37AMT               IUTILS                 AUTILYN
   IS38                  A38AMT           IVETSTAT                   AAFEVER
   IS40                  A40AMT            IVETTYP                   AVETTYP
   IS41                     n/a            IWKSJOB                       n/a
   IS50                  A50AMT           IWKSLOK                    AWKLKG
   IS51                  A51AMT              IWKSPT                   APTWRK
   IS52                  A52AMT            IWKSPTR                   APTRESN
   IS53                  A53AMT           IWKSTDY                    AEDASST
   IS54                  A54AMT           IWKSWOP                     AWKSAB
   IS55                  A55AMT            IWS12012                  ACLWRK1
   IS56                  A56AMT            IWS12024                  ARSEND1
   IS75                  A75AMT            IWS12026                  APAYHR1
ISE12214                AGROSB1            IWS12028                  APYRATE1
ISE12218                 AEMPB1            IWS12029                      n/a
ISE12220                 AINCPB1           IWS12030                      n/a
ISE12222                APROPB1            IWS12031                      n/a
ISE12232                ASLRYB1            IWS12044                  AUNION1
ISE12234                 AOINCB1           IWS12046                  ACNTRC1
ISE12254                 APRFTB1           IWS1IND                    AJBIND1
ISE12256                 APRFTB1           IWS1OCC                    AJBOCC1
ISE12260                ABMSUM1            IWS22112                  AEJDATE2
ISE1AMT                 ABMSUM1            IWS22124                  ARSEND2
 ISE1IND                 ABSIND1           IWS22126                  APAYHR2
ISE1OCC                 ABSOCC1            IWS22128                  APYRATE2
ISE22314                AGROSB2            IWS22129                      n/a
ISE22318                 AEMPB2            IWS22130                      n/a
ISE22320                 AINCPB2           IWS22131                      n/a
ISE22322                APROPB2            IWS22144                  AUNION2
ISE22332                ASLRYB2            IWS22146                  ACNTRC2
ISE22334                 AOINCB2           IWS2IND                    AJBIND2
ISE22354                 APRFTB2           IWS2OCC                    AJBOCC2
ISE22356                 APRFTB2              J10003                 TSVJTINT
ISE22360                ABMSUM2               J10407                 TMDJTINT
ISE2AMT                 ABMSUM2                 J110                 TSJNTDIV
 ISE2IND                 ABSIND2              J110RI                  TMJADIV



         Ordered by 1993 Variable Name             Ordered by 1993 Variable Name
   1993                      1996             1993                    1996
  J120OT                  TJACLR2            PNPT              EPNMOM, EPNDAD
    J130                   TMIJNT            PNSP                  EPNSPOUS
 JGRENT                    TJARNT            PNUM                  EPPPNUM
 JNRENT                    TJACLR          POPSTAT                 EPOPSTAT
LCHCOST                       n/a            PROP                  TPPRPINC
LCHFREE                  EFRERDLN          PWADDID                     n/a
  LCHPT                  EFREELUN          PWENTRY                     n/a
 LCHTOT                       n/a          PWPNUM                      n/a
  LEVEL                  EENLEVEL           PWRRP                      n/a
  LUNCH                  EHOTLUNC           PWSUID                     n/a
MCDPNUM                 RCUOWN57             R01A                    ER01A
  MCOPT                       n/a            R01K                    ER01K
MEDCODE                 RMEDCODE             R02A                     ER02
   MIS5                       n/a            R02K                      n/a
MONENT                        n/a             R03                ER03A, ER03K
 MONLFT                       n/a             R05                     ER05
 MONTH                   RHCALMN              R06                     ER06
     MS                      EMS              R07                     ER07
   NDSL                   EASST05             R08                     ER08
  NJOBS                  EJOBCNTR             R10                     ER10
 NKIDSBF                  RNKBRK              R100                  EAST2B
NKIDSHL                   RNKLUN              R101                  EAST2C
  NOINC                       n/a             R102                  EAST2D
 NONHHI                  EHIOTHER             R103                  EAST2A
  O10003                  TSVOINT             R104             EMDJT, EMDOAST
  O10407                  TMDOINT             R105                  EAST3D
   O110                  TSOWNDIV             R106                  EAST3C
  O110RI                TMOWNADV              R107                  EAST4C
   O130                   TMIOWN              R110              EAST3A, EAST3B
  O14050                  TRNDUP1             R12                     AR12
 OGRENT                   TOARNT              R120                  EAST4A
 ONRENT                    TOACLR             R13                     ER13
 OTHAID              EASST11, EASST07         R130                  EAST3E
  OTHER                  TPOTHINC             R140                  EAST4B
 OTHINC                     ER56              R150                 ERNDUP2
 OTHVET                   EASST02             R20                     ER20
OTHWELF                  RCUTYP24             R21                     ER21
OWPNUM                  RCUOW24A              R23                     ER23
  P5WGT                  WPFINWGT             R24                     ER24
  PANEL                    SPANEL             R25                     ER25
   PELL                   EASST01             R27                     ER27
 PHRENT                  TMTHRNT              R28                     ER28
   PLUS                   EASST05             R29                     ER29
  PNGDU                  EPNGUARD             R30                     ER30



         Ordered by 1993 Variable Name              Ordered by 1993 Variable Name
   1993                      1996             1993                     1996
    R31                      ER31             RRPU                      n/a
    R32                      ER32          S01AMTA                  T01AMTA
    R34                      ER34          S01AMTK                  T01AMTK
    R35                      ER35          S02AMTA                   T02AMT
    R36                      ER36          S02AMTK                      n/a
    R37                      ER37           S03AMT             T03AMTA, T03AMTK
    R38                      ER38           S05AMT                   T05AMT
    R40                      ER40           S06AMT                      n/a
    R41                      ER41           S07AMT                   T07AMT
    R50                      ER50           S08AMT                   T08AMT
    R51                      ER51           S10AMT                   T10AMT
    R52                      ER52           S12AMT                   T12AMT
    R53                      ER53           S13AMT                   T13AMT
    R54                      ER54           S20AMT                   T20AMT
    R55                      ER55           S21AMT                   A20AMT
    R56                      ER56           S23AMT                   T23AMT
    R75               ER75, ER09, ER33      S24AMT                   T24AMT
   RACE                    ERACE            S27AMT                   T27AMT
 RAILRD                       n/a           S28AMT                   T28AMT
 REAENT                       n/a           S29AMT                   T29AMT
 REALFT                       n/a           S30AMT                   T30AMT
 REASAB                    EABRE            S31AMT                   T31AMT
 REFMTH                   SREFMON           S32AMT                   T32AMT
RENVELOP                      n/a           S34AMT                   T34AMT
 RETIRD                   EEVERET           S35AMT                   T35AMT
 RHCDIS                       n/a           S36AMT                   T36AMT
 RJ10003                    ESVJT           S37AMT                   T37AMT
 RJ10407                      n/a           S38AMT                   T38AMT
   RJ110                ESANYCHK            S40AMT                   T39AMT
 RJ110RI                EMOTHDIV            S41AMT                      n/a
   RJ120                  EJNTRNT           S50AMT                   T50AMT
 RJ120OT                   EJRNT2           S51AMT                   T51AMT
   RJ130                  EMRTJNT           S52AMT                   T52AMT
 RO10003                  ESVOAST           S53AMT                   T53AMT
 RO10407                      n/a           S54AMT                      n/a
  RO110                 EMANYCHK            S55AMT                   T55AMT
 RO110RI                EMOTHDIV            S56AMT                   T56AMT
  RO120                  EOWNRNT            S75AMT                   T75AMT
  RO130                  EMRTOWN             SAFDC                   TSAFDC
 RO14050                      n/a            SC1000                 EPDJBTHN
   ROT                   SROTATON          SCHANGE                 RSCHANGE
  RRDAY                       n/a           SE12201                   EBNO1
    RRP                     ERRP            SE12202                 EBIZNOW1
 RRPNUM                       n/a           SE12203                     n/a


SIPP USERS’ GUIDE

        Ordered by 1993 Variable Name               Ordered by 1993 Variable Name
   1993                    1996                1993                     1996
 SE12212                 EHRSBS1            SFDSTP                   TSFDSTP
 SE12214                 EGROSB1               SID                      RSID
 SE12218                 TEMPB1              SKIND                   ESFKIND
 SE12220                 EINCPB1               SNP                     ESFNP
 SE12222                 EPROPB1            SOCSEC                  RCUTYP01
 SE12224                 EHPRTB1            SOCSR1                  ERESNSS1
 SE12226                EPARTB11            SOCSR2                  ERESNSS2
 SE12228                EPARTB21           SOKLT18                  ESOKLT18
 SE12230                EPARTB31            SOTHER                  TSOTHINC
 SE12232                 ESLRYB1           SOWNKID                  ESOWNKID
 SE12234                 EOINCB1             SPDAF                  EAFSRVDI
 SE12252                    n/a              SPINAF                      n/a
 SE12254                 TPRFTB1              SPOV                    TSFPOV
 SE12256                 TPRFTB1             SPROP                   TSPRPINC
 SE12260                TBMSUM1            SREFPER                  ESFRFPER
 SE1AMT                 TBMSUM1              SSDAY                       n/a
 SE1IND                  TBSIND1          SSICOVRG             ESSICHLD, ESSISELF
 SE1OCC                  TBSOCC1           SSOCSEC                  TSSOCSEC
 SE1WKS                     n/a             SSPNUM                 RCUOWN01
 SE22301                  EBNO2            SSPOUSE                    ESFSPSE
 SE22302                EBIZNOW2               SSSI                    TSSSI
 SE22303                    n/a             SSUNIT                       n/a
 SE22312                 EHRSBS2            STLOAN                   EASST05
 SE22314                 EGROSB2           STOTINC                  TSTOTINC
 SE22318                 TEMPB2              STRAN                  TSTRNINC
 SE22320                 EINCPB2             STYPE                   ESFTYPE
 SE22322                 EPROPB2              SUID                     SSUID
 SE22324                 EHPRTB2            SUNEMP                   TSUNEMP
 SE22326                EPARTB12            SUPPED                   EASST04
 SE22328                EPARTB22             SURGC                     GRGC
 SE22330                EPARTB32          SUSEQNUM                    SSUSEQ
 SE22332                 ESLRYB2           SUSTATE                    TFIPSST
 SE22334                 EOINCB2             SVETS                    TSVETS
 SE22352                    n/a               SWGT                 WSFINWGT
 SE22354                 TPRFTB2            TAKJOB                   RTAKJOB
 SE22356                 TPRFTB2           TAKJOBN                  RNOTAKE
 SE22360                TBMSUM2             TOTINC                  TPTOTINC
 SE2AMT                 TBMSUM2               TRAN                  TPTRNINC
 SE2IND                  TBSIND2            UHOURS                   EJBHRS1
 SE2OCC                  TBSOCC2           USRVDT1                     UAF1
 SE2WKS                     n/a            USRVDT2                     UAF2
  SEARN                  TSFEARN           USRVDT3                     UAF3
SENVELOP                    n/a               UTILS                  EUTILYN
   SEX                     ESEX            VETNUM           RCUOWN8A, RCUOWN8B


                     SIPP USERS’ GUIDE VARIABLE CROSSWALK: 1993 TO 1996

        Ordered by 1993 Variable Name            Ordered by 1993 Variable Name
   1993                     1996            1993                    1996
  VETS                  RCUTYP08          WS22102                  EENO2
VETSMT                  EVAQUES           WS22103                ESTLEMP2
VETSTAT                  EAFEVER          WS22104                    n/a
 VETTYP                  EVETTYP          WS22112                ECLWRK2
  WAVE                    SWAVE           WS22116                TSJDATE2
 WEEKS                     EMAX           WS22118                TSJDATE2
 WESR1                  RWKESR1           WS22120                TEJDATE2
 WESR2                  RWKESR2           WS22122                TEJDATE2
 WESR3                  RWKESR3           WS22123                TEJDATE2
 WESR4                  RWKESR4           WS22124                 ERSEND2
 WESR5                  RWKESR5           WS22125                 EJBHRS2
WICCOV                  RCUTYP25          WS22126                EPAYHR2
WICPNUM                RCUOWN25           WS22128                TPYRATE2
WICVAL                 EMTHAM25           WS22129                 RPYPER2
WKSJOB                  RMWKWJB           WS22130                    n/a
WKSLOK                  RMWKLKG           WS22131                    n/a
 WKSPT                   EPTWRK           WS22144                 EUNION2
WKSPTR                   EPTRESN          WS22146                ECNTRC2
WKSTDY                   EASST03          WS2AMT                 TPMSUM2
WKSWOP                  RMWKSAB           WS2CALC          APAYHR2, APYRATE2
WS12002                   EENO1           WS2CHG                     n/a
WS12003                 ESTLEMP1          WS2IND                  EJBIND2
WS12004                      n/a          WS2OCC                  TJBOCC2
WS12012                 ECLWRK1           WS2WKS                     n/a
WS12016                 TSJDATE1           YEAR                  RHCALYR
WS12018                 TSJDATE1
WS12020                 TEJDATE1
WS12022                 TEJDATE1
WS12023                 TEJDATE1
WS12024                  ERSEND1
WS12025                  EJBHRS1
WS12026                  EPAYHR1
WS12028                 TPYRATE1
WS12029                  RPYPER1
WS12030                      n/a
WS12031                      n/a
WS12044                  EUNION1
WS12046                  ECNTRC1
WS1AMT                  TPMSUM1
WS1CALC           APAYHR1, APYRATE1
WS1CHG                       n/a
 WS1IND                  EJBIND1
WS1OCC                   TJBOCC1
WS1WKS                       n/a
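
As an illustration of how this crosswalk might be applied when harmonizing 1993-format and 1996-format files, the sketch below encodes a handful of rows copied from the table above as a Python dictionary. The names `CROSSWALK_1993_TO_1996`, `to_1996`, and `from_1996` are hypothetical and chosen only for this example; they are not part of any Census Bureau software. The sketch covers the three cases that appear in the table: a one-to-one match, a 1993 variable that maps to several 1996 variables, and a 1993 variable with no 1996 counterpart (n/a).

```python
# A few rows copied from the crosswalk above (1993 name -> 1996 name(s)).
# An "n/a" entry in the table is stored here as an empty list.
CROSSWALK_1993_TO_1996 = {
    "WS1AMT": ["TPMSUM1"],               # one-to-one
    "WS1CALC": ["APAYHR1", "APYRATE1"],  # one 1993 name, two 1996 names
    "WS1CHG": [],                        # n/a in the 1996 panel
    "WS12016": ["TSJDATE1"],             # two 1993 names share
    "WS12018": ["TSJDATE1"],             # one 1996 name
}


def to_1996(name_1993):
    """Return the 1996 name(s) listed for a 1993 variable (hypothetical helper).

    An empty list means the table shows n/a. A KeyError means the
    variable is not in this (deliberately partial) dictionary.
    """
    return list(CROSSWALK_1993_TO_1996[name_1993.upper()])


def from_1996(name_1996):
    """Return every 1993 name in the dictionary that maps to a 1996 name."""
    target = name_1996.upper()
    return sorted(
        old for old, new in CROSSWALK_1993_TO_1996.items() if target in new
    )


print(to_1996("WS1CALC"))
print(to_1996("WS1CHG"))
print(from_1996("TSJDATE1"))
```

Where several 1993 variables share one 1996 name, as with WS12016 and WS12018, a simple rename is not enough on its own; the variable definitions in the technical documentation should be consulted before the fields are combined.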



         Ordered by 1996 Variable Name                  Ordered by 1996 Variable Name
    1993                    1996                   1993                    1996
   IS01A                 A01AMTA                  IR102                   AAST2D
   IS01K                 A01AMTK                  IR106                   AAST3C
   IS02A                  A02AMT                  IR105                   AAST3D
    IS03            A03AMTA, A03AMTK              IR130                   AAST3E
    IS05                  A05AMT                  IR120                   AAST4A
    IS06                  A06AMT                  IR140                   AAST4B
    IS07                  A07AMT                  IR107                   AAST4C
    IS08                  A08AMT               ISE12260                 ABMSUM1
    IS10                  A10AMT              ISE1AMT                   ABMSUM1
    IS12                  A12AMT               ISE22360                 ABMSUM2
    IS13                  A13AMT              ISE2AMT                   ABMSUM2
  S21AMT                  A20AMT              IBREAKF                    ABRKFST
    IS20                  A20AMT               ISE1IND                   ABSIND1
    IS21                  A21AMT               ISE2IND                   ABSIND2
    IS23                  A23AMT               ISE1OCC                   ABSOCC1
    IS24                  A24AMT               ISE2OCC                   ABSOCC2
    IS27                  A27AMT              IWS12012                  ACLWRK1
    IS28                  A28AMT              IWS12046                   ACNTRC1
    IS29                  A29AMT              IWS22146                   ACNTRC2
    IS30                  A30AMT            ICARECOV                     ACRMTH
    IS31                  A31AMT                IDISAB                   ADISABL
    IS32                  A32AMT              ISTLOAN                    AEDASST
    IS34                  A34AMT              IOTHVET                    AEDASST
    IS35                  A35AMT             IWKSTDY                     AEDASST
    IS36                  A36AMT                 IPELL                   AEDASST
    IS37                  A37AMT                 INDSL                   AEDASST
    IS38                  A38AMT                 IPLUS                   AEDASST
    IS40                  A40AMT              IEMPLYR                    AEDASST
    IS50                  A50AMT              IOTHAID                    AEDASST
    IS51                  A51AMT                IFSSHIP                  AEDASST
    IS52                  A52AMT              ISUPPED                    AEDASST
    IS53                  A53AMT              IEDASST                   AEDFUND
    IS54                  A54AMT            IHIGRADE                   AEDUCATE
    IS55                  A55AMT            IEASTAMT                    AEGYAMT
    IS56                  A56AMT             IHENRGY                    AEGYPMT
    IS75                  A75AMT              IWS22112                  AEJDATE2
 IREASAB                  AABRE                ISE12218                  AEMPB1
IVETSTAT                 AAFEVER               ISE22318                  AEMPB2
   IINAF                 AAFNOW                 ILEVEL                  AENLEVEL
  ISPDAF                 AAFSRVDI              IRETIRD                   AEVERET
 IDISAGE                  AAGESS                ILCHPT                  AFREELUN
   IR103                  AAST2A               IBFFREE                  AFRERDBK
   IR100                  AAST2B             ILCHFREE                   AFRERDLN
   IR101                  AAST2C               ISE12214                  AGROSB1



           Ordered by 1996 Variable Name               Ordered by 1996 Variable Name
      1993                     1996               1993                    1996
  ISE22314                  AGROSB2            ISE12256                 APRFTB1
 IPHRENT                   AGVTRNT             ISE12254                 APRFTB1
   IHISRC                  AHEMPLY             ISE22356                 APRFTB2
   IHIPAY                   AHICOST            ISE22354                 APRFTB2
 INONHHI                   AHIOTHER            ISE12222                 APROPB1
  IHIOWN                  AHIOWNER             ISE22322                 APROPB2
  IHITYPE                 AHIOWNER            IWKSPTR                  APTRESN
  ILUNCH                  AHOTLUNC             IWKSPT                   APTWRK
  ISE12220                  AINCPB1           IWS12028                 APYRATE1
  ISE22320                  AINCPB2           IWS22128                 APYRATE2
  IJNRENT                    AJACLR              IR01A                   AR01A
   IJ120OT                  AJACLR2              IR01K                   AR01K
  IJGRENT                    AJARNT              IR02A                    AR02
 IUHOURS                    AJBHRS1               IR03               AR03A, AR03K
 IWS1IND                    AJBIND1               IR05                    AR05
 IWS2IND                    AJBIND2               IR06                    AR06
 IWS1OCC                    AJBOCC1               IR07                    AR07
 IWS2OCC                    AJBOCC2               IR08                    AR08
    IRJ120                  AJNTRNT               IR10                    AR10
 IRJ120OT                    AJRNT2               IR12                    AR12
     IR110                AMANYCHK                R12                     AR12
     IR104             AMDJT, AMDOAST             IR13                    AR13
   IJ10407                 AMDJTINT               IR20                    AR20
   CJ10407                 AMDJTINT               IR21                    AR21
  CO10407                  AMDOINT                IR23                    AR23
   IO10407                 AMDOINT                IR24                    AR24
     IJ130                   AMIJNT               IR25                    AR25
     IO130                  AMIOWN                IR27                    AR27
   IJ110RI                  AMJADIV               IR28                    AR28
  IJO110RI                AMOTHDIV                IR29                    AR29
   IO110RI               AMOWNADV                 IR30                    AR30
    IJO110                AMOWNDIV                IR31                    AR31
    IRJ130                 AMRTJNT                IR32                    AR32
   IRO130                  AMRTOWN                IR34                    AR34
 IONRENT                    AOACLR                IR35                    AR35
 IOGRENT                    AOARNT                IR36                    AR36
  ISE12234                  AOINCB1               IR37                    AR37
  ISE22334                  AOINCB2               IR38                    AR38
IETHNCTY                    AORIGIN               IR40                    AR40
   IRO120                  AOWNRNT              IGIBILL                   AR40
 WS1CALC             APAYHR1, APYRATE1            IR41                    AR41
 IWS12026                  APAYHR1                IR50                    AR50
 IWS22126                  APAYHR2                IR51                    AR51
 WS2CALC             APAYHR2, APYRATE2            IR52                    AR52



            Ordered by 1996 Variable Name              Ordered by 1996 Variable Name
      1993                     1996               1993                    1996
      IR53                     AR53               R101                   EAST2C
      IR54                     AR54               R102                   EAST2D
      IR55                     AR55               R110              EAST3A, EAST3B
  ICWORK                       AR55               R106                   EAST3C
      IR56                     AR56               R105                   EAST3D
    IRACE                     ARACE               R130                   EAST3E
 IENROLD        ARENROLL, AENRLM, EENLEVEL        R120                   EAST4A
   IO14050                  ARNDUP1               R140                   EAST4B
 IWS12024                   ARSEND1               R107                   EAST4C
 IWS22124                   ARSEND2             SE12202                EBIZNOW1
      ISEX                     ASEX             SE22302                EBIZNOW2
      IJ110                 ASJNTDIV           BRTHMN                   EBMNTH
  ISE12232                  ASLRYB1             SE12201                  EBNO1
  ISE22332                  ASLRYB2             SE22301                  EBNO2
     IO110                 ASOWNDIV             BREAKF                  EBRKFST
  IRJ10003                    ASVJT             WS12012                ECLWRK1
   CJ10003                  ASVJTINT            WS22112                ECLWRK2
   IJ10003                  ASVJTINT            WS12046                 ECNTRC1
 IRO10003                   ASVOAST             WS22146                 ECNTRC2
  CO10003                   ASVOINT            CARECOV                  ECRMTH
   IO10003                  ASVOINT              DISAB                  EDISABL
 IWS12044                   AUNION1             EDASST                  EEDFUND
 IWS22144                   AUNION2            HIGRADE                 EEDUCATE
    IUTILS                  AUTILYN            EASTAMT                 EEGYAMT
 IVETTYP                    AVETTYP            HENRGY       EEGYPMT1, EEGYPMT2, EEGYPMT3
IWKSLOK                     AWKLKG               LEVEL                 EENLEVEL
IWKSWOP                      AWKSAB             WS12002                  EENO1
  REASAB                      EABRE             WS22102                  EENO2
HACCESS                     EACCESS              ENTRY                  EENTAID
VETSTAT                     EAFEVER             RETIRD                  EEVERET
     INAF                    EAFNOW              FKIND                   EFKIND
    SPDAF                   EAFSRVDI              FNP                     EFNP
     PELL                    EASST01             LCHPT                 EFREELUN
  OTHVET                     EASST02           FREFPER                 EFREFPER
 WKSTDY                      EASST03            BFFREE                 EFRERDBK
  SUPPED                     EASST04           LCHFREE                 EFRERDLN
     PLUS                    EASST05           FSPOUSE                 EFSPOUSE
     NDSL                    EASST05             FTYPE                   EFTYPE
  STLOAN                     EASST05            SE12214                 EGROSB1
   FSSHIP          EASST06, EASST08, EASST09    SE22314                 EGROSB2
  EMPLYR                     EASST10            HLORNT                 EGVTRNT
  OTHAID                EASST11, EASST07         HISRC                 EHEMPLY
      R103                   EAST2A               H5NP                EHHNUMPP
      R100                   EAST2B               HNP                 EHHNUMPP



        Ordered by 1996 Variable Name              Ordered by 1996 Variable Name
   1993                    1996               1993                    1996
  HIPAY                  EHICOST           WS12026                 EPAYHR1
 NONHHI                 EHIOTHER           WS22126                 EPAYHR2
 HINONH                 EHIOWNER            SC1000                 EPDJBTHN
 HITYPE                 EHIOWNER            PNGDU                 EPNGUARD
 HIOWN                  EHIOWNER             PNPT              EPNMOM, EPNDAD
 LUNCH                 EHOTLUNC              PNSP                  EPNSPOUS
 SE12224                 EHPRTB1           POPSTAT                 EPOPSTAT
 SE22324                 EHPRTB2            INTVW                  EPPINTVW
  H5REF                 EHREFPER             PNUM                  EPPPNUM
HREFPER                 EHREFPER            SE12222                 EPROPB1
 SE12212                 EHRSBS1            SE22322                 EPROPB2
 SE22312                 EHRSBS2           WKSPTR                   EPTRESN
 SE12220                 EINCPB1            WKSPT                   EPTWRK
 SE22320                 EINCPB2           HPUBHS                  EPUBHSE
UHOURS                   EJBHRS1             R01A                    ER01A
 WS12025                 EJBHRS1             R01K                    ER01K
 WS22125                 EJBHRS2             R02A                     ER02
 WS1IND                  EJBIND1              R03                ER03A, ER03K
 WS2IND                  EJBIND2              R05                     ER05
  RJ120                  EJNTRNT              R06                     ER06
  NJOBS                 EJOBCNTR              R07                     ER07
 RJ120OT                  EJRNT2              R08                     ER08
 HLVQTR                  ELIVQRT              R10                     ER10
  RO110                EMANYCHK               R13                     ER13
 WEEKS                    EMAX                R20                     ER20
   R104             EMDJT, EMDOAST            R21                     ER21
 RJ110RI               EMOTHDIV               R23                     ER23
 RO110RI               EMOTHDIV               R24                     ER24
  RJ130                  EMRTJNT              R25                     ER25
  RO130                 EMRTOWN               R27                     ER27
    MS                     EMS                R28                     ER28
 WICVAL                EMTHAM25               R29                     ER29
 SE12234                 EOINCB1              R30                     ER30
 SE22334                 EOINCB2              R31                     ER31
ETHNCTY                  EORIGIN              R32                     ER32
  IR150                 EOTHPROP              R34                     ER34
  H5MIS                EOUTCOME               R35                     ER35
  RO120                 EOWNRNT               R36                     ER36
 SE12226                EPARTB11              R37                     ER37
 SE22326                EPARTB12              R38                     ER38
 SE12228                EPARTB21              R40                     ER40
 SE22328                EPARTB22            GIBILL                    ER40
 SE12230                EPARTB31              R41                     ER41
 SE22330                EPARTB32              R50                     ER50



         Ordered by 1996 Variable Name               Ordered by 1996 Variable Name
    1993                     1996               1993                    1996
    R51                      ER51             CHAMP                  RCHAMPM
    R52                      ER52           GAPNUM                  RCUOW21A
    R53                      ER53           OWPNUM                  RCUOW24A
    R54                      ER54            HIPNUM           RCUOW58A, RCUOW58B
    R55                      ER55            SSPNUM                 RCUOWN01
  CWORK                      ER55          AFDCPNUM                 RCUOWN20
    R56                      ER56            FKPNUM                 RCUOWN23
 OTHINC                      ER56           WICPNUM                 RCUOWN25
    R75               ER75, ER09, ER33       FSPNUM                 RCUOWN27
   RACE                     ERACE          MCDPNUM                  RCUOWN57
  SOCSR1                 ERESNSS1           VETNUM           RCUOWN8A, RCUOWN8B
  SOCSR2                 ERESNSS2            SOCSEC                  RCUTYP01
    R150                  ERNDUP2              VETS                  RCUTYP08
    RRP                      ERRP              AFDC                  RCUTYP20
 FAMREL                      ERRP           GENASST                  RCUTYP21
 WS12024                  ERSEND1           FOSTKID                  RCUTYP23
 WS22124                  ERSEND2           OTHWELF                  RCUTYP24
   RJ110                ESANYCHK             WICCOV                  RCUTYP25
    SEX                      ESEX          FOODSTMP                  RCUTYP27
   SKIND                  ESFKIND           CAIDCOV                  RCUTYP57
    SNP                     ESFNP              HIIND                 RCUTYP58
 SREFPER                 ESFRFPER          DESGPNPT                  RDESGPNT
 SSPOUSE                   ESFSPSE           ENROLD       RENROLL, EENRLM, RENRLMA
 FAMTYP                      ESFT           FCHANGE                 RFCHANGE
   STYPE                  ESFTYPE               FID                     RFID
  SE12232                 ESLRYB1              FID2                    RFID2
  SE22332                 ESLRYB2            FNKIDS                   RFNKIDS
 SOKLT18                 ESOKLT18             FNSSR                   RFNSSR
SOWNKID                  ESOWNKID           FOKLT18                  RFOKLT18
SSICOVRG            ESSICHLD, ESSISELF      FOWNKID                 RFOWNKID
 WS12003                 ESTLEMP1            MONTH                   RHCALMN
 WS22103                 ESTLEMP2              YEAR                  RHCALYR
  RJ10003                   ESVJT             HCASH                   RHCBRF
 RO10003                  ESVOAST          HCHANGE                  RHCHANGE
HTENURE                   ETENURE           HMEANS                    RHMTRF
 WS12044                  EUNION1            HNCASH                   RHNBRF
 WS22144                  EUNION2               HNF                     RHNF
  HUNITS                   EUNITS            HNFAM                    RHNFAM
   UTILS                  EUTILYN              HNSF                    RHNSF
 VETSMT                   EVAQUES             HNSSR                   RHNSSR
 VETTYP                   EVETTYP             HTYPE                   RHTYPE
   HHSC                  GHLFSAM           MEDCODE                  RMEDCODE
  SURGC                     GRGC                ESR                    RMESR
 HSTRAT                   GVARSTR           WKSLOK                   RMWKLKG



         Ordered by 1996 Variable Name              Ordered by 1996 Variable Name
    1993                     1996              1993                    1996
 WKSWOP                  RMWKSAB             S37AMT                  T37AMT
  WKSJOB                 RMWKWJB             S38AMT                  T38AMT
 NKIDSBF                   RNKBRK            S40AMT                  T39AMT
 NKIDSHL                   RNKLUN            S50AMT                  T50AMT
 TAKJOBN                 RNOTAKE             S51AMT                  T51AMT
  WS12029                 RPYPER1            S52AMT                  T52AMT
  WS22129                 RPYPER2            S53AMT                  T53AMT
 SCHANGE                RSCHANGE             S55AMT                  T55AMT
     SID                     RSID            S56AMT                  T56AMT
  TAKJOB                  RTAKJOB            S75AMT                  T75AMT
   WESR1                  RWKESR1              AGE                    TAGE
   WESR2                  RWKESR2            DISAGE                  TAGESS
   WESR3                  RWKESR3           SE1AMT                  TBMSUM1
   WESR4                  RWKESR4            SE12260                TBMSUM1
   WESR5                  RWKESR5           SE2AMT                  TBMSUM2
   ADDID                  SHHADID            SE22360                TBMSUM2
   PANEL                   SPANEL            SE1IND                  TBSIND1
  REFMTH                  SREFMON            SE2IND                  TBSIND2
    ROT                  SROTATON            SE1OCC                 TBSOCC1
    SUID                    SSUID            SE2OCC                 TBSOCC2
SUSEQNUM                   SSUSEQ           BRTHYR                   TBYEAR
   WAVE                    SWAVE            WS12023                 TEJDATE1
 S01AMTA                  T01AMTA           WS12022                 TEJDATE1
 S01AMTK                  T01AMTK           WS12020                 TEJDATE1
 S02AMTA                   T02AMT           WS22122                 TEJDATE2
  S03AMT            T03AMTA, T03AMTK        WS22120                 TEJDATE2
  S05AMT                   T05AMT           WS22123                 TEJDATE2
  S07AMT                   T07AMT            SE12218                 TEMPB1
  S08AMT                   T08AMT            SE22318                 TEMPB2
  S10AMT                   T10AMT             FAFDC                  TFAFDC
  S12AMT                   T12AMT            FEARN                   TFEARN
  S13AMT                   T13AMT            FFDSTP                  TFFDSTP
  S20AMT                   T20AMT           SUSTATE                  TFIPSST
  S23AMT                   T23AMT           HSTATE                   TFIPSST
  S24AMT                   T24AMT           FOTHER                  TFOTHINC
  S27AMT                   T27AMT              FPOV                   TFPOV
  S28AMT                   T28AMT           FSOCSEC                 TFSOCSEC
  S29AMT                   T29AMT              FSSI                   TFSSI
  S30AMT                   T30AMT           FTOTINC                 TFTOTINC
  S31AMT                   T31AMT            FTRAN                  TFTRNINC
  S32AMT                   T32AMT           FUNEMP                  TFUNEMP
  S34AMT                   T34AMT             FVETS                  TFVETS
  S35AMT                   T35AMT            HAFDC                   THAFDC
  S36AMT                   T36AMT            HEARN                   THEARN



         Ordered by 1996 Variable Name              Ordered by 1996 Variable Name
    1993                     1996             1993                     1996
 HFDSTP                   THFDSTP            SEARN                   TSFEARN
HNONCSH                 THNONCSH              SPOV                    TSFPOV
 HOTHER                  THOTHINC           WS12018                 TSJDATE1
   HPOV                    THPOV            WS12016                 TSJDATE1
  HPROP                  THPRPINC           WS22118                 TSJDATE2
  FPROP                  THPRPINC           WS22116                 TSJDATE2
HSOCSEC                  THSOCSEC              J110                  TSJNTDIV
   HSSI                     THSSI           SOTHER                  TSOTHINC
HTOTINC                  THTOTINC             O110                  TSOWNDIV
 HTRAN                   THTRNINC            SPROP                   TSPRPINC
HUNEMP                   THUNEMP            SSOCSEC                 TSSOCSEC
  HVETS                   THVETS               SSSI                    TSSSI
 JNRENT                    TJACLR           STOTINC                 TSTOTINC
  J120OT                  TJACLR2            STRAN                  TSTRNINC
 JGRENT                    TJARNT           SUNEMP                   TSUNEMP
 WS1OCC                   TJBOCC1            SVETS                    TSVETS
 WS2OCC                   TJBOCC2            J10003                  TSVJTINT
  J10407                 TMDJTINT            O10003                  TSVOINT
  O10407                 TMDOINT            USRVDT1                    UAF1
HMETRO                    TMETRO            USRVDT2                    UAF2
    J130                   TMIJNT           USRVDT3                    UAF3
   O130                   TMIOWN              EWID                  UEVRWID
  J110RI                  TMJADIV            FWGT                  WFFINWGT
  O110RI                TMOWNADV             H5WGT                  WHFNWGT
  HMSA                      TMSA             HWGT                   WHFNWGT
 PHRENT                  TMTHRNT             P5WGT                 WPFINWGT
 ONRENT                   TOACLR            FNLWGT                 WPFINWGT
 OGRENT                   TOARNT             SWGT                  WSFINWGT
   EARN                   TPEARN
WS1AMT                   TPMSUM1
WS2AMT                   TPMSUM2
  OTHER                  TPOTHINC
   PROP                  TPPRPINC
 SE12254                  TPRFTB1
 SE12256                  TPRFTB1
 SE22356                  TPRFTB2
 SE22354                  TPRFTB2
 TOTINC                  TPTOTINC
   TRAN                  TPTRNINC
 WS12028                 TPYRATE1
 WS22128                 TPYRATE2
  O14050                 TRNDUP1
  SAFDC                   TSAFDC
 SFDSTP                   TSFDSTP


                        SIPP USERS’ GUIDE VARIABLE CROSSWALK: 1993 TO 1996

         Ordered by 1993 File Position                 Ordered by 1993 File Position
    1993                   1996                   1993                   1996
SUSEQNUM                 SSUSEQ                 HVETS                  THVETS
    SUID                  SSUID                 HAFDC                  THAFDC
   ADDID                SHHADID                HFDSTP                 THFDSTP
   PANEL                 SPANEL                PHRENT                 TMTHRNT
   WAVE                  SWAVE                   UTILS                EUTILYN
  MONTH                RHCALMN                HENRGY EEGYPMT1, EEGYPMT2, EEGYPMT3
    YEAR                RHCALYR             EASTAMT                   EEGYAMT
     ROT               SROTATON                 LUNCH                EHOTLUNC
  REFMTH                SREFMON               NKIDSHL                  RNKLUN
 SUSTATE                 TFIPSST              LCHTOT                      n/a
   SURGC                  GRGC                  LCHPT                EFREELUN
    HHSC                GHLFSAM              LCHFREE                 EFRERDLN
  HSTRAT                GVARSTR              LCHCOST                      n/a
     HNF                   RHNF                BREAKF                 EBRKFST
  HNFAM                 RHNFAM                NKIDSBF                  RNKBRK
    HNSF                  RHNSF                 BFTOT                     n/a
 HREFPER               EHREFPER                BFFREE                EFRERDBK
     HNP               EHHNUMPP               IPHRENT                 AGVTRNT
   HTYPE                 RHTYPE                 IUTILS                AUTILYN
   HWGT                WHFNWGT               IHENRGY                  AEGYPMT
  HSTATE                 TFIPSST            IEASTAMT                 AEGYAMT
 HMETRO                  TMETRO                ILUNCH                AHOTLUNC
   HMSA                   TMSA               INKIDSHL                     n/a
   HNSSR                 RHNSSR               ILCHTOT                     n/a
 HACCESS                EACCESS                ILCHPT                AFREELUN
  HLVQTR                ELIVQRT             ILCHFREE                 AFRERDLN
  HUNITS                 EUNITS             ILCHCOST                      n/a
 HTENURE                ETENURE               IBREAKF                 ABRKFST
  HPUBHS                EPUBHSE              INKIDSBF                     n/a
  HLORNT                EGVTRNT                IBFTOT                     n/a
 HITM36B                    n/a               IBFFREE                AFRERDBK
 HMEANS                  RHMTRF                 H5REF                EHREFPER
   HCASH                 RHCBRF                  H5NP                EHHNUMPP
  HNCASH                 RHNBRF                 H5MIS                EOUTCOME
    HPOV                  THPOV              H5ADDID                      n/a
 HTOTINC               THTOTINC                H5WGT                 WHFNWGT
  HEARN                  THEARN                   FID                    RFID
   HPROP                THPRPINC                 FID2                   RFID2
  HTRAN                THTRNINC                   FNP                   EFNP
  HOTHER               THOTHINC               FREFPER                 EFREFPER
 HNONCSH               THNONCSH               FSPOUSE                 EFSPOUSE
 HSOCSEC               THSOCSEC                 FTYPE                  EFTYPE
    HSSI                  THSSI                 FKIND                  EFKIND
 HUNEMP                 THUNEMP                FNKIDS                  RFNKIDS



         Ordered by 1993 File Position               Ordered by 1993 File Position
    1993                   1996                 1993                    1996
FOWNKID                RFOWNKID                RRPU                      n/a
FOKLT18                 RFOKLT18                AGE                    TAGE
  FNSSR                  RFNSSR              BRTHMN                  EBMNTH
   FWGT                WFFINWGT              BRTHYR                  TBYEAR
   FPOV                   TFPOV              POPSTAT               EPOPSTAT
FTOTINC                 TFTOTINC                SEX                    ESEX
  FEARN                  TFEARN                RACE                   ERACE
  FPROP                 TFPRPINC            ETHNCTY                  EORIGIN
  FTRAN                 TFTRNINC                 MS                    EMS
 FOTHER                 TFOTHINC               EWID                 UEVRWID
FSOCSEC                 TFSOCSEC             FAMTYP                    ESFT
    FSSI                  TFSSI              FAMREL                    ERRP
 FUNEMP                 TFUNEMP                PNSP                EPNSPOUS
  FVETS                  TFVETS                PNPT            EPNMOM, EPNDAD
  FAFDC                  TFAFDC               PNGDU                EPNGUARD
 FFDSTP                  TFFDSTP            DESGPNPT               RDESGPNT
    SID                    RSID               REALFT                     n/a
    SNP                   ESFNP              REAENT                      n/a
SREFPER                 ESFRFPER             DAYLFT                      n/a
SSPOUSE                  ESFSPSE             MONLFT                      n/a
  STYPE                  ESFTYPE              YRLFT                      n/a
  SKIND                  ESFKIND             DAYENT                      n/a
SOWNKID                ESOWNKID              MONENT                      n/a
SOKLT18                 ESOKLT18              YRENT                      n/a
   SWGT                WSFINWGT             HCHANGE                RHCHANGE
   SPOV                  TSFPOV             FCHANGE                RFCHANGE
STOTINC                 TSTOTINC            SCHANGE                RSCHANGE
  SEARN                 TSFEARN               TOTINC                TPTOTINC
  SPROP                 TSPRPINC               EARN                  TPEARN
  STRAN                 TSTRNINC               PROP                 TPPRPINC
 SOTHER                 TSOTHINC               TRAN                 TPTRNINC
SSOCSEC                 TSSOCSEC              OTHER                 TPOTHINC
    SSSI                  TSSSI               SC1000               EPDJBTHN
 SUNEMP                 TSUNEMP                 ESR                   RMESR
  SVETS                  TSVETS               WEEKS                   EMAX
  SAFDC                  TSAFDC               WESR1                 RWKESR1
 SFDSTP                  TSFDSTP              WESR2                 RWKESR2
  ENTRY                 EENTAID               WESR3                 RWKESR3
   PNUM                 EPPPNUM               WESR4                 RWKESR4
  INTVW                EPPINTVW               WESR5                 RWKESR5
    MIS5                    n/a              WKSJOB                RMWKWJB
FNLWGT                 WPFINWGT             WKSWOP                 RMWKSAB
  P5WGT                WPFINWGT              WKSLOK                RMWKLKG
    RRP                    ERRP              REASAB                   EABRE



         Ordered by 1993 File Position                Ordered by 1993 File Position
    1993                   1996                  1993                   1996
  TAKJOB                RTAKJOB               WICVAL                EMTHAM25
 TAKJOBN                RNOTAKE              WICPNUM                RCUOWN25
  CWORK                    ER55              CAIDCOV                RCUTYP57
 UHOURS                  EJBHRS1            MCDPNUM                 RCUOWN57
   WKSPT                 EPTWRK                 HIIND               RCUTYP58
  WKSPTR                EPTRESN               HIPNUM         RCUOW58A, RCUOW58B
  EMPLED                    n/a               HINONH                EHIOWNER
   DISAB                 EDISABL               CHAMP                RCHAMPM
  RHCDIS                    n/a               CHPNUM                     n/a
 VETSTAT                EAFEVER                HIOWN                EHIOWNER
    INAF                 EAFNOW                 HISRC                EHEMPLY
  SPINAF                    n/a                 HIPAY                EHICOST
 USRVDT1                   UAF1                HITYPE               EHIOWNER
 USRVDT2                   UAF2                HIFAM                     n/a
 USRVDT3                   UAF3               NONHHI                EHIOTHER
  AFTIME                    n/a               HIGRADE               EEDUCATE
  AFDSAB                    n/a              GRDCMPL                     n/a
  AFDPCT                    n/a               ENROLD      RENROLL, EENRLM, RENRLMA
   SPDAF                EAFSRVDI                LEVEL               EENLEVEL
    VETS               RCUTYP08                EDASST                EEDFUND
  VETSMT                EVAQUES                GIBILL                   ER40
 VETNUM         RCUOWN8A, RCUOWN8B            OTHVET                  EASST02
  RETIRD                EEVERET               WKSTDY                  EASST03
  SOCSEC               RCUTYP01                  PELL                 EASST01
  SSPNUM               RCUOWN01                SUPPED                 EASST04
  SOCSR1                ERESNSS1                 NDSL                 EASST05
  SOCSR2                ERESNSS2              STLOAN                  EASST05
  DISAGE                 TAGESS                  PLUS                 EASST05
  RAILRD                    n/a               EMPLYR                  EASST10
 RRPNUM                     n/a                FSSHIP      EASST06, EASST08, EASST09
 CARECOV                 ECRMTH                OTHAID            EASST11, EASST07
MEDCODE                RMEDCODE                OTHINC                   ER56
  MCOPT                     n/a                 NOINC                    n/a
FOODSTMP               RCUTYP27               PWSUID                     n/a
  FSPNUM               RCUOWN27              PWENTRY                     n/a
   AFDC                RCUTYP20               PWPNUM                     n/a
AFDCPNUM               RCUOWN20                PWRRP                     n/a
 GENASST               RCUTYP21              PWADDID                     n/a
 GAPNUM                RCUOW21A                  ISEX                   ASEX
 FOSTKID               RCUTYP23                 IRACE                  ARACE
 FKPNUM                RCUOWN23             IETHNCTY                 AORIGIN
 OTHWELF               RCUTYP24              IHIGRADE               AEDUCATE
 OWPNUM                RCUOW24A             IGRDCMPL                     n/a
  WICCOV               RCUTYP25                 IEWID                    n/a



            Ordered by 1993 File Position            Ordered by 1993 File Position
       1993                   1996            1993                     1996
 IWKSJOB                       n/a           WS1OCC                 TJBOCC1
IWKSWOP                    AWKSAB            WS1IND                  EJBIND1
 IWKSLOK                   AWKLKG           WS1WKS                      n/a
  IREASAB                   AABRE           WS1AMT                  TPMSUM1
  ITAKJOB                      n/a           WS12002                  EENO1
ITAKJOBN                       n/a           WS12012                ECLWRK1
  ICWORK                      AR55          WS1CHG                      n/a
 IUHOURS                   AJBHRS1           WS12018                TSJDATE1
   IWKSPT                  APTWRK            WS12016                TSJDATE1
 IWKSPTR                   APTRESN           WS12022                TEJDATE1
    IDISAB                 ADISABL           WS12020                TEJDATE1
  IDISAGE                   AAGESS           WS12023                TEJDATE1
  IRHCDIS                      n/a           WS12024                ERSEND1
IVETSTAT                   AAFEVER           WS12025                 EJBHRS1
      IINAF                AAFNOW            WS12026                EPAYHR1
   ISPINAF                     n/a           WS12028               TPYRATE1
    ISPDAF                AAFSRVDI           WS12029                RPYPER1
   IRETIRD                 AEVERET           WS12031                    n/a
ICARECOV                   ACRMTH            WS12030                    n/a
   IMCOPT                      n/a           WS12044                EUNION1
ICAIDCOV                       n/a           WS12046                ECNTRC1
     IHIIND                    n/a          IWS1OCC                 AJBOCC1
   IHIOWN                 AHIOWNER          IWS1IND                  AJBIND1
     IHISRC                AHEMPLY          IWS12012                ACLWRK1
    IHIPAY                 AHICOST          IWS12024                ARSEND1
   IHITYPE                AHIOWNER          IWS12026                APAYHR1
  INONHHI                 AHIOTHER          IWS12028               APYRATE1
 IENROLD       ARENROLL, AENRLM, EENLEVEL   IWS12029                    n/a
    ILEVEL                AENLEVEL          IWS12031                    n/a
  IEDASST                  AEDFUND          IWS12030                    n/a
    IGIBILL                   AR40          IWS12044                AUNION1
 IOTHVET                   AEDASST          IWS12046                ACNTRC1
 IWKSTDY                   AEDASST          WS1CALC           APAYHR1, APYRATE1
      IPELL                AEDASST           WS22103               ESTLEMP2
  ISUPPED                  AEDASST           WS22104                    n/a
      INDSL                AEDASST           WS2OCC                 TJBOCC2
  ISTLOAN                  AEDASST           WS2IND                  EJBIND2
      IPLUS                AEDASST          WS2WKS                      n/a
 IEMPLYR                   AEDASST          WS2AMT                  TPMSUM2
    IFSSHIP                AEDASST           WS22102                  EENO2
  IOTHAID                  AEDASST           WS22112                ECLWRK2
     NJOBS                EJOBCNTR          WS2CHG                      n/a
  WS12003                 ESTLEMP1           WS22118                TSJDATE2
  WS12004                      n/a           WS22116                TSJDATE2



         Ordered by 1993 File Position               Ordered by 1993 File Position
   1993                    1996                1993                    1996
 WS22122                TEJDATE2             SE12256                 TPRFTB1
 WS22120                TEJDATE2             SE12260                TBMSUM1
 WS22123                TEJDATE2            ISE1OCC                 ABSOCC1
 WS22124                ERSEND2              ISE1IND                 ABSIND1
 WS22125                 EJBHRS2            ISE12214                AGROSB1
 WS22126                EPAYHR2             ISE12218                 AEMPB1
 WS22128               TPYRATE2             ISE12220                 AINCPB1
 WS22129                RPYPER2             ISE12222                APROPB1
 WS22131                    n/a             ISE12232                ASLRYB1
 WS22130                    n/a             ISE12234                AOINCB1
 WS22144                EUNION2             ISE12254                APRFTB1
 WS22146                ECNTRC2             ISE12256                APRFTB1
IWS2OCC                 AJBOCC2             ISE12260                ABMSUM1
IWS2IND                  AJBIND2            ISE1AMT                 ABMSUM1
IWS22112                ACLWRK2              SE22302               EBIZNOW2
IWS22124                ARSEND2              SE22303                    n/a
IWS22126                APAYHR2               SE2IND                 TBSIND2
IWS22128               APYRATE2              SE2OCC                 TBSOCC2
IWS22129                    n/a              SE2WKS                     n/a
IWS22131                    n/a              SE2AMT                 TBMSUM2
IWS22130                    n/a              SE22301                  EBNO2
IWS22144                AUNION2              SE22312                EHRSBS2
IWS22146                ACNTRC2              SE22314                EGROSB2
WS2CALC           APAYHR2, APYRATE2          SE22318                 TEMPB2
 SE12202               EBIZNOW1              SE22320                 EINCPB2
 SE12203                    n/a              SE22322                EPROPB2
 SE1IND                  TBSIND1             SE22324                EHPRTB2
 SE1OCC                 TBSOCC1              SE22326                EPARTB12
 SE1WKS                     n/a              SE22328                EPARTB22
 SE1AMT                 TBMSUM1              SE22330                EPARTB32
 SE12201                  EBNO1              SE22332                ESLRYB2
 SE12212                EHRSBS1              SE22334                 EOINCB2
 SE12214                EGROSB1              SE22352                    n/a
 SE12218                 TEMPB1              SE22354                 TPRFTB2
 SE12220                 EINCPB1             SE22356                 TPRFTB2
 SE12222                EPROPB1              SE22360                TBMSUM2
 SE12224                EHPRTB1             ISE2OCC                 ABSOCC2
 SE12226                EPARTB11             ISE2IND                 ABSIND2
 SE12228                EPARTB21            ISE22314                AGROSB2
 SE12230                EPARTB31            ISE22318                 AEMPB2
 SE12232                ESLRYB1             ISE22320                 AINCPB2
 SE12234                 EOINCB1            ISE22322                APROPB2
 SE12252                    n/a             ISE22332                ASLRYB2
 SE12254                 TPRFTB1            ISE22334                AOINCB2



         Ordered by 1993 File Position              Ordered by 1993 File Position
  1993                     1996                1993                   1996
ISE22354                APRFTB2             S02AMTA                 T02AMT
ISE22356                APRFTB2             S02AMTK                    n/a
ISE22360                ABMSUM2              S03AMT          T03AMTA, T03AMTK
ISE2AMT                 ABMSUM2              S05AMT                 T05AMT
  R01A                    ER01A              S06AMT                    n/a
  R01K                    ER01K              S07AMT                 T07AMT
  R02A                     ER02              S08AMT                 T08AMT
  R02K                      n/a              S10AMT                 T10AMT
   R03                ER03A, ER03K           S12AMT                 T12AMT
   R05                     ER05              S13AMT                 T13AMT
   R06                     ER06              S20AMT                 T20AMT
   R07                     ER07              S21AMT                 T21AMT
   R08                     ER08              S23AMT                 T23AMT
   R10                     ER10              S24AMT                 T24AMT
   R12                     ER12              S27AMT                 T27AMT
   R13                     ER13              S28AMT                 T28AMT
   R20                     ER20              S29AMT                 T29AMT
   R21                     ER21              S30AMT                 T30AMT
   R23                     ER23              S31AMT                 T31AMT
   R24                     ER24              S32AMT                 T32AMT
   R25                     ER25              S34AMT                 T34AMT
   R27                     ER27              S35AMT                 T35AMT
   R28                     ER28              S36AMT                 T36AMT
   R29                     ER29              S37AMT                 T37AMT
   R30                     ER30              S38AMT                 T38AMT
   R31                     ER31              S40AMT                 T39AMT
   R32                     ER32              S41AMT                    n/a
   R34                     ER34              S50AMT                 T50AMT
   R35                     ER35              S51AMT                 T51AMT
   R36                     ER36              S52AMT                 T52AMT
   R37                     ER37              S53AMT                 T53AMT
   R38                     ER38              S54AMT                    n/a
   R40                     ER40              S55AMT                 T55AMT
   R41                     ER41              S56AMT                 T56AMT
   R50                     ER50              S75AMT                 T75AMT
   R51                     ER51               IR01A                  AR01A
   R52                     ER52               IR01K                  AR01K
   R53                     ER53               IR02A                   AR02
   R54                     ER54                IR03              AR03A, AR03K
   R55                     ER55                IR05                   AR05
   R56                     ER56                IR06                   AR06
   R75               ER75, ER09, ER33          IR07                   AR07
S01AMTA                 T01AMTA                IR08                   AR08
S01AMTK                 T01AMTK                IR10                   AR10



        Ordered by 1993 File Position                  Ordered by 1993 File Position
 1993                     1996                   1993                     1996
 IR12                     AR12                   IS28                  A28AMT
 IR13                     AR13                   IS29                  A29AMT
 IR20                     AR20                   IS30                  A30AMT
 IR21                     AR21                   IS31                  A31AMT
 IR23                     AR23                   IS32                  A32AMT
 IR24                     AR24                   IS34                  A34AMT
 IR25                     AR25                   IS35                  A35AMT
 IR27                     AR27                   IS36                  A36AMT
 IR28                     AR28                   IS37                  A37AMT
 IR29                     AR29                   IS38                  A38AMT
 IR30                     AR30                   IS40                  A40AMT
 IR31                     AR31                   IS41                      n/a
 IR32                     AR32                   IS50                  A50AMT
 IR34                     AR34                   IS51                  A51AMT
 IR35                     AR35                   IS52                  A52AMT
 IR36                     AR36                   IS53                  A53AMT
 IR37                     AR37                   IS54                  A54AMT
 IR38                     AR38                   IS55                  A55AMT
 IR40                     AR40                   IS56                  A56AMT
 IR41                     AR41                   IS75                  A75AMT
 IR50                     AR50                   R100                   EAST2B
 IR51                     AR51                   R101                   EAST2C
 IR52                     AR52                   R102                   EAST2D
 IR53                     AR53                   R103                   EAST2A
 IR54                     AR54                 RJ10003                   ESVJT
 IR55                     AR55                 RO10003                ESVOAST
 IR56                     AR56                   R104             EMDJT, EMDOAST
IS01A                  A01AMTA                   R105                   EAST3D
IS01K                  A01AMTK                   R106                   EAST3C
IS02A                   A02AMT                   R107                   EAST4C
IS02K                      n/a                 RJ10407                     n/a
 IS03            A03AMTA, A03AMTK              RO10407                     n/a
 IS05                   A05AMT                   R110              EAST3A, EAST3B
 IS06                   A06AMT                  RJ110                ESANYCHK
 IS07                   A07AMT                  RO110                EMANYCHK
 IS08                   A08AMT                 RJ110RI               EMOTHDIV
 IS10                   A10AMT                 RO110RI               EMOTHDIV
 IS12                   A12AMT                   R120                   EAST4A
 IS13                   A13AMT                  RJ120                  EJNTRNT
 IS20                   A20AMT                  RO120                 EOWNRNT
 IS21                   A21AMT                 RJ120OT                  EJRNT2
 IS23                   A23AMT                   R130                   EAST3E
 IS24                   A24AMT                  RJ130                 EMRTJNT
 IS27                   A27AMT                  RO130                EMRTOWN



          Ordered by 1993 File Position                  Ordered by 1993 File Position
     1993                    1996                  1993                    1996
     R140                  EAST4B                IRO130                AMRTOWN
     R150                ERNDUP2                  IR140                  AAST4B
 RO14050                      n/a                 IR150                EOTHPROP
   J10003                TSVJTINT               IJ10003                 ASVJTINT
   O10003                 TSVOINT               IO10003                 ASVOINT
   J10407                TMDJTINT               IJ10407                AMDJTINT
   O10407                TMDOINT                IO10407                 AMDOINT
     J110                TSJNTDIV                  IJ110                ASJNTDIV
    O110                TSOWNDIV                  IO110                ASOWNDIV
   J110RI                 TMJADIV                IJ110RI                AMJADIV
   O110RI              TMOWNADV                 IO110RI               AMOWNADV
  JGRENT                   TJARNT              IJGRENT                   AJARNT
  JNRENT                   TJACLR              IJNRENT                   AJACLR
 OGRENT                   TOARNT              IOGRENT                    AOARNT
 ONRENT                   TOACLR              IONRENT                    AOACLR
   J120OT                 TJACLR2               IJ120OT                 AJACLR2
     J130                  TMIJNT                  IJ130                 AMIJNT
    O130                  TMIOWN                  IO130                 AMIOWN
   O14050                TRNDUP1                IO14050                 ARNDUP1
  CJ10003                ASVJTINT              VETTYP                   EVETTYP
 CO10003                  ASVOINT             IVETTYP                   AVETTYP
  CJ10407                AMDJTINT               SSUNIT                      n/a
 CO10407                 AMDOINT             SENVELOP                       n/a
    IR100                 AAST2B                SSDAY                       n/a
    IR101                 AAST2C             RENVELOP                       n/a
    IR102                 AAST2D                RRDAY                       n/a
    IR103                 AAST2A             SSICOVRG             ESSICHLD, ESSISELF
 IRJ10003                   ASVJT
 IRO10003                ASVOAST
    IR104           AMDJT, AMDOAST
    IR105                 AAST3D
    IR106                 AAST3C
    IR107                 AAST4C
 IRJ10407                     n/a
 IRO10407                     n/a
    IR110               AMANYCHK
   IJO110               AMOWNDIV
 IJO110RI               AMOTHDIV
    IR120                 AAST4A
   IRJ120                AJNTRNT
   IRO120                AOWNRNT
 IRJ120OT                  AJRNT2
    IR130                  AAST3E
   IRJ130                AMRTJNT
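
The table ordered by 1993 file position contains three kinds of rows: one-to-one matches, one-to-many matches (for example, HENRGY maps to EEGYPMT1, EEGYPMT2, and EEGYPMT3), and rows marked n/a where no 1996 equivalent is listed. The sketch below is one possible way to represent those cases when translating a list of 1993 variable names. It is an illustration under stated assumptions, not a Census Bureau tool: CROSSWALK and translate are hypothetical names, the dictionary holds only a few rows copied from the table, and an empty list is used here to stand for n/a.

```python
# Hypothetical excerpt of the crosswalk: each 1993 name maps to a list of
# 1996 names. An empty list stands for an "n/a" entry in the table.
CROSSWALK = {
    "SUID": ["SSUID"],
    "HWGT": ["WHFNWGT"],
    "HENRGY": ["EEGYPMT1", "EEGYPMT2", "EEGYPMT3"],
    "PNPT": ["EPNMOM", "EPNDAD"],
    "LCHTOT": [],
    "RAILRD": [],
}


def translate(names_1993, crosswalk=CROSSWALK):
    """Sort 1993 names into mapped, n/a, and not-listed groups."""
    mapped, no_equivalent, not_listed = {}, [], []
    for name in names_1993:
        if name not in crosswalk:
            # The name does not appear in this excerpt at all.
            not_listed.append(name)
        elif crosswalk[name]:
            # One or more 1996 variables carry this content.
            mapped[name] = list(crosswalk[name])
        else:
            # The table lists the variable but gives n/a for 1996.
            no_equivalent.append(name)
    return mapped, no_equivalent, not_listed


mapped, no_equivalent, not_listed = translate(
    ["SUID", "HENRGY", "LCHTOT", "XYZ"]
)
```

Keeping n/a rows separate from names that are simply absent makes it easier to tell a variable that was dropped in 1996 from a name that was mistyped.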



          Ordered by 1996 File Position            Ordered by 1996 File Position
     1993                    1996             1993                   1996
SUSEQNUM                  SSUSEQ            ILUNCH               AHOTLUNC
    SUID                    SSUID          NKIDSHL                 RNKLUN
   PANEL                  SPANEL             LCHPT               EFREELUN
   WAVE                    SWAVE            ILCHPT               AFREELUN
     ROT                SROTATON          LCHFREE                EFRERDLN
  REFMTH                 SREFMON          ILCHFREE               AFRERDLN
  MONTH                 RHCALMN            BREAKF                 EBRKFST
    YEAR                 RHCALYR           IBREAKF                ABRKFST
   ADDID                 SHHADID           NKIDSBF                 RNKBRK
  HSTRAT                 GVARSTR            BFFREE               EFRERDBK
    HHSC                 GHLFSAM           IBFFREE               AFRERDBK
   SURGC                    GRGC             HEARN                 THEARN
 SUSTATE                  TFIPSST            FPROP                TFPRPINC
  HSTATE                  TFIPSST            HPROP                THPRPINC
   H5MIS                EOUTCOME             HTRAN               THTRNINC
     HNF                    RHNF           HOTHER                THOTHINC
  HNFAM                   RHNFAM           HTOTINC               THTOTINC
    HNSF                   RHNSF           HNCASH                  RHNBRF
   H5REF                EHREFPER             HCASH                 RHCBRF
 HREFPER                EHREFPER           HMEANS                  RHMTRF
    H5NP                EHHNUMPP              HPOV                  THPOV
     HNP                EHHNUMPP          HNONCSH                THNONCSH
   HTYPE                  RHTYPE          HSOCSEC                THSOCSEC
   HWGT                 WHFNWGT               HSSI                  THSSI
  H5WGT                 WHFNWGT            HUNEMP                 THUNEMP
 HMETRO                   TMETRO             HVETS                 THVETS
   HMSA                     TMSA             HAFDC                 THAFDC
 HCHANGE                RHCHANGE            HFDSTP                THFDSTP
   HNSSR                  RHNSSR               FID                   RFID
 HACCESS                 EACCESS              FID2                  RFID2
  HUNITS                   EUNITS              FNP                   EFNP
  HLVQTR                 ELIVQRT           FREFPER                EFREFPER
 HTENURE                 ETENURE           FSPOUSE                EFSPOUSE
  HPUBHS                 EPUBHSE             FTYPE                 EFTYPE
  HLORNT                 EGVTRNT          FCHANGE                RFCHANGE
 IPHRENT                 AGVTRNT             FKIND                 EFKIND
  PHRENT                 TMTHRNT            FNKIDS                 RFNKIDS
    UTILS                EUTILYN          FOWNKID                RFOWNKID
   IUTILS                AUTILYN           FOKLT18                RFOKLT18
  HENRGY EEGYPMT1, EEGYPMT2, EEGYPMT3        FNSSR                 RFNSSR
 IHENRGY                 AEGYPMT             FWGT                WFFINWGT
 EASTAMT                 EEGYAMT             FEARN                 TFEARN
IEASTAMT                AEGYAMT              FTRAN                TFTRNINC
  LUNCH                 EHOTLUNC            FOTHER                TFOTHINC



          Ordered by 1996 File Position               Ordered by 1996 File Position
    1993                    1996                 1993                   1996
 FTOTINC                 TFTOTINC               IINAF                AAFNOW
    FPOV                   TFPOV              VETSTAT                EAFEVER
 FSOCSEC                 TFSOCSEC            IVETSTAT                AAFEVER
     FSSI                   TFSSI            USRVDT1                   UAF1
 FUNEMP                  TFUNEMP             USRVDT2                   UAF2
  FVETS                   TFVETS             USRVDT3                   UAF3
  FAFDC                   TFAFDC               VETTYP                EVETTYP
  FFDSTP                  TFFDSTP             IVETTYP                AVETTYP
     SID                    RSID              VETSMT                 EVAQUES
     SNP                   ESFNP                SPDAF                EAFSRVDI
 SREFPER                 ESFRFPER              ISPDAF               AAFSRVDI
 SSPOUSE                  ESFSPSE             FNLWGT                WPFINWGT
  STYPE                   ESFTYPE              P5WGT                WPFINWGT
  SKIND                   ESFKIND             FAMTYP                    ESFT
SCHANGE                 RSCHANGE                 AGE                   TAGE
SOWNKID                 ESOWNKID              FAMREL                   ERRP
 SOKLT18                 ESOKLT18                 RRP                  ERRP
   SWGT                 WSFINWGT                  MS                    EMS
  SEARN                  TSFEARN                 PNSP               EPNSPOUS
  SPROP                  TSPRPINC                PNPT           EPNMOM, EPNDAD
  STRAN                  TSTRNINC              PNGDU                EPNGUARD
 SOTHER                  TSOTHINC           DESGPNPT                RDESGPNT
 STOTINC                 TSTOTINC               EARN                  TPEARN
    SPOV                  TSFPOV                 PROP                TPPRPINC
 SSOCSEC                 TSSOCSEC               TRAN                 TPTRNINC
     SSSI                   TSSSI              OTHER                 TPOTHINC
  SVETS                   TSVETS               TOTINC                TPTOTINC
 SUNEMP                  TSUNEMP               SOCSEC               RCUTYP01
  SAFDC                   TSAFDC              SSPNUM                RCUOWN01
  SFDSTP                  TSFDSTP                VETS               RCUTYP08
  ENTRY                  EENTAID              VETNUM         RCUOWN8A, RCUOWN8B
   PNUM                  EPPPNUM                AFDC                RCUTYP20
  INTVW                 EPPINTVW            AFDCPNUM                RCUOWN20
 POPSTAT                EPOPSTAT             GENASST                RCUTYP21
 BRTHMN                   EBMNTH              GAPNUM                RCUOW21A
 BRTHYR                   TBYEAR              FOSTKID               RCUTYP23
     SEX                    ESEX              FKPNUM                RCUOWN23
    ISEX                    ASEX             OTHWELF                RCUTYP24
   RACE                    ERACE             OWPNUM                 RCUOW24A
   IRACE                   ARACE              WICCOV                RCUTYP25
ETHNCTY                   EORIGIN            WICPNUM                RCUOWN25
IETHNCTY                  AORIGIN           FOODSTMP                RCUTYP27
    EWID                 UEVRWID              FSPNUM                RCUOWN27
    INAF                  EAFNOW             CAIDCOV                RCUTYP57



           Ordered by 1996 File Position              Ordered by 1996 File Position
      1993                   1996              1993                     1996
MCDPNUM                  RCUOWN57           TAKJOBN                  RNOTAKE
     HIIND               RCUTYP58              ESR                     RMESR
  HIPNUM          RCUOW58A, RCUOW58B          WESR1                  RWKESR1
  ENROLD       RENROLL, EENRLM, RENRLMA       WESR2                  RWKESR2
 IENROLD      ARENROLL, AENRLM, EENLEVEL      WESR3                  RWKESR3
    LEVEL                EENLEVEL             WESR4                  RWKESR4
   ILEVEL                AENLEVEL             WESR5                  RWKESR5
  EDASST                  EEDFUND             WKSJOB                RMWKWJB
 IEDASST                  AEDFUND           WKSWOP                  RMWKSAB
      PELL                 EASST01          IWKSWOP                  AWKSAB
 WKSTDY                    EASST03           WKSLOK                 RMWKLKG
   SUPPED                  EASST04          IWKSLOK                  AWKLKG
      NDSL                 EASST05            WS12002                  EENO1
  STLOAN                   EASST05            WS12003               ESTLEMP1
      PLUS                 EASST05            WS12016                TSJDATE1
    FSSHIP      EASST06, EASST08, EASST09     WS12018                TSJDATE1
  EMPLYR                   EASST10            WS12023                TEJDATE1
  OTHAID              EASST11, EASST07        WS12020                TEJDATE1
 IOTHVET                  AEDASST             WS12022                TEJDATE1
IWKSTDY                   AEDASST             WS12024                ERSEND1
     IPELL                AEDASST            IWS12024                ARSEND1
  ISUPPED                 AEDASST             WS12025                 EJBHRS1
     INDSL                AEDASST            UHOURS                   EJBHRS1
     IPLUS                AEDASST            IUHOURS                 AJBHRS1
 IEMPLYR                  AEDASST             WS12012                ECLWRK1
 IOTHAID                  AEDASST            IWS12012                ACLWRK1
   IFSSHIP                AEDASST             WS12044                EUNION1
 ISTLOAN                  AEDASST            IWS12044                AUNION1
 HIGRADE                 EEDUCATE             WS12046                ECNTRC1
IHIGRADE                 AEDUCATE            IWS12046                ACNTRC1
    SC1000               EPDJBTHN            WS1AMT                  TPMSUM1
   WEEKS                    EMAX              WS12026                EPAYHR1
    NJOBS                EJOBCNTR            IWS12026                APAYHR1
   RETIRD                 EEVERET           WS1CALC            APAYHR1, APYRATE1
  IRETIRD                 AEVERET             WS12028               TPYRATE1
     DISAB                EDISABL            IWS12028               APYRATE1
    IDISAB                ADISABL             WS12029                RPYPER1
  REASAB                    EABRE             WS1IND                  EJBIND1
 IREASAB                    AABRE            IWS1IND                  AJBIND1
    WKSPT                  EPTWRK             WS1OCC                 TJBOCC1
   IWKSPT                 APTWRK             IWS1OCC                 AJBOCC1
  WKSPTR                  EPTRESN             WS22102                  EENO2
 IWKSPTR                  APTRESN             WS22103               ESTLEMP2
  TAKJOB                  RTAKJOB             WS22118                TSJDATE2



          Ordered by 1996 File Position                Ordered by 1996 File Position
   1993                     1996                 1993                    1996
 WS22116                 TSJDATE2              SE12260                TBMSUM1
 WS22122                 TEJDATE2              SE1AMT                 TBMSUM1
 WS22120                 TEJDATE2             ISE1AMT                 ABMSUM1
 WS22123                 TEJDATE2             ISE12260                ABMSUM1
IWS22112                 AEJDATE2              SE12226                EPARTB11
 WS22124                 ERSEND2               SE12228                EPARTB21
IWS22124                 ARSEND2               SE12230                EPARTB31
 WS22125                  EJBHRS2               SE1IND                 TBSIND1
 WS22112                 ECLWRK2               ISE1IND                 ABSIND1
 WS22144                 EUNION2               SE1OCC                 TBSOCC1
IWS22144                 AUNION2              ISE1OCC                 ABSOCC1
 WS22146                 ECNTRC2               SE22301                  EBNO2
IWS22146                 ACNTRC2               SE22302               EBIZNOW2
WS2AMT                   TPMSUM2               SE22312                EHRSBS2
 WS22126                 EPAYHR2               SE22314                EGROSB2
WS2CALC            APAYHR2, APYRATE2          ISE22314                AGROSB2
IWS22126                 APAYHR2               SE22318                 TEMPB2
 WS22128                TPYRATE2              ISE22318                 AEMPB2
IWS22128                APYRATE2               SE22320                 EINCPB2
 WS22129                 RPYPER2              ISE22320                 AINCPB2
 WS2IND                   EJBIND2              SE22322                EPROPB2
IWS2IND                   AJBIND2             ISE22322                APROPB2
 WS2OCC                  TJBOCC2               SE22324                EHPRTB2
IWS2OCC                  AJBOCC2               SE22332                ESLRYB2
 SE12201                   EBNO1              ISE22332                ASLRYB2
 SE12202                EBIZNOW1               SE22334                 EOINCB2
 SE12212                 EHRSBS1              ISE22334                AOINCB2
 SE12214                 EGROSB1               SE22354                 TPRFTB2
 ISE12214                AGROSB1               SE22356                 TPRFTB2
 SE12218                  TEMPB1              ISE22354                APRFTB2
 ISE12218                 AEMPB1              ISE22356                APRFTB2
 SE12220                  EINCPB1              SE22360                TBMSUM2
 ISE12220                 AINCPB1              SE2AMT                 TBMSUM2
 SE12222                 EPROPB1              ISE2AMT                 ABMSUM2
 ISE12222                APROPB1              ISE22360                ABMSUM2
 SE12224                 EHPRTB1               SE22326                EPARTB12
 SE12232                 ESLRYB1               SE22328                EPARTB22
 ISE12232                ASLRYB1               SE22330                EPARTB32
 SE12234                 EOINCB1                SE2IND                 TBSIND2
 ISE12234                AOINCB1               ISE2IND                 ABSIND2
 SE12254                  TPRFTB1              SE2OCC                 TBSOCC2
 SE12256                  TPRFTB1             ISE2OCC                 ABSOCC2
 ISE12256                APRFTB1             SSICOVRG           ESSICHLD, ESSISELF
 ISE12254                APRFTB1               SOCSR1                 ERESNSS1



        Ordered by 1996 File Position              Ordered by 1996 File Position
   1993                   1996                1993                   1996
 SOCSR2                ERESNSS2               IR32                   AR32
 DISAGE                 TAGESS                R34                    ER34
IDISAGE                 AAGESS                IR34                   AR34
  R01A                   ER01A                R35                    ER35
  IR01A                  AR01A                IR35                   AR35
  R01K                   ER01K                R36                    ER36
  IR01K                  AR01K                IR36                   AR36
  R02A                    ER02                R37                    ER37
  IR02A                   AR02                IR37                   AR37
   R03               ER03A, ER03K             R38                    ER38
   IR03              AR03A, AR03K             IR38                   AR38
   R05                    ER05                R50                    ER50
   IR05                   AR05                IR50                   AR50
   R07                    ER07                R51                    ER51
   IR07                   AR07                IR51                   AR51
   R08                    ER08                R52                    ER52
   IR08                   AR08                IR52                   AR52
   R10                    ER10                R53                    ER53
   IR10                   AR10                IR53                   AR53
   IR12                   AR12              CWORK                    ER55
   R12                    AR12                R55                    ER55
   R13                    ER13             ICWORK                    AR55
   IR13                   AR13                IR55                   AR55
   R20                    ER20              OTHINC                   ER56
   IR20                   AR20                R56                    ER56
   R21                    ER21                IR56                   AR56
   IR21                   AR21                R75              ER75, ER09, ER33
   R23                    ER23             S01AMTA                T01AMTA
   IR23                   AR23               IS01A                A01AMTA
   R24                    ER24             S01AMTK                T01AMTK
   IR24                   AR24               IS01K                A01AMTK
   R25                    ER25             S02AMTA                 T02AMT
   IR25                   AR25               IS02A                 A02AMT
   R27                    ER27              S03AMT          T03AMTA, T03AMTK
   IR27                   AR27                IS03          A03AMTA, A03AMTK
   R28                    ER28              S05AMT                 T05AMT
   IR28                   AR28                IS05                 A05AMT
   R29                    ER29              S07AMT                 T07AMT
   IR29                   AR29                IS07                 A07AMT
   R30                    ER30              S08AMT                 T08AMT
   IR30                   AR30                IS08                 A08AMT
   R31                    ER31              S10AMT                 T10AMT
   IR31                   AR31                IS10                 A10AMT
   R32                    ER32              S12AMT                 T12AMT



          Ordered by 1996 File Position                Ordered by 1996 File Position
   1993                     1996                  1993                   1996
   IS12                   A12AMT              S56AMT                   T56AMT
 S13AMT                   T13AMT                  IS56                 A56AMT
   IS13                   A13AMT              S75AMT                   T75AMT
 S20AMT                   T20AMT                  IS75                 A75AMT
 S21AMT                   A20AMT                 R103                  EAST2A
   IS20                   A20AMT                 IR103                 AAST2A
   IS21                   A21AMT                 R100                  EAST2B
 S23AMT                   T23AMT                 IR100                 AAST2B
   IS23                   A23AMT                 R101                  EAST2C
 S24AMT                   T24AMT                 IR101                 AAST2C
   IS24                   A24AMT                 R102                  EAST2D
 S27AMT                   T27AMT                 IR102                 AAST2D
   IS27                   A27AMT                 R110              EAST3A, EAST3B
 S28AMT                   T28AMT                 R106                  EAST3C
   IS28                   A28AMT                 IR106                 AAST3C
 S29AMT                   T29AMT                 R105                  EAST3D
   IS29                   A29AMT                 IR105                 AAST3D
 S30AMT                   T30AMT                 R130                  EAST3E
   IS30                   A30AMT                 IR130                 AAST3E
 S31AMT                   T31AMT                 R120                  EAST4A
   IS31                   A31AMT                 IR120                 AAST4A
 S32AMT                   T32AMT                 R140                  EAST4B
   IS32                   A32AMT                 IR140                 AAST4B
 S34AMT                   T34AMT                 R107                  EAST4C
   IS34                   A34AMT                 IR107                 AAST4C
 S35AMT                   T35AMT                 RJ120                EJNTRNT
   IS35                   A35AMT                IRJ120                AJNTRNT
 S36AMT                   T36AMT               JGRENT                  TJARNT
   IS36                   A36AMT              IJGRENT                  AJARNT
 S37AMT                   T37AMT               JNRENT                  TJACLR
   IS37                   A37AMT              IJNRENT                  AJACLR
 S38AMT                   T38AMT                RO120                EOWNRNT
   IS38                   A38AMT                IRO120               AOWNRNT
 S40AMT                   T39AMT              OGRENT                   TOARNT
 S50AMT                   T50AMT             IOGRENT                   AOARNT
   IS50                   A50AMT              ONRENT                   TOACLR
 S51AMT                   T51AMT             IONRENT                   AOACLR
   IS51                   A51AMT              RJ120OT                   EJRNT2
 S52AMT                   T52AMT             IRJ120OT                  AJRNT2
   IS52                   A52AMT                J120OT                 TJACLR2
 S53AMT                   T53AMT               IJ120OT                AJACLR2
   IS53                   A53AMT                 RJ130                EMRTJNT
 S55AMT                   T55AMT                IRJ130                AMRTJNT
   IS55                   A55AMT                  J130                 TMIJNT



            Ordered by 1996 File Position               Ordered by 1996 File Position
      1993                    1996                 1993                   1996
      IJ130                 AMIJNT               HITYPE               EHIOWNER
    RO130                 EMRTOWN                HIOWN                EHIOWNER
    IRO130                AMRTOWN               IHITYPE               AHIOWNER
      O130                  TMIOWN              IHIOWN                AHIOWNER
     IO130                 AMIOWN                CHAMP                RCHAMPM
   O14050                  TRNDUP1                HISRC                EHEMPLY
   IO14050                 ARNDUP1               IHISRC                AHEMPLY
   RJ10003                   ESVJT                HIPAY                EHICOST
  IRJ10003                   ASVJT               IHIPAY                AHICOST
    J10003                 TSVJTINT             NONHHI                EHIOTHER
   IJ10003                 ASVJTINT            INONHHI                AHIOTHER
   CJ10003                 ASVJTINT            OTHVET                   EASST02
  RO10003                  ESVOAST                 R06                    ER06
 IRO10003                  ASVOAST                 IR06                   AR06
   O10003                   TSVOINT              GIBILL                   ER40
  CO10003                  ASVOINT                 R40                    ER40
   IO10003                 ASVOINT              IGIBILL                   AR40
      R104             EMDJT, EMDOAST              IR40                   AR40
     IR104            AMDJT, AMDOAST               R41                    ER41
    J10407                 TMDJTINT                IR41                   AR41
   IJ10407                 AMDJTINT                R54                    ER54
   CJ10407                 AMDJTINT                IR54                   AR54
   O10407                  TMDOINT             WICVAL                 EMTHAM25
   IO10407                 AMDOINT                 IS06                 A06AMT
  CO10407                  AMDOINT                 IS40                 A40AMT
    RO110                 EMANYCHK                 IS54                 A54AMT
     IR110                AMANYCHK                 R150                ERNDUP2
    IJO110                AMOWNDIV                IR150               EOTHPROP
   RJ110RI                EMOTHDIV
  RO110RI                 EMOTHDIV
  IJO110RI                AMOTHDIV
    J110RI                 TMJADIV
    IJ110RI                AMJADIV
    O110RI               TMOWNADV
   IO110RI               AMOWNADV
     RJ110                ESANYCHK
       J110                TSJNTDIV
      IJ110                ASJNTDIV
      O110                TSOWNDIV
     IO110                ASOWNDIV
 CARECOV                    ECRMTH
ICARECOV                    ACRMTH
MEDCODE                   RMEDCODE
  HINONH                  EHIOWNER


B. SIPP Topcoding Specifications

Earnings
The topcoding of earnings amounts is based on the procedure used by the Current Population
Survey (CPS). The Survey of Income and Program Participation (SIPP) uses the CPS annual
earnings benchmark of $150,000 to “annualize” the topcoding procedure. SIPP topcodes on a
monthly basis (the reporting level): a monthly amount exceeding $12,500 (1/12 of $150,000) is
topcoded if the amount for the 4-month wave is greater than $50,000 (1/3 of $150,000). The
topcoded amounts are defined once for the panel, based on Wave 1 edited data.

Three variables require topcoding:

•   EPM(1-4)SUM: wage and salary earnings,
•   EBM(1-4)SUM: self-employed earnings,
•   EMLM(1-4)SUM: earnings from additional jobs and moonlighting.

To compute the topcodes, the Census Bureau tallies all amounts that require topcoding based on
the above criteria into a 12-cell matrix. The cells are defined by sex, race/ethnic origin, and
labor force status (full-time, full-year worker or not). When all values have been tallied, a mean
is computed for each cell as the total amount divided by the total number of occurrences. Those
means are used for the entire 1996 Panel, with an adjustment for inflation and real growth in
earned income of 1.019% per wave for all remaining waves in the panel.


Topcoding Earnings for the 1996 SIPP Panel

If the sum of the monthly earnings amounts for a job for the wave is greater than $50,000, then
those monthly amounts that are greater than $12,500 are topcoded. After matching on sex,
race/ethnic origin, and labor force status, the Census Bureau uses the topcode amounts from the
topcoding matrix for earnings. See Table B-1 for examples of income amounts that need to be
topcoded.




            Table B-1. Examples of Income Amounts That Need to Be Topcoded

                     Monthly Income Amounts               Sum for    Sum Greater
Example    Month 1     Month 2    Month 3      Month 4    the Wave   Than $50,000?   Topcoding Procedure
1          $3,000      $4,000     $5,000       $5,000     $17,000    No              None
2          $0          $0         $0           $55,000    $55,000    Yes             Topcode month 4 with the mean
3          $15,000     $15,000    $10,000      $12,000    $52,000    Yes             Topcode months 1 and 2 with the mean
4          $12,000     $12,000    $12,000      $15,000    $51,000    Yes             Topcode month 4 with the mean
5          $0          $0         $0           $49,000    $49,000    No              None
6          $15,000     $15,000    $15,000      $15,000    $60,000    Yes             Topcode all 4 months with the mean
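
The rule illustrated in Table B-1 can be sketched in Python as follows. This is an illustrative sketch only, not Census Bureau production code: the function name months_to_topcode and the constant names are hypothetical, and the thresholds are the $50,000 wave amount and $12,500 monthly amount stated above.

```python
WAVE_THRESHOLD = 50_000    # 1/3 of the $150,000 annual benchmark
MONTH_THRESHOLD = 12_500   # 1/12 of the $150,000 annual benchmark


def months_to_topcode(monthly_amounts):
    """Return the 1-based months whose amounts need topcoding.

    monthly_amounts holds the four monthly earnings amounts reported
    for one job (or business) in one wave.
    """
    # No month is topcoded unless the wave total exceeds $50,000.
    if sum(monthly_amounts) <= WAVE_THRESHOLD:
        return []
    # Within a qualifying wave, only months above $12,500 are topcoded.
    return [month
            for month, amount in enumerate(monthly_amounts, start=1)
            if amount > MONTH_THRESHOLD]


# The six examples from Table B-1.
table_b1 = [
    [3_000, 4_000, 5_000, 5_000],
    [0, 0, 0, 55_000],
    [15_000, 15_000, 10_000, 12_000],
    [12_000, 12_000, 12_000, 15_000],
    [0, 0, 0, 49_000],
    [15_000, 15_000, 15_000, 15_000],
]
for example, amounts in enumerate(table_b1, start=1):
    print(example, sum(amounts), months_to_topcode(amounts))
```

Running the loop lists, for each example, the wave sum and the months that would be replaced with the cell mean; an empty list corresponds to "None" in the Topcoding Procedure column.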


Specification of the Matrix for Calculating the
Means for Earnings

The mean values are created by summing the reported monthly amounts that are greater than
$12,500 and dividing by the total number of inputs to the cell.

For cells with fewer than six amounts, create a single mean value by summing all values in those
cells and dividing by the total number of inputs to those cells.

Matrix definition: 2 × 3 × 2 matrix for sex, race, and labor force status.


Sex

Use the edited variable ESEX with the following values:

       ESEX: 1 = Male
              2 = Female


Race

Set the index RACORIG, using the edited variables ERACE and EORIGIN, defined as follows:

       RACORIG:       1 = Nonblack, non-Hispanic
                      2 = Black, non-Hispanic
                      3 = Hispanic, any race

IF (EORIGIN is in the range 20-28)   THEN RACORIG = 3
ELSE IF (ERACE = 2)                  THEN RACORIG = 2
ELSE                                      RACORIG = 1
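
As a hedged illustration, the RACORIG assignment can be written in Python as shown below. The function name racorig is hypothetical, and the sketch assumes that the EORIGIN condition means the inclusive range of codes 20 through 28.

```python
def racorig(erace, eorigin):
    """Return the RACORIG index (1, 2, or 3) from edited ERACE and EORIGIN."""
    # Assumption: EORIGIN codes 20 through 28 (inclusive) mark Hispanic origin.
    if 20 <= eorigin <= 28:
        return 3   # Hispanic, any race
    if erace == 2:
        return 2   # Black, non-Hispanic
    return 1       # Nonblack, non-Hispanic


print(racorig(erace=2, eorigin=24))   # Hispanic origin is tested before race
```

The order of the tests matters: Hispanic origin is checked first, so a respondent with ERACE = 2 and an EORIGIN code in the 20 to 28 range falls in cell 3, not cell 2.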


Labor Force Status

Set the index FTFULYR, which identifies whether or not a worker is a full-time, full-year
worker.

       FTFULYR:
              1 = Yes, full-time, full-year worker
              2 = No, not full-time, full-year worker

IF (RM1ESR = 1 AND RM2ESR = 1 AND RM3ESR = 1 AND RM4ESR = 1) AND
   (the number of variables in the EHRSWK01 - EHRSWK(EMAX) array that equal 1 is greater
   than EMAX/2)

       THEN FTFULYR = 1 (YES)
       ELSE FTFULYR = 2 (NO)
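
The FTFULYR test can be sketched in Python as follows. The function name ftfulyr and its argument names are hypothetical. The sketch assumes the four monthly ESR values and the weekly EHRSWK array are passed as lists, with the length of the weekly list standing in for EMAX; it counts values equal to 1 exactly as the rule states, without interpreting what code 1 means.

```python
def ftfulyr(monthly_esr, ehrswk):
    """Return 1 for a full-time, full-year worker, otherwise 2.

    monthly_esr: the four values RM1ESR, RM2ESR, RM3ESR, RM4ESR.
    ehrswk: the weekly values EHRSWK01 through EHRSWK(EMAX);
            the list length is taken as EMAX (an assumption).
    """
    emax = len(ehrswk)
    all_months_equal_1 = all(value == 1 for value in monthly_esr)
    weeks_equal_1 = sum(1 for value in ehrswk if value == 1)
    # Both conditions must hold; "greater than EMAX/2" is a strict test.
    if all_months_equal_1 and weeks_equal_1 > emax / 2:
        return 1
    return 2


# 17 weeks in the wave, 10 of them coded 1 (10 > 8.5).
print(ftfulyr([1, 1, 1, 1], [1] * 10 + [2] * 7))
```

Because the comparison is strict, a wave with an even number of weeks in which exactly half are coded 1 yields FTFULYR = 2.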


Filling the Matrix to Create the Means for Topcoding

Perform the following calculations in the order shown:

•   Sum the four monthly amounts reported for EPM1SUM, EPM2SUM, EPM3SUM, and
    EPM4SUM. If the sum is greater than $50,000, then store the amounts greater than $12,500
    in the appropriate cell in the matrix (matched on ESEX, RACORIG, FTFULYR).
•   Sum the four monthly amounts reported for EBM1SUM, EBM2SUM, EBM3SUM, and
    EBM4SUM. If the sum is greater than $50,000, then store the amounts greater than $12,500
    in the appropriate cell in the matrix (matched on ESEX, RACORIG, FTFULYR).
•   Sum the four monthly amounts reported for EMLM1SUM, EMLM2SUM, EMLM3SUM,
    and EMLM4SUM. If the sum is greater than $50,000, then store the amounts greater than
    $12,500 in the appropriate cell in the matrix (matched on ESEX, RACORIG, FTFULYR).
•   Sum the values in each cell and divide by the number of inputs to the cell to obtain the mean
    amount for the cell.
•   For cells with fewer than six inputs, create the mean by combining all of the amounts from
    those cells and dividing by the total number of inputs to those cells. Use this mean for all
    cells with fewer than six entries.
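
The last two steps can be sketched in Python as shown below. This is an assumption-laden illustration, not the Census Bureau's program: the function name cell_topcodes and the record layout are hypothetical, and the sketch takes "fewer than six inputs" as the threshold for pooling small cells.

```python
from collections import defaultdict

MIN_CELL_INPUTS = 6   # assumption: cells with fewer inputs share a pooled mean


def cell_topcodes(records):
    """Compute the mean topcode for each of the 12 matrix cells.

    records: iterable of (esex, racorig, ftfulyr, amount) tuples, one per
             monthly amount already selected for topcoding.
    Returns a dict keyed by (esex, racorig, ftfulyr). The pooled mean is
    None when no small cell has any input.
    """
    cells = defaultdict(list)
    for esex, racorig, ftfulyr, amount in records:
        cells[(esex, racorig, ftfulyr)].append(amount)

    all_cells = [(s, r, f) for s in (1, 2) for r in (1, 2, 3) for f in (1, 2)]

    # Pool every amount that falls in a cell with fewer than six inputs.
    pooled = [amount
              for key in all_cells
              if len(cells[key]) < MIN_CELL_INPUTS
              for amount in cells[key]]
    pooled_mean = sum(pooled) / len(pooled) if pooled else None

    topcodes = {}
    for key in all_cells:
        amounts = cells[key]
        if len(amounts) >= MIN_CELL_INPUTS:
            topcodes[key] = sum(amounts) / len(amounts)
        else:
            topcodes[key] = pooled_mean
    return topcodes
```

Under this reading, every small cell receives the same value, which appears consistent with the single amount repeated across several cells in Table B-2.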


                                       Table B-2. Earnings Topcodes

Sex                        Race                                   Worker Status                        Topcode
Sex = 1 (Male)             Nonblack, non-Hispanic                 Full year, full time                 $29,660
Sex = 1 (Male)             Nonblack, non-Hispanic                 Not full year, full time             $38,270
Sex = 1 (Male)             Black, non-Hispanic                    Full year, full time                 $17,530
Sex = 1 (Male)             Black, non-Hispanic                    Not full year, full time             $24,015
Sex = 1 (Male)             Hispanic, any race                     Full year, full time                 $26,250
Sex = 1 (Male)             Hispanic, any race                     Not full year, full time             $24,015
Sex = 2 (Female)           Nonblack, non-Hispanic                 Full year, full time                 $21,990
Sex = 2 (Female)           Nonblack, non-Hispanic                 Not full year, full time             $49,450
Sex = 2 (Female)           Black, non-Hispanic                    Full year, full time                 $24,015
Sex = 2 (Female)           Black, non-Hispanic                    Not full year, full time             $24,015
Sex = 2 (Female)           Hispanic, any race                     Full year, full time                 $24,015
Sex = 2 (Female)           Hispanic, any race                     Not full year, full time             $24,015
Note: The topcodes listed above for each cell are greater than the monthly value that is tested, $12,500. This topcode
is the mean of all amounts greater than $12,500. The intention is to reveal as much information as possible by using
the mean value.


Year of Birth (TBYEAR)
Year of birth is bottomcoded to 1912 to ensure that age does not exceed 88 during the panel. If
year of birth (EBYEAR) is earlier than 1912, set year of birth to 1912. Age must be recalculated
based on the new year of birth.


Age (TAGE)
Age is topcoded to 88 for the entire panel. TAGE is topcoded through birth year (EBYEAR),
which is bottomcoded to 1912, and then age is recalculated.
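
A minimal Python sketch of the bottomcoding and age recalculation follows. The function names are hypothetical, and the age calculation is a simple difference in years, which is an assumption: the text above does not specify the reference date or how birth month enters the calculation. The final cap at 88 is a safeguard added for illustration.

```python
BOTTOMCODE_YEAR = 1912
TOPCODE_AGE = 88


def topcoded_birth_year(ebyear):
    """Return TBYEAR: the edited birth year, bottomcoded to 1912."""
    return max(ebyear, BOTTOMCODE_YEAR)


def recalculated_age(ebyear, reference_year):
    """Return age recomputed from the bottomcoded birth year.

    Assumption: age is the difference in calendar years.
    """
    age = reference_year - topcoded_birth_year(ebyear)
    return min(age, TOPCODE_AGE)


# A respondent born in 1905, in two different reference years.
print(topcoded_birth_year(1905), recalculated_age(1905, 1996),
      recalculated_age(1905, 2000))
```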




Age at Receipt of Social Security Disability
Benefits (TAGESS)
EAGESS is the age at which a person began receiving Social Security Disability benefits.

If EAGESS is greater than TAGE, set TAGESS equal to TAGE, the topcoded value for age (which
cannot exceed 88).

If EAGESS GT TAGE             THEN TAGESS = TAGE


Date Respondent Started or Ended Job or Business
(TSJDATE, TEJDATE, TSBDATE, TEBDATE)
ESJDATE is date respondent started job.

EEJDATE is date respondent ended job.

ESBDATE is date respondent started business.

EEBDATE is date respondent ended business.

A respondent cannot be over 88 years old during the life of the panel. Therefore, year of birth is
bottomcoded to 1912. A respondent cannot have “worked” or “owned a business” before age 14
years. The earliest a respondent can be shown beginning or ending a job or business is 1926
(1912 + 14). If the date in ESJDATE, EEJDATE, ESBDATE, or EEBDATE is earlier than 1926,
set the date to 1926 (exclude values equal to –1).

After bottomcoding the year to 1926, check the month and day fields to ensure that the end date
is after the start date for the job or business. If it is not, switch the two dates, holding the
original start date in a temporary variable (shown here as TEMP) so that it is not overwritten:

For Jobs:
       If EEJDATE is less than ESJDATE
              Then TEMP    = ESJDATE
                   ESJDATE = EEJDATE
                   EEJDATE = TEMP
For Businesses:
       If EEBDATE is less than ESBDATE
              Then TEMP    = ESBDATE
                   ESBDATE = EEBDATE
                   EEBDATE = TEMP
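
The date bottomcoding and the start/end exchange can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes each date is held as a (year, month, day) tuple, with -1 standing for the excluded value; the actual storage format of the date fields is not described here. The exchange uses simultaneous assignment so that neither original value is lost.

```python
EARLIEST_YEAR = 1926    # 1912 + 14
NOT_APPLICABLE = -1     # excluded from bottomcoding


def bottomcode_year(d):
    """Raise the year of a (year, month, day) date to 1926 if it is earlier."""
    if d == NOT_APPLICABLE:
        return d
    year, month, day = d
    return (max(year, EARLIEST_YEAR), month, day)


def clean_dates(start, end):
    """Bottomcode both dates, then exchange them if end precedes start."""
    start, end = bottomcode_year(start), bottomcode_year(end)
    if NOT_APPLICABLE not in (start, end) and end < start:
        # Simultaneous assignment keeps both original values.
        start, end = end, start
    return start, end


# Both years are bottomcoded to 1926, which leaves the end date
# (March 1) before the start date (June 15), so the two are exchanged.
print(clean_dates((1920, 6, 15), (1924, 3, 1)))
```

Tuples compare element by element, so the year is compared first, then the month, then the day.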



                 Table B-3. 1996 Panel Topcoding Specifications

     PUF        MONTHLY       Bottom-
     Variable   Topcode at:   code       Short Description
 1   TBDJTINT   $2,500        NA         Assets: Amount of monthly interest on joint municipal-
                                         corporate bonds
 2   TBDOINT    $3,200        NA         Assets: Amount of monthly interest on self-owned
                                         municipal-corporate bonds
 3   TCDJTINT   $450          NA         Assets: Amount of monthly interest on joint certificates of
                                         deposit
 4   TCDOINT    $825          NA         Assets: Amount of monthly interest on solely owned
                                         certificates of deposit
 5   TCKJTINT   $55           NA         Assets: Amount of monthly interest from joint checking
                                         account
 6   TCKOINT    $110          NA         Assets: Amount of monthly interest on solely owned
                                         checking account
 7   TGVJTINT   $550          NA         Assets: Amount of monthly interest on joint U.S.
                                         government securities
 8   TGVOINT    $1,725        NA         Assets: Amount of monthly interest on self-owned U.S.
                                         government securities
 9   TJACLR     $1,375        ($1,000)   Assets: Amount of net rent from property owned jointly with
                                         spouse
10   TJACLR2    $6,000        ($1,000)   Assets: Amount of net income from rental property with
                                         others
11   TJARNT     $2,725        NA         Assets: Amount of gross rent from property owned jointly
                                         with spouse
12   TMDJTINT   $275          NA         Assets: Amount of monthly interest on joint money market
                                         account
13   TMDOINT    $550          NA         Assets: Amount of monthly interest on self-owned money
                                         market deposit account
14   TMIJNT     $1,775        NA         Assets: Amount of interest on mortgage owned with spouse
15   TMIOWN     $1,650        NA         Assets: Amount of interest on own mortgage
16   TMJADIV    $700          NA         Assets: Amount of dividend credited to joint margin
                                         account/reinvestment in mutual funds
17   TMJNTDIV   $1,100        NA         Assets: Amount of check for jointly owned mutual funds
18   TMOWNADV   $1,825        NA         Assets: Amount of dividend credited to sole margin
                                         account/reinvestment in mutual funds
19   TMOWNDIV   $1,375        NA         Assets: Amount of check for solely owned mutual funds
20   TOACLR     $2,450        ($1,250)   Assets: Amount of net income from own rental property
21   TOARNT     $4,350        NA         Assets: Amount of gross rent from own property
22   TRNDUP1    $3,300        NA         Assets: Amount of income from royalties
23   TRNDUP2    $4,750        ($1,250)   Assets: Amount of other income from financial investments
24   TSJADIV    $825          NA         Assets: Amount of dividend credited to margin
                                         account/reinvestment in stocks owned jointly
25   TSJNTDIV   $775          NA         Assets: Amount of dividend check for jointly owned stocks
26   TSOWNADV   $1,375        NA         Assets: Amount of monthly dividend credited to margin
                                         account/reinvestment in stock
27   TSOWNDIV   $1,150        NA         Assets: Amount of dividend check for solely owned stocks
28   TSVJTINT   $150          NA         Assets: Amount of monthly interest on joint savings account.
                                                                                    (table continues)



                 Table B-3. 1996 Panel Topcoding Specifications (continued)

     PUF             MONTHLY          Bottom-
     Variable        Topcode at:      code       Short Description
29   TSVOINT         $175             NA         Assets: Amount of monthly interest on self-only savings
                                                 account
30   TCSAGY(M)       NA               NA         GenInc: Amount received by agency on your behalf
31   T28AMT          $1,200           NA         GenInc: Amount of child support payments
32   T29AMT          $3,275           NA         GenInc: Amount of alimony payments
33   T30AMT          $2,500           NA         GenInc: Amount of pension from a company or union
34   T31AMT          $3,925           NA         GenInc: Amount from federal civil service or other federal
                                                 civilian employee pension
35   T32AMT          $3,825           NA         GenInc: Amount of U.S. military retirement pay
36   T34AMT          $3,270           NA         GenInc: Amount of state government pension
37   T35AMT          $3,600           NA         GenInc: Amount of local government pension
38   T36AMT          $2,200           NA         GenInc: Amount of income from a paid-up life insurance
                                                 policy or annuity
39   T37AMT          $5,000           NA         GenInc: Amount from estates or trusts
40   T38AMT          $2,600           NA         GenInc: Amount of payments for retirement, disability, or as
                                                 a survivor benefit
41   T39AMT          $110,000         NA         GenInc: Amount of payments for pension/retirement lump
                                                 sums
42   T42AMT          $13,625          NA         GenInc: Amount of draw from an IRA/Keogh/401k or
                                                 Thrift Plan
43   T50AMT          $75              NA         GenInc: Amount of income assistance from a charitable
                                                 group
44   T51AMT          $10,900          NA         GenInc: Amount of money from relatives or friends
45   T52AMT          $325             NA         GenInc: Amount of lump-sum payments
46   T53AMT          $1,960           NA         GenInc: Amount of income from roomers or boarders
47   T55AMT          $3,500           NA         GenInc: Amount of incidental or casual earnings
48   T56AMT          $21,800          NA         GenInc: Amount of miscellaneous cash income
49   TBM(M)SUM1/2    See Spec No. 1   NA         Business: Income received this month
50   TPM(M)SUM1/2    See Spec No. 1   NA         Job: Earnings from job received in MONTH1
51   TMLM(M)SUM      See Spec No. 1   NA         LabFor: Amount of income from this work (moonlighting)
                                                 this month
52   TBYEAR          See Spec No. 2   NA         Person: Birth year
53   TAGE            See Spec No. 3   NA         Person: Age as of last birthday
54   TAGESS          See Spec No. 4   NA         GenInc: Age Social Security Disability receipt began
55   TSJDATE         See Spec No. 5   NA         Job: Date started this job
56   TEJDATE         See Spec No. 5   NA         Job: Date ended this job
57   TSBDATE         See Spec No. 5   NA         Business: Date started operating this business
58   TEBDATE         See Spec No. 5   NA         Business: Date ended operating this business
59   TPYRATE         $30              NA         Job: Regular hourly pay rate
60   TPRFTB          $17,450          ($2,500)   Business: Net profit or loss
61   TROLLAMT        $999,000         NA         GenInc: Amount rolled over into a retirement account during
                                                 the reference period
62   TMTHRNT(M)      $650             NA         Household: Amount of monthly rent
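
As an illustration of how the dollar limits in Table B-3 might be applied, the sketch below clips a reported monthly amount to a variable's topcode and, where one exists, its bottomcode. It rests on several assumptions that are not stated in the table itself: that topcoding is a simple replacement of larger values by the topcode, that parenthesized bottomcodes such as ($1,000) denote negative amounts, and that "NA" means no bottomcode applies. Variables that refer to a numbered specification ("See Spec No. ...") are not covered. The names `LIMITS` and `apply_limits` are hypothetical.

```python
# (topcode, bottomcode) in dollars for a few Table B-3 rows; None stands for "NA".
# Parenthesized bottomcodes in the table are read here as negative amounts.
LIMITS = {
    "TJACLR": (1375, -1000),
    "TOACLR": (2450, -1250),
    "TSVJTINT": (150, None),
    "TPYRATE": (30, None),
}


def apply_limits(variable, amount):
    """Clip an amount to the variable's topcode and, if any, its bottomcode."""
    top, bottom = LIMITS[variable]
    if amount > top:
        return top
    if bottom is not None and amount < bottom:
        return bottom
    return amount


print(apply_limits("TJACLR", 5000))    # above the topcode
print(apply_limits("TJACLR", -3000))   # below the bottomcode
print(apply_limits("TSVJTINT", 20))    # within limits, unchanged
```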


C. Computing the SIPP Sampling Weights
This appendix supplements the discussion in Chapter 8 (Using Sampling Weights on SIPP Files)
with more detailed information about how the core wave file person-level weight FNLWGT and
the full panel file person-level weights FNLWGT_x and PNLWGT are computed;1 it is intended
as a reference for users who require a comprehensive description of how the sampling weights
are computed.

Sections 1 and 2 of this appendix discuss the algorithms that are used to compute the final core
wave file person-level weights FNLWGT, with the first section discussing the Wave 1 weights
and the second section discussing the Wave 2+ weights. The third section discusses the
algorithm that computes the final full panel weights FNLWGT_x (the calendar year weight for
year x) and PNLWGT (the panel weight).


Wave 1 Weights
For the 1996 Panel, the final weights used in deriving estimates consist of the product of four
factors: the base weight, the duplication control factor, the household noninterview adjustment
factor, and the second-stage adjustment factor. For panels prior to 1996, these four factors may
have been multiplied by two other factors—the first-stage ratio estimate factor and the new
construction noninterview adjustment factor—which are discussed later in this chapter.


Base Weight (BW)

The primary component of the sampling weight is the base weight. The base weight for any
sampled person or sampled household is the reciprocal of the probability under the sample
design of that person or household being selected. If there was full response and if there were no
calibration adjustments, then the summation of base weights for a particular subgroup (e.g.,
Hispanics in the Southwest) is an unbiased estimator of the total U.S. population within that
subgroup. In simplified terms, a base weight of 1,000 assigned to a sampled person means that
the sampled person “represents” 1,000 people in the U.S. population. The base weight for a
household and the base weight for a person within a household are the same, since every person
within a sampled household is automatically selected (i.e., selected with a conditional probability
of 1, given household selection).

1
 The remaining weights given in Table 12-2 (HWGT, FWGT, SWGT, P5WGT, H5WGT, and FINALWGT) are
derived directly from the basic person-level weight FNLWGT. This derivation is discussed in the “How Weights
Are Constructed” subsection of Chapter 8.


Duplication Control Factor (DCF)

The duplication control factor, an integer value between 1 and 4 inclusive, is applied to the base
weights of specified households to account for subsampling done in clusters of housing units
selected at the last stage of sample selection. These clusters typically contain an unmanageable
number of housing units. When this occurs, a sampling fraction, 1/N, is determined by selecting
a value of N such that the number of sample households in the cluster is reduced to a manageable
size. After this is done, a duplication control factor of N or 4, whichever is smaller, is included as
a weighting factor for sampled housing units in the cluster.
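
The first two weighting factors can be sketched as follows. This is a minimal illustration under stated assumptions, not the production weighting system: the selection probability of 1 in 1,000 and the subsampling rate of 1 in 6 are invented example values, and the function names are hypothetical.

```python
def base_weight(selection_probability):
    """Base weight: the reciprocal of the probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def duplication_control_factor(n):
    """DCF for a cluster subsampled at rate 1/n: n or 4, whichever is smaller."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return min(n, 4)


bw = base_weight(1 / 1000)            # person "represents" about 1,000 people
dcf = duplication_control_factor(6)   # 1-in-6 subsampling, capped at 4
print(bw * dcf)
```

Capping the factor at 4 means that a housing unit subsampled at a rate below 1 in 4 is not fully compensated by the DCF.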


Household Noninterview Adjustment Factor (NAF)

The noninterview adjustment factor is intended to adjust for the presence of Type A
noninterview households (households that are not interviewed because the occupants were
temporarily absent, no one was home, the occupants refused participation, or the occupants could
not be located). Noninterview adjustment factors are computed for each of a set of noninterview
cells. These cells are based on 512 cells generated from all possible cross-classifications of the
following household characteristics (256 cells for panels prior to 1996):

•   Within-PSU oversampling strata: poverty stratum and nonpoverty stratum (only for 1996
    and later panels);
•   Census region;
•   Race of reference person: black or nonblack;
•   Tenure: owner or renter;
•   Residence status: MSA urban, MSA nonurban, NonMSA Census place, or NonMSA not
    Census place; and
•   Household size: one, two, three, or four or more persons.

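
The cell counts quoted above follow directly from the number of categories in each characteristic. The short sketch below enumerates the cross-classification to confirm the arithmetic; the dictionary keys are hypothetical labels, and the four Census region names are supplied for illustration.

```python
from itertools import product

# Categories of each household characteristic used to form noninterview cells.
DIMENSIONS = {
    "oversampling_stratum": ["poverty", "nonpoverty"],
    "census_region": ["Northeast", "Midwest", "South", "West"],
    "race_of_reference_person": ["black", "nonblack"],
    "tenure": ["owner", "renter"],
    "residence_status": ["MSA urban", "MSA nonurban",
                         "NonMSA Census place", "NonMSA not Census place"],
    "household_size": ["1", "2", "3", "4+"],
}

cells_1996 = list(product(*DIMENSIONS.values()))

# Panels prior to 1996 did not use the within-PSU oversampling strata.
pre_1996 = {k: v for k, v in DIMENSIONS.items() if k != "oversampling_stratum"}
cells_pre_1996 = list(product(*pre_1996.values()))

print(len(cells_1996), len(cells_pre_1996))   # 2*4*2*2*4*4 and 4*2*2*4*4
```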
Any cells with fewer than 30 interviewed households or with noninterview adjustment factors
exceeding 2.0 are collapsed with a neighboring cell. To define cells as neighboring, the Census
Bureau uses a sort order and scale values based on estimates of the 1979 poverty rate within the
cell. The total number of noninterview cells is less than or equal to 512 for the 1996 Panel (256
or fewer for the earlier panels). In pre-1996 Panels, no cells were collapsed across the four cells
defined by the cross-classification of race of reference person and tenure. For the 1996 Panel, no
cells are collapsed over the cross-cells defined by race of reference person, tenure, within-PSU
oversampling strata, and Census region.

Within each final noninterview cell c, the formula for the noninterview adjustment factor (NAFc)
is

\[
\mathrm{NAF}_c = \frac{\sum_{h \,\in\, \text{sampled households in cell } c} \mathrm{BW}_h \cdot \mathrm{DCF}_h}
                      {\sum_{h \,\in\, \text{interviewed households in cell } c} \mathrm{BW}_h \cdot \mathrm{DCF}_h}
\tag{C-1}
\]

This factor is applied to the weight of each interviewed household in the cell; with these
noninterview-adjusted weights, the interviewed households in each cell can be seen to
“represent” themselves and also the Type A noninterviewed households in the cell.2
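
Equation (C-1) can be sketched in Python as shown below. The record layout (dictionaries with the keys `cell`, `bw`, `dcf`, and `interviewed`) and the function name are assumptions made for this illustration. The sketch also assumes that cell collapsing has already been carried out, so every cell contains at least one interviewed household.

```python
from collections import defaultdict


def noninterview_adjustment_factors(households):
    """Return {cell: NAF} following equation (C-1)."""
    sampled = defaultdict(float)       # sum of BW*DCF over all sampled households
    interviewed = defaultdict(float)   # same sum over interviewed households only
    for h in households:
        w = h["bw"] * h["dcf"]
        sampled[h["cell"]] += w
        if h["interviewed"]:
            interviewed[h["cell"]] += w
    return {c: sampled[c] / interviewed[c] for c in sampled}


households = [
    {"cell": "A", "bw": 1000, "dcf": 1, "interviewed": True},
    {"cell": "A", "bw": 1000, "dcf": 1, "interviewed": True},
    {"cell": "A", "bw": 1000, "dcf": 1, "interviewed": False},  # Type A noninterview
]
naf = noninterview_adjustment_factors(households)
print(naf["A"])   # 3000 / 2000
```

In this example each interviewed household's weight is multiplied by 1.5, so the two interviewed households together carry the weight of all three sampled households.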


Wave 1 Second-Stage Calibration Adjustment (SSCA)

For the second-stage calibration adjustments, the Census Bureau uses tallies of Current
Population Survey (CPS) weights for independent population controls. The CPS weights are
calibrated to match population controls provided by the population division of the Census
Bureau and then a “March type” adjustment is done to equalize the weights of husbands and
wives. Because the population division does not produce family-type controls, SIPP family-type
controls are in fact CPS sample estimates. SIPP controls for age, sex, and race, on the other hand,
should not differ appreciably from the original population division controls.

The primary step in the calibration (or ratio estimation) process is to attach second-stage
calibration adjustment factors to the pre-second-stage weights (BW*DCF*NAF) within
particular cells (e.g., male Hispanic 14-year-olds) so that the resulting adjusted weights
(BW*DCF*NAF*SSCA) aggregate to independent CPS-derived population estimates within the
cell. The summation of the pre-second-stage weights within any cell is an unbiased estimate
(assuming the nonresponse adjustment successfully adjusts for all effects of nonresponse) of the
population total for that cell (e.g., the summation of BW*DCF*NAF over all male Hispanic
14-year-olds in the panel is an unbiased estimate of the total number of male Hispanic
14-year-olds in the U.S. population).

For SIPP, the monthly CPS estimates of the population totals in these cells are generally superior
to the aggregations of nonresponse-adjusted SIPP weights (superior in the sense of having lower
sampling and/or nonsampling error). The adjusted weights (BW*DCF*NAF*SSCA) therefore give
estimates for these cells that are equal to the independent estimates. This adjustment
generally improves the precision of estimates for these cells and for any related survey
characteristics that are prevalent in these cells.


2
 In pre-1996 Panels, group quarters housing units were not included in the nonresponse computations, and received
nonresponse adjustments equal to 1. Group quarters housing units are treated as other households in the 1996 Panel.




The population cells for which adjustments are made to independent estimates are given in
Figures C-1, C-2, and C-3 (see pages C-6–C-11). As the figures show, the cells are defined by
age, race, sex, Spanish origin, family relationship, and household type. As noted earlier, the
independently derived estimates for these cells are based on CPS March supplement-type
estimates, except the estimates for family type. The CPS estimates are not the usual CPS
monthly estimates (see U.S. Census Bureau [1998] for more details). They are specially computed
for this purpose by summing the CPS weights within a given cell for all sample units in the
relevant CPS sample; there are also some extra steps, such as the equalization of husbands’ and
wives’ CPS weights, that are not generally part of the CPS estimation process.


Outline of the Second-Stage Calibration Algorithm

The second-stage calibration algorithm uses as its inputs the pre-second-stage weights
BW*DCF*NAF computed for each sampled person represented on a completed questionnaire in
a SIPP panel.3 These weights are run through a series of adjustments, which result in a final
weight (FNLWGT).4 This final weight can be written as FNLWGT = SSCA*BW*DCF*NAF,
with SSCA (the second-stage calibration adjustment) equal to the ratio of the final weight,
after the calibration process is completed, to the pre-second-stage weight.

This algorithm can be segmented into five major steps5:

1. Calibration of Hispanic children weights;
2. Calibration of non-Hispanic children weights;
3. Initial calibration steps for all adults;
4. Calibration of Hispanic adults; and
5. Calibration of non-Hispanic adults.

Each of these steps consists of numerous substeps. The next two sections describe certain steps
that are common to all of the steps in the algorithm (the ratio adjustment step, the raking step, the
cell-collapsing step, and the computation of control totals), the third section discusses details of
particular calibration steps, and the last section describes steps that were carried out only for
pre-1996 Panels.

3
  Children do not answer any SIPP questionnaires, but any children who are indicated as dependents by a sampled
household receive weights in this process.
4
  In pre-1996 Panels, households with all adults categorized as military personnel were interviewed and assigned
weights (except for households in barracks, which are ineligible for SIPP). These households were not included in
the second-stage calibration process (as they are not eligible for CPS and are not included in the CPS-derived
control totals), and they received final weights equal to their pre-second-stage weights. For the 1996 Panel, these
households are assigned as ineligible households and are not included in the weighting at all.
5
  Separate runs of the calibration algorithm are made for each reference month and each rotation group (a total of 16
calibration runs for each panel wave).


Ratio Adjustments, Raking, and Cell Collapsing

The most important steps in the algorithm are the ratio adjustment and raking steps. Each ratio
adjustment step takes all of the person weights (as they are at that point in the algorithm) within
particular second-stage cells and multiplies them by a common ratio adjustment factor. The
common factor is chosen for the second-stage cell so that the summation of the adjusted person
weights within the cell equals the control total for that second-stage cell. The common ratio
adjustment factor for each cell is equal to the control total divided by the summation of the
current person weights for all sample persons in the cell.
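
A ratio adjustment step of this kind can be sketched as follows. The data layout (a list of cell and weight pairs) and the function name are assumptions made for illustration, and the control totals in the example are invented.

```python
from collections import defaultdict


def ratio_adjust(persons, control_totals):
    """Scale person weights so that each cell's weights sum to its control total.

    `persons` is a list of (cell, weight) pairs; `control_totals` maps each
    cell to its control total. Returns the adjusted pairs and the factors.
    """
    cell_sums = defaultdict(float)
    for cell, weight in persons:
        cell_sums[cell] += weight
    # Common factor per cell: control total / sum of current weights.
    factors = {c: control_totals[c] / s for c, s in cell_sums.items()}
    adjusted = [(cell, weight * factors[cell]) for cell, weight in persons]
    return adjusted, factors


persons = [("male", 900.0), ("male", 1100.0), ("female", 1000.0)]
adjusted, factors = ratio_adjust(persons, {"male": 3000.0, "female": 1500.0})
print(factors)
```

Because every person in a cell receives the same factor, the relative sizes of the weights within the cell are unchanged by the adjustment.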

The raking step is similar to the ratio adjustment step except that there are two sets of second-
stage cells, with separate control totals (one set of second-stage cells is called the “row
dimension,” and the other set is called the “column dimension”). At the end of the raking process
(also called iterative proportional fitting), each person weight (as it is at that point in the
algorithm) has been adjusted so that all person weights aggregate to the appropriate control totals
for both the row cells and the column cells. The adjusted person weights have the property of
aggregating within the second-stage cells to each control total while remaining “as close as
possible” (in terms of a particular algebraic distance function) to the person weight values at the
beginning of the raking step. Thus, the new person weights are consistent with both sets of
independent control totals and have been altered as little as possible from the person weights
before the step.
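
The raking step can be sketched as an alternation of row and column ratio adjustments that is repeated until both sets of control totals are met. The code below is a generic illustration of iterative proportional fitting, not the Census Bureau's implementation; the stopping rule, the tolerance, and the data layout are assumptions. The sketch also assumes that the row and column control totals sum to the same grand total and that every cell contains at least one person.

```python
def rake(persons, row_totals, col_totals, max_iter=100, tol=1e-8):
    """Iterative proportional fitting over a row and a column dimension.

    `persons` is a list of (row_cell, col_cell, weight) triples.
    Returns the adjusted weights in the same order as `persons`.
    """
    weights = [w for _, _, w in persons]
    for _ in range(max_iter):
        # One ratio adjustment to the row controls, then one to the column controls.
        for axis, totals in ((0, row_totals), (1, col_totals)):
            sums = {}
            for p, w in zip(persons, weights):
                sums[p[axis]] = sums.get(p[axis], 0.0) + w
            weights = [w * totals[p[axis]] / sums[p[axis]]
                       for p, w in zip(persons, weights)]
        # The column totals now match exactly; stop once the rows match too.
        row_sums = {}
        for p, w in zip(persons, weights):
            row_sums[p[0]] = row_sums.get(p[0], 0.0) + w
        if all(abs(row_sums.get(r, 0.0) - t) <= tol * t
               for r, t in row_totals.items()):
            break
    return weights


persons = [("r1", "c1", 1.0), ("r1", "c2", 1.0),
           ("r2", "c1", 1.0), ("r2", "c2", 1.0)]
raked = rake(persons, {"r1": 30.0, "r2": 70.0}, {"c1": 40.0, "c2": 60.0})
print(raked)
```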

Most of the ratio adjustment and raking steps are preceded by a cell-collapsing step. This step is
designed to prevent extreme alterations in the person weights (which will increase variability of
the estimators) in any of the ratio adjustment and raking steps. The sample size of each
second-stage cell is checked: if it is less than 35, then the cell is collapsed with a neighboring
cell. Each second-stage cell is also checked by computing its ratio adjustment. If
that adjustment is less than 0.67 or greater than 2.0, then the cell is collapsed with a neighboring
cell.

Ratio adjustments are computed for each set of second-stage cells before the raking process is
performed. Ratio adjustments are computed for the row cells and the column cells as if only a
ratio adjustment were being done for the row cells alone or the column cells alone, rather than a
full raking step. If the computed ratio adjustments for any of the row cells are less than 0.67 or
greater than 2.0, or the sample size for any row cell is less than 35, then the row cell is collapsed
with a neighboring row cell. The same process is carried out for the column cells. All collapsing
of this kind is completed before the raking step is executed.
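
The two checks that trigger collapsing can be written as a single test, sketched below. The thresholds are those given in the text; the function and constant names are hypothetical.

```python
MIN_SAMPLE_SIZE = 35
MIN_FACTOR, MAX_FACTOR = 0.67, 2.0


def needs_collapsing(sample_size, weight_sum, control_total):
    """True if a second-stage cell fails the sample size or ratio adjustment check."""
    if sample_size < MIN_SAMPLE_SIZE:
        return True
    factor = control_total / weight_sum   # ratio adjustment for the cell alone
    return factor < MIN_FACTOR or factor > MAX_FACTOR


print(needs_collapsing(20, 1000.0, 1000.0))   # too few sample persons
print(needs_collapsing(50, 1000.0, 1500.0))   # factor 1.5 is acceptable
print(needs_collapsing(50, 1000.0, 2500.0))   # factor 2.5 exceeds 2.0
```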

When a second-stage cell is designated as requiring collapsing during the cell-collapsing step,
the neighboring cell is chosen through a predetermined mechanism. Hispanic second-stage cells
(see Figure C-1) are collapsed by sex (e.g., Hispanic males 15–24 are collapsed with Hispanic
females 15–24). The same is true for the household status second-stage cells for non-Hispanic
children (the column dimension for non-Hispanic children; see Figure C-2). For the household
status second-stage cells for adults (the column dimension for adults; see Figure C-3, pp. C-8
through C-11), the following pairs are collapsed when collapsing is necessary (the numbers in
parentheses are the column numbers in the Figure C-3 tables):6

•   Spouse in primary family (1); spouse in subfamily (3).
•   Householder, no spouse present, in household with family (2); householder in household
    without a family (5).
•   Not a spouse in household with family (4); not a householder in household without family (6).

For the age status second-stage cells for adults (the row dimension for adults; see Figure C-3),
neighboring cells are found on the basis of the scale value (which is given for the 1996 Panel in
Figure C-3). The cell with the scale value closest to that of the cell that requires collapsing
becomes the neighboring cell used in collapsing.
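
The scale-value rule for choosing a neighboring row cell can be sketched as follows, using the first few scale values listed for black males in Figure C-3. The function name is hypothetical, and the handling of ties (the first candidate found is returned) is an assumption, since the text does not say how ties are resolved.

```python
def nearest_neighbor(cell, scale_values):
    """Return the other cell whose scale value is closest to that of `cell`."""
    target = scale_values[cell]
    candidates = [c for c in scale_values if c != cell]
    return min(candidates, key=lambda c: abs(scale_values[c] - target))


# Age cell -> scale value, from the black males panel of Figure C-3.
SCALE = {"15": 15, "16-17": 16, "18-19": 18, "20-21": 27, "22-24": 29, "25-29": 47}

print(nearest_neighbor("18-19", SCALE))   # 16-17 (distance 2) rather than 20-21 (distance 9)
print(nearest_neighbor("20-21", SCALE))   # 22-24 (distance 2) rather than 18-19 (distance 9)
```

The uneven spacing of the scale values therefore steers the collapsing: the 18–19 cell joins the younger cells, while the 20–21 cell joins the 22–24 cell.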

                              Figure C-1. Second-Stage Cells for Hispanics

                  Second-stage cells for Hispanic children

                  Male           Female


                  Second-stage cells for Hispanic adults7

                  Male                                        Female
                  15–24          25–44          45+           15–24          25–44          45+


                  Second-stage cells for unmarried Hispanic adults

                  Male           Female


6
  Collapsing is never done across black and nonblack status, or across sex, but only within the four primary groups:
black males and females, and nonblack males and females (see Figure C-3).
7
  Hispanic adults in the military are not defined as Hispanics in the computation of control totals or in the calculation
of second-stage adjustments.




                   Figure C-2. Second-Stage Cells for Non-Hispanic Children
Second-Stage Cells for Black Children (14 years of age and younger)

MALES                                                  FEMALES
              Children     Children                                  Children     Children
              in Family    Not in Family                             in Family    Not in Family
Age (years)   Households   Households      SCALE       Age (years)   Households   Households      SCALE
Under 2                                    15          Under 2                                    15
2 to 3                                     17          2 to 3                                     17
4 to 5                                     25          4 to 5                                     25
6 to 7                                     27          6 to 7                                     27
8 to 9                                     45          8 to 9                                     45
10 to 11                                   47          10 to 11                                   47
12 to 13                                   55          12 to 13                                   55
14                                         57          14                                         57


Second-Stage Cells for Nonblack Children (14 years of age and under)

MALES                                                  FEMALES
              Children     Children                                  Children     Children
              in Family    Not in Family                             in Family    Not in Family
Age (years)   Households   Households      SCALE       Age (years)   Households   Households      SCALE
Under 1                                    15          Under 1                                    15
1                                          17          1                                          17
2                                          25          2                                          25
3                                          27          3                                          27
4                                          45          4                                          45
5                                          47          5                                          47
6                                          55          6                                          55
7                                          57          7                                          57
8                                          75          8                                          75
9                                          77          9                                          77
10 to 11                                   85          10 to 11                                   85
12 to 13                                   105         12 to 13                                   105
14                                         107         14                                         107




                     Figure C-3. Second-Stage Cells for Non-Hispanic Adults
Second-Stage Cells for Black Males (15+ years of age)

                                                                      Persons Not in Households
          Persons in Households That Contain a Primary Family         Containing a Primary Family or
          or Subfamily                                                Subfamily
          Husband of Male House-      Other Household Members                    Not a Householder
Age       Primary     holder, No      Husband of Not a                House-     or Person in Group SCALE
(years)   Family      Spouse Present Subfamily     Husband            holder     Quarters               VALUE
15                                                                                                      15
16–17                                                                                                   16
18–19                                                                                                   18
20–21                                                                                                   27
22–24                                                                                                   29
25–29                                                                                                   47
30–34                                                                                                   49
35–39                                                                                                   57
40–44                                                                                                   59
45–49                                                                                                   63
50–54                                                                                                   65
55–59                                                                                                   83
60–64                                                                                                   85
65–69                                                                                                   93
70+                                                                                                     95
                                                                                              (figure continues)

The cell-collapsing procedure in some cases requires more than one iteration if cells after
collapsing to the nearest neighbor are still too small or show extreme ratio adjustments (this
generally occurs only in row-dimension collapsing for adults). New scale values are computed
for the collapsed cells and are used to designate neighboring cells for any further collapsing that
is necessary.


Computation of Control Totals

The control totals are equal to the CPS March-type estimates within each second-stage cell for
some of the earlier ratio adjustment and raking steps in the algorithm.8 For the remaining ratio
adjustment and raking steps, the control totals are derived by taking the CPS March-type
estimate within the second-stage cell and subtracting from this the adjusted weights of any
subgroups whose weights have been completed. For example, control totals are derived for
non-Hispanic children by taking the CPS March-type estimates for all children in each row cell and
column cell (see Figure C-2) and subtracting the adjusted weights of all SIPP
panel-rotation-group Hispanic children within that cell.

8
 For the 1984 and 1985 Panels, the control totals excluded people illegally residing in the United States. For the
1986 Panel and all panels following, the people are included in the control totals.


             Figure C-3. Second-Stage Cells for Non-Hispanic Adults (continued)
Second-Stage Cells for Black Females (15+ years of age)

                                                                Persons Not in Households
          Persons in Households That Contain a Primary Family   Containing a Primary Family or
          or Subfamily                                          Subfamily
          Wife of     Female House- Other Household Members                Not a Householder
Age       Primary     holder, No      Wife of                   House-     or Person in Group SCALE
(years)   Family      Spouse Present Subfamily     Not a Wife   holder     Quarters               VALUE
15                                                                                               15
16-17                                                                                            16
18-19                                                                                            18
20-21                                                                                            27
22-24                                                                                            29
25-29                                                                                            47
30-34                                                                                            49
35-39                                                                                            57
40-44                                                                                            59
45-49                                                                                            63
50-54                                                                                            65
55-59                                                                                            83
60-64                                                                                            85
65-69                                                                                            93
70-74                                                                                            94
75+                                                                                              96
                                                                                        (figure continues)

Details of the Calibration Steps

The first step (for Hispanic children) is a direct ratio adjustment to CPS control totals (using only
two cells defined by sex). The second step (for non-Hispanic children) is a raking adjustment to
derived controls; for row cells and column cells, the second-stage cells given in Figure C-2 are
used. The derived control totals for each second-stage cell are equal to CPS control totals for all
children in the cell minus the adjusted weights of all sampled Hispanic children in the cell.
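
As an illustration only, the two quantities used in the children's steps can be sketched in Python. The function names and the numbers are hypothetical and are not taken from the Census Bureau's production system. The sketch shows a direct ratio adjustment of one cell to its control total, followed by the subtraction that yields a derived control total.

```python
def ratio_adjust(weights, control_total):
    """Scale the weights in one cell so that they sum to control_total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("cell has no positive weight to adjust")
    factor = control_total / total
    return [w * factor for w in weights]


def derived_control(cps_total_all_children, adjusted_hispanic_weights):
    """CPS control for all children in a cell minus the adjusted weights
    of the sampled Hispanic children in that cell."""
    return cps_total_all_children - sum(adjusted_hispanic_weights)


# Hypothetical cell: two sampled Hispanic boys and made-up control totals.
hispanic_boys = ratio_adjust([900.0, 1100.0], control_total=2400.0)
non_hispanic_control = derived_control(10000.0, hispanic_boys)
```

In this made-up example the Hispanic weights are scaled to sum to 2,400, so the derived control for non-Hispanic children in the cell is 10,000 minus 2,400.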




             Figure C-3. Second-Stage Cells for Non-Hispanic Adults (continued)
Second-Stage Cells for Nonblack Males (15+ years of age)

Column headings:

Persons in Households That Contain a Primary Family or Subfamily
     1. Husband of Primary Family
     2. Male Householder, No Spouse Present
     3. Other Household Members: Husband of Subfamily
     4. Other Household Members: Not a Husband
Persons Not in Households Containing a Primary Family or Subfamily
     5. Householder
     6. Not a Householder or Person in Group Quarters

Age rows and scale values:

Age (years)     Scale Value
15                   15
16–17                16
18–19                18
20–21                27
22–24                29
25–29                47
30–34                49
35–39                57
40–44                59
45–49                63
50–54                65
55–59                83
60–64                85
65–69                93
70–74                95
75–79               103
80–84               104
85+                 106
                                                                                         (figure continues)

Following the steps for children (which complete all second-stage adjustments for the children’s
weights) are the initial calibration steps for adults. Those steps are as follows:

1. A raking adjustment to CPS control totals that uses the Figure C-3 second-stage cells (the
   input weights are the pre-second-stage weights of all sampled adults);
2. A direct ratio adjustment to CPS control totals for sampled Hispanic adults; the input weights
   are the adjusted weights from step 1, and the second-stage cells are the cells given in Figure
   C-3 (for adults);
3. An equalization of all husbands’ weights to their wives’ weights (so that spouses in one
   family have equal weights);
4. A second raking adjustment identical to step 1 except that the input weights are the adjusted
   weights after steps 1 through 3 are completed;
5. A second Hispanic adult ratio adjustment identical to step 2 except that the input weights are
   the Hispanic adult adjusted weights from step 4.
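
The raking adjustments in steps 1 and 4 can be illustrated with a minimal Python sketch. It assumes a two-way layout (each person belongs to one row cell and one column cell), control totals that are mutually consistent, and every cell nonempty. The function names, tolerance, and iteration limit are hypothetical choices for illustration and do not describe the Census Bureau's actual software.

```python
def _margin(weights, labels):
    """Sum the weights within each label (one margin of the table)."""
    totals = {}
    for w, label in zip(weights, labels):
        totals[label] = totals.get(label, 0.0) + w
    return totals


def rake(weights, rows, cols, row_controls, col_controls,
         max_iter=100, tol=1e-10):
    """Alternately ratio-adjust weights to row and column control totals."""
    w = list(weights)
    for _ in range(max_iter):
        row_sums = _margin(w, rows)
        w = [wi * row_controls[r] / row_sums[r] for wi, r in zip(w, rows)]
        col_sums = _margin(w, cols)
        w = [wi * col_controls[c] / col_sums[c] for wi, c in zip(w, cols)]
        # Column margins now match exactly; stop once row margins also match.
        row_sums = _margin(w, rows)
        if all(abs(row_sums[r] - row_controls[r]) <= tol * row_controls[r]
               for r in row_sums):
            break
    return w


# Hypothetical input weights, cell labels, and control totals.
raked = rake([1.0, 2.0, 3.0, 4.0],
             rows=["young", "young", "old", "old"],
             cols=["a", "b", "a", "b"],
             row_controls={"young": 60.0, "old": 40.0},
             col_controls={"a": 50.0, "b": 50.0})
```

Each pass is a pair of direct ratio adjustments, one per dimension. The passes repeat because adjusting to the column controls disturbs the row margins slightly.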




             Figure C-3. Second-Stage Cells for Non-Hispanic Adults (continued)
Second-Stage Cells for Nonblack Females (15+ years of age)

Column headings:

Persons in Households That Contain a Primary Family or Subfamily
     1. Wife of Primary Family
     2. Female Householder, No Spouse Present
     3. Other Household Members: Wife of Subfamily
     4. Other Household Members: Not a Wife
Persons Not in Households Containing a Primary Family or Subfamily
     5. Householder
     6. Not a Householder or Person in Group Quarters

Age rows and scale values:

Age (years)     Scale Value
15                   15
16–17                16
18–19                18
20–21                27
22–24                29
25–29                47
30–34                49
35–39                57
40–44                59
45–49                63
50–54                65
55–59                83
60–64                85
65–69                93
70–74                95
75–79               103
80–84               104
85+                 106


The next two steps complete the weights for Hispanic adults. The first step is an equalization of
husbands’ weights to their wives’ weights in all married couples that include at least one
Hispanic spouse. The exception is when the wife is not Hispanic, in which case the wife’s weight
is set equal to the husband’s weight. At this point, all married couples that include at least one
Hispanic spouse have their final weights. The second step is a ratio adjustment for sampled
unmarried Hispanics to derived control totals, using only two second-stage cells defined by sex.
The derived control totals are the CPS control totals for all Hispanic adults minus the adjusted
weights of the sampled married Hispanics.
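
The equalization rule for couples that include at least one Hispanic spouse can be written as a short Python sketch. The function and argument names are hypothetical; the logic only restates the rule in the paragraph above.

```python
def equalize_hispanic_couple(husband_weight, wife_weight, wife_is_hispanic):
    """Return (husband, wife) weights after equalization for a married
    couple that includes at least one Hispanic spouse."""
    if wife_is_hispanic:
        common = wife_weight      # usual case: husband takes the wife's weight
    else:
        common = husband_weight   # exception: wife takes the husband's weight
    return common, common


# Hypothetical couple in which only the husband is Hispanic.
husband, wife = equalize_hispanic_couple(1500.0, 1700.0, wife_is_hispanic=False)
```

In either branch the shared weight is that of a Hispanic spouse, so the couple's final weight comes from the Hispanic adult calibration.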




The last steps complete the calibration process for sampled non-Hispanic adult weights. Those
steps are as follows:

6. An equalization of wives’ weights to their husbands’ weights.
7. A raking adjustment to derived control totals that uses the Figure C-3 second-stage cells (the
   input weights are the current adjusted weights of all non-Hispanic adults). The control totals
   are the CPS control totals for all adults for the second-stage cells minus the adjusted weights
   of Hispanic adults within those cells.
8. An equalization of husbands’ weights to their wives’ weights. This step finalizes the weights
   for all non-Hispanic females and all non-Hispanic husbands.
9. A raking adjustment to derived control totals; the Figure C-3 second-stage cells for adult
   males (with the two husband columns deleted) are used, and the current adjusted weights of
   all non-Hispanic nonhusband males are used. The derived control totals are the CPS control
   totals minus the adjusted weights of all groups who have had their weights completed. This
   step produces the final weights for all non-Hispanic nonhusband male adults (the last group
   without completed weights).

Weighting Factors Used in Panels Prior to 1996

In all panels prior to the 1996 Panel, a first-stage ratio estimate factor (FSF) was applied to the
base weight of each person in non-self-representing PSUs (i.e., PSUs not sampled with
certainty). This first-stage factor was a ratio adjustment step that used as cells Census region,
residence status, and race; it was designed to reduce the variance resulting from sampling of
PSUs. Although this factor is no longer computed in the 1996 Panel, the cells are now used in the
computation of noninterview adjustment factors.

Also, beginning with the 1985 Panel, a new construction noninterview adjustment factor (NCF)
was applied to the base weight of new households in new construction housing-unit clusters.
This factor was used to account for newly constructed housing units that were selected for the
sample but were unavailable for interviewing. It was set equal to 1 in the 1986–1993 Panels (it
was not used in the 1984 Panel), and eventually it was discontinued.

Thus, in the 1984 Panel, FNLWGT was equal to BW*DCF*HNF*FSF*SSCA (excludes NCF).
FNLWGT was equal to BW*DCF*NCF*HNF*FSF*SSCA in the 1985–1993 Panels.
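
A minimal Python sketch of the two pre-1996 formulas follows. The function name and the numeric inputs are hypothetical; the factor abbreviations are those used in the text.

```python
def pre1996_final_weight(panel, bw, dcf, hnf, fsf, ssca, ncf=1.0):
    """FNLWGT for a pre-1996 panel, following the two formulas in the text."""
    if not 1984 <= panel <= 1993:
        raise ValueError("formulas apply to the 1984-1993 Panels only")
    weight = bw * dcf * hnf * fsf * ssca
    if panel >= 1985:
        weight *= ncf  # NCF enters only in the 1985-1993 Panels
    return weight


# Hypothetical factor values.
w_1984 = pre1996_final_weight(1984, 1000.0, 1.0, 1.1, 1.0, 1.0, ncf=1.2)
w_1985 = pre1996_final_weight(1985, 1000.0, 1.0, 1.1, 1.0, 1.0, ncf=1.2)
```

Because NCF was set equal to 1 in the 1986–1993 Panels, leaving `ncf` at its default gives the same result for those panels as the five-factor product.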


Wave 2+ Weights
The later wave cross-sectional weight is computed separately for each reference month of each
wave. This Wave 2+ FNLWGT has the following factors for people in households whose
residents have not changed from Wave 1: an initial weight (IW), a later wave noninterview
adjustment (LWNIA), and a second-stage calibration adjustment (SSCA). The initial weight is
generally equal to the pre-second-stage Wave 1 household weight (with some
exceptions). For households that people have moved into or out of after Wave
1, the initial weight is adjusted to give the mover’s weight (MW). For people in these households,
the cross-sectional weight has as factors the mover’s weight, the later wave noninterview
adjustment, and the second-stage calibration adjustment. In summary, people in households that
do not need mover’s adjustments receive the cross-sectional weight FNLWGT =
IW*LWNIA*SSCA, and people in households that do require a mover’s adjustment receive the
Wave 2+ final weight FNLWGT = MW*LWNIA*SSCA.
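
The choice between the two Wave 2+ formulas can be sketched in Python as follows. The function name and the numbers are hypothetical; passing `mw=None` stands for a household that needs no mover's adjustment.

```python
def wave2plus_final_weight(iw, lwnia, ssca, mw=None):
    """Wave 2+ FNLWGT: MW*LWNIA*SSCA when a mover's weight was computed
    for the household, otherwise IW*LWNIA*SSCA."""
    base = iw if mw is None else mw
    return base * lwnia * ssca


# Hypothetical factor values for a stable household and a changed household.
stable = wave2plus_final_weight(iw=2000.0, lwnia=1.25, ssca=1.0)
changed = wave2plus_final_weight(iw=2000.0, lwnia=1.25, ssca=1.0, mw=1000.0)
```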


Wave 2+ Initial Weights

The initial weight is essentially the pre-second-stage Wave 1 weight, that is, IW =
BW*DCF*NAF.9 The second-stage calibration adjustment for the Wave 1 reference months is
not included as a factor: the second-stage calibration adjustment is redone using control totals
current for the later wave reference months. The initial weight allows the original sample person
to represent unsampled persons in the population and persons in households who were not
successfully interviewed in Wave 1. The initial weight does not generally change from wave to
wave after Wave 1, unless special circumstances arise that cause an alteration in the panel
sample (such as a cut in the sample for budgetary or other reasons).


Movers’ Weights

People in any households that an original sample person enters during later waves, or any people
who become part of a Wave 1 sample household during later waves, also become part of the
sample for those waves. If the original sample person moves away from the household
containing those people, the additional people immediately drop from the sample (their in-
sample status in any given wave is entirely dependent on the presence of original sample persons
in the household). Any of the additional people who were part of the SIPP population in Wave 1
(and therefore could have been sampled) and who become members of households with original
sample persons are called associated sample persons. If any of these additional persons were not
part of the SIPP population in Wave 1 (because they were out of the country, institutionalized,
etc.), then they are called additional sample persons.

9
  The 1985 Panel had an initial weight that was computed differently. The initial weight for this panel included a
new-construction noninterview adjustment factor and a first-stage ratio estimate factor. The Wave 1 noninterview
adjustment factor was also recomputed in the 1985 Panel to account for sampled households mistakenly left off the
sample roster during Wave 1, and sampled households that were noncooperative in Wave 1 but were converted
during Wave 2. There was also an added “sample cut” factor, adjusting for sampled households that were deselected
because of a reduction in the 1985 Panel sample. Pre-1996 Panels following 1985 had only one difference from the
1996 Panel initial weight described in the text: the presence of the first-stage ratio estimate factor.




Any household that consists of people who were in the SIPP universe and who lived in separate
households during the Wave 1 reference period (with at least one of those households sampled in
Wave 1) is called an enhanced household. In most cases, an enhanced household consists of
original sample persons from a Wave 1 sample household and associated sample persons from a
household (or households) not sampled in Wave 1. In a few rare cases, an enhanced household
will contain original sample persons from more than one Wave 1 sample household. Those
households are rare because the probability of selection of any given household in SIPP is quite
small, making the joint probability of a later wave merged household having two or more of its
Wave 1 predecessor households selected in Wave 1 quite small (but the situation does occur in
the SIPP panels).

Enhanced households require an adjustment of the Wave 1 base weight for each person in the
household. These people in effect had multiple chances of being in the selected enhanced
household: they could have been selected as original sample persons in the household they were
in during Wave 1 (which then became an enhanced household), or they could become an
associated sample person if their Wave 1 household was not selected but merged later with a
sampled Wave 1 household. Their true probability of being included in the enhanced household
is higher than their nominal Wave 1 probability of selection, and their assigned base weight
should be the reciprocal of this true sample inclusion probability.

This true inclusion probability is not computed directly, for it requires the computation of joint
probabilities of selection of multiple households, some of which were not in the original Wave 1
household sample. Instead, a “mover’s weight” is assigned to each original and associated
sample person in the enhanced household, which has as its expectation the inverse of the true
sample inclusion probability. In other words, the movers’ weights are unbiased weights, taking
into account the complex realized sample design for enhanced households.

In the case in which an enhanced household is formed from only one Wave 1 sample household
(with associated persons added to it), the mover’s weight for each person in the household
(original, associated, or additional) is computed as follows for reference month t, enhanced
household i:

\[ W_{ti} = \frac{W_{1i} \, S_{1ti}}{S_{ti} - S_{tai}}, \tag{C-2} \]

where $W_{1i}$ is the initial weight that is common to all original sample persons in the ith enhanced
household, $S_{1ti}$ is the number of original sample persons in the ith enhanced household in month
t, $S_{ti}$ is the size of the ith enhanced household in month t (all persons), and $S_{tai}$ is the number of
additional sample persons in the ith enhanced household in month t. The numerator of this
expression is the sum of the initial weights over all original sample persons in the household
during month t, and the denominator of this expression is the number of original and associated
sample persons in the ith enhanced household in month t. For a discussion of why these are
unbiased weights, see, for example, Kalton and Brick (1994).
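
A worked example may help. The Python sketch below evaluates equation C-2 for a made-up household; the function name and the numbers are hypothetical.

```python
def movers_weight(initial_weight, n_original, household_size, n_additional):
    """Equation C-2: initial weight times the number of original sample
    persons, divided by the number of original plus associated persons
    (household size minus additional sample persons)."""
    denominator = household_size - n_additional
    if denominator <= 0:
        raise ValueError("household has no original or associated persons")
    return initial_weight * n_original / denominator


# Hypothetical household in month t: 2 original sample persons with initial
# weight 2000, 1 associated sample person, and 1 additional sample person.
w = movers_weight(initial_weight=2000.0, n_original=2,
                  household_size=4, n_additional=1)
```

Here the numerator is 2000 × 2 = 4000 and the denominator is 4 − 1 = 3, so each person in the household receives a mover's weight of about 1,333.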




When two Wave 1 sample households merge, the mover’s weight for each sample person
(original, associated, or additional) in the household is computed as follows:

\[ W_{ti} = \frac{W_{1i} \, S_{1ti} + W'_{1i} \, S'_{1ti}}{S_{ti} - S_{tai}}. \tag{C-3} \]

The two terms in the numerator are for the first and second Wave 1 sample households. The
movers’ weights for more than two merged Wave 1 sample households are computed
analogously.
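
The extension to any number of merged Wave 1 sample households can be sketched in Python as a sum over the contributing households. The function name, the `sources` argument, and the numbers are hypothetical.

```python
def merged_movers_weight(sources, household_size, n_additional):
    """Mover's weight when one or more Wave 1 sample households merge.

    sources holds one (initial_weight, n_original_persons) pair for each
    contributing Wave 1 sample household.
    """
    denominator = household_size - n_additional
    if denominator <= 0:
        raise ValueError("household has no original or associated persons")
    numerator = sum(weight * n_persons for weight, n_persons in sources)
    return numerator / denominator


# Hypothetical merger: 2 persons with initial weight 2000 and 1 person with
# initial weight 3000 now form a 4-person household with no additional persons.
w = merged_movers_weight([(2000.0, 2), (3000.0, 1)],
                         household_size=4, n_additional=0)
```

With a single entry in `sources` the function reduces to the one-household formula; with two entries the numerator has the two terms described above.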


Wave 2+ Later Wave Noninterview Adjustments

The initial weights include an adjustment for noncooperation in Wave 1; that is, the sample
households with nonzero initial weights also represent households for which an interview was not
completed in Wave 1. There are, however, further losses of sample households in later waves for
several reasons:

•    The household refuses to cooperate in some or all of the later waves.
•    The people in the household have moved and cannot be found.
•    The household has moved, and has been found, but is too far away for a personal interview
     and cannot be reached by telephone.10

The weights of households for which later wave interviews are completed are adjusted to
“represent” sample households (that cooperated in Wave 1) whose interviews are not completed
for any of the above reasons. Those adjustments are computed by assigning each sample
household with a nonzero initial weight to one of 109 later wave noninterview cells.11 The
noninterview cells are based on the following household characteristics:

 1. Reference person is a non-Hispanic white person, or other (two categories).
 2. Reference person is a female householder without a spouse and with her own children, a
    householder 65 years of age or older, or other (three categories).
 3. Household income includes welfare payments (AFDC, WIC, Food Stamps, Medicaid, or
    other welfare), or not (two categories).
 4. Household size is 1, 2, 3, or 4 or more persons (four categories).
 5. Household has some bond-type financial assets, or not (two categories).
 6. Reference person’s education level is less than 8 years, 8 to 11 years, 12 to 15 years, or 16 or
    more years (four categories).
 7. Household owns housing unit, is renter, or is living in a public housing project or receiving a
    rent subsidy from the government (three categories).
 8. Census division (nine categories).
 9. Number of imputations in household Wave 1 questionnaire is none, 1, or more than 1 (three
    categories).
10. Household income as a percentage of the household poverty threshold (with both averaged
    over 4 reference months): less than or equal to 175 percent, 176 through 450 percent, and
    more than 450 percent (three categories).

10
   The SIPP sample is designed so that most of the field work takes place within the SIPP PSUs, to reduce traveling
costs. If a household moves too far away from the field areas, a telephone interview is attempted.
11
   In pre-1996 Panels, 53 noninterview cells were used, based on the first 7 of the 10 listed household characteristics.

These categories have been found in empirical research to be consistently heterogeneous in later
wave noninterview rates (i.e., the categories have divergent noninterview rates). The later wave
noninterview adjustment for each noninterview cell is equal to the sum of the initial or mover’s
weights of all Wave 1 sample households in the cell, divided by the sum of the initial or mover’s
weights of all households in the cell that have had the later wave interview completed.12 (The
mover’s weight is used whenever a mover’s weight is computed for the household.) These
adjustments are made separately for each reference month of each later wave of the panel.
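
The cell-level computation can be sketched in Python as follows. The sketch takes the adjustment to be an inflation factor of at least 1 (weight of all Wave 1 sample households in the cell over the weight of the households interviewed in the later wave), which is the orientation implied by the rule that flags adjustments greater than 2. The function name, the tuple layout, and the data are hypothetical.

```python
from collections import defaultdict


def later_wave_noninterview_adjustments(households):
    """Compute one adjustment factor per noninterview cell.

    households is an iterable of (cell, weight, interviewed) tuples, where
    weight is the initial weight, or the mover's weight if one was computed,
    and interviewed says whether the later wave interview was completed.
    """
    all_weight = defaultdict(float)
    interviewed_weight = defaultdict(float)
    for cell, weight, interviewed in households:
        all_weight[cell] += weight
        if interviewed:
            interviewed_weight[cell] += weight
    # Cells with no interviewed households are left out; in practice they
    # would have to be collapsed with another cell.
    return {cell: all_weight[cell] / interviewed_weight[cell]
            for cell in all_weight if interviewed_weight[cell] > 0}


# Hypothetical data for one reference month.
factors = later_wave_noninterview_adjustments([
    ("A", 100.0, True), ("A", 100.0, False), ("B", 50.0, True),
])
```

In this made-up example half of the weight in cell A was lost to noninterview, so the interviewed household's weight is doubled; cell B lost nothing and its factor is 1.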

Before the final noninterview adjustment is computed for each wave, each noninterview cell is
checked. Any noninterview cell with fewer than 30 interviewed households, or with a
noninterview adjustment greater than 2, is collapsed with a neighboring cell. Cells are defined as
neighboring on the basis of a set of scale values assigned to each noninterview cell. This
procedure prevents extreme noninterview adjustments, which would increase
sampling variability. The final noninterview adjustment (LWNIA) for the cell, or collapsed cell,
is assigned to each household within the cell.
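
The collapsing check can be sketched in Python under simplifying assumptions. The actual scale values and the rule for picking a neighbor are not given here, so the sketch assumes that cells are ordered by a single numeric scale value and that a failing cell is merged with the next cell in that order (or the previous one if it is last). The function name, the dictionary keys, and the data are hypothetical.

```python
def collapse_cells(cells, min_n=30, max_factor=2.0):
    """Merge failing noninterview cells with a neighbor in scale order.

    Each cell is a dict with keys 'scale', 'n' (interviewed households),
    'w_all' (weight of all Wave 1 sample households), and 'w_int' (weight
    of interviewed households). Returns the collapsed groups.
    """
    groups = [dict(c, members=[c["scale"]])
              for c in sorted(cells, key=lambda c: c["scale"])]

    def fails(g):
        return (g["n"] < min_n or g["w_int"] <= 0
                or g["w_all"] / g["w_int"] > max_factor)

    i = 0
    while len(groups) > 1 and i < len(groups):
        if fails(groups[i]):
            j = i + 1 if i + 1 < len(groups) else i - 1
            lo, hi = min(i, j), max(i, j)
            merged = {
                "scale": groups[lo]["scale"],
                "n": groups[lo]["n"] + groups[hi]["n"],
                "w_all": groups[lo]["w_all"] + groups[hi]["w_all"],
                "w_int": groups[lo]["w_int"] + groups[hi]["w_int"],
                "members": groups[lo]["members"] + groups[hi]["members"],
            }
            groups[lo:hi + 1] = [merged]
            i = lo  # re-check the merged group before moving on
        else:
            i += 1
    return groups


# Hypothetical cells: the middle one has too few interviewed households.
result = collapse_cells([
    {"scale": 1, "n": 40, "w_all": 100.0, "w_int": 80.0},
    {"scale": 2, "n": 10, "w_all": 50.0, "w_int": 40.0},
    {"scale": 3, "n": 35, "w_all": 90.0, "w_int": 60.0},
])
```

The adjustment for a collapsed group is then `w_all / w_int` of the group, assigned to every household in its member cells.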

Table C-1 presents the major groupings of noninterview cells (the noninterview cells within
these major groupings have similar scale values and would be collapsed together within these
groupings before any collapsing was done across groupings).


Wave 2+ Second-Stage Calibration Adjustment (SSCA)

A second-stage calibration adjustment is carried out for each reference month in each later wave,
for each rotation group of the panel separately. This adjustment uses the same algorithm as
described for Wave 1 weights, with new CPS or CPS-derived control totals computed for each
new reference month. The pre-second-stage weights in this case are IW*LWNIA, or
MW*LWNIA if a mover’s weight was computed for the household. The second-stage calibration
adjustments reduce sampling variability by calibrating the final weights to agree with
independent control totals. With the later wave cross-sectional weights, the second-stage
calibration adjustments also have the effect of reducing biases from population undercoverage
(arising from eligible people entering the U.S. population after the Wave 1 reference months).


12
  In pre-1996 Panels, group quarters households were not included in these calculations and received noninterview
adjustments equal to 1. In the 1996 Panel, these households are treated in the same way as family households in
noninterview calculations, but households with only military adults were included.


                          Table C-1. Major Groupings of Later Wave
                                      Noninterview Cells


                                                                Number of
                   Household Characteristics                    Nonresponse Cells
                   Hispanic or nonwhite
                         Minimal assets                                15
                         Assets include bonds                           9
                   White Non-Hispanic
                         Single female householder                      1
                         Householder 65 and older                      14
                         Other householder
                              No welfare income
                                    One person in household            20
                                    Two people in household            14
                                    Three people in household           7
                                    Four or more in household          19
                              Has welfare income                       10
                   Total                                              109


Calendar Year and Panel Weights
The algorithm for generating the calendar year and panel weights is very similar to that used for
computing Wave 2+ weights, with some differences. The most important differences are the
following:

•   A control date is associated with each calendar year and panel weight (rather than the weight
    being associated with a month, as with the Wave 1 and Wave 2+ weights).
•   For a sample person to have a nonzero weight, data must be present for the sequence of
    months defined for the weight (12 months for the calendar year weights and all months of the
    panel for the panel weights). Months for which the sample person is ineligible are excluded
    from this check.
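
The data-completeness check in the second item can be sketched in Python. The function name and the month-by-month representation are hypothetical.

```python
def has_complete_data(months):
    """Return True if data are present for every month in the weight's
    sequence for which the person is eligible.

    months is a list of (eligible, has_data) pairs, one per month.
    Ineligible months are skipped, so a person with no eligible months
    passes trivially; how such a case is treated is an assumption here.
    """
    return all(has_data for eligible, has_data in months if eligible)


# Hypothetical calendar year: 11 eligible months with data, 1 ineligible month.
year = [(True, True)] * 11 + [(False, False)]
qualifies = has_complete_data(year)
```

A person whose record has even one eligible month without data would fail the check and receive a weight of zero for that calendar year or panel.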




Calendar Year and Panel Initial Weights

The initial weight computed for each sample person for all calendar year and panel weights is
IW = BW*DCF*NAF, that is, the same quantity that is used as the initial weight for all Wave 2+
weights. This initial weight allows each original sample person who has interviews for the
months for which they are eligible in the calendar year (or panel) to represent unsampled people
in the population and people in households that were not successfully interviewed in Wave 1.


Calendar Year and Panel Noninterview Adjustments

The noninterview adjustments for each calendar year and panel weight are computed by first
assigning each sampled person with a nonzero initial weight to one of 149 noninterview cells.13
These noninterview cells are based on the following person-level characteristics:

 1. Person is a non-Hispanic white person, or other (two categories).
 2. Person was self-employed, or not (two categories).
 3. Family income as a percentage of the family poverty threshold (with both averaged over 4
    reference months): less than or equal to 175 percent, 176 through 450 percent, and more
    than 450 percent (three categories).14
 4. Person in household whose income includes welfare payments (SSI, AFDC, WIC, Food
    Stamps, Medicaid, or other welfare), person receiving unemployment compensation but not
    in household with welfare payments, or neither (three categories).
 5. Person in household with some bond-type financial assets, or not (two categories).
 6. Person’s education level is less than 12 years, 12 to 15 years inclusive, or 16 or more years
    (three categories).
 7. Person was in labor force at least 1 month of wave, or not (two categories).
 8. Census division of household (nine categories).
 9. Number of imputations in household Wave 1 questionnaire is none, 1, or more than 1 (three
    categories).
10. Within PSU, stratum code of household is poverty stratum or nonpoverty stratum (two
    categories).


13
 In pre-1996 Panels, 126 noninterview cells were used, based on the first 7 of the 10 listed person characteristics.
14
 In pre-1996 Panels, household income (averaged over 4 reference months) was used instead: less than $1,200 a
month, between $1,200 and $4,000 a month, and greater than or equal to $4,000 a month.




These categories have been found in empirical research to be consistently heterogeneous in later
wave noninterview rates. The noninterview adjustment for the noninterview cell (for the
particular calendar year [panel] weight) is equal to the sum of the initial weights of all sampled
persons whose households were interviewed in Wave 1,15 divided by the sum of the initial
weights of all sampled persons who have interviews for every month of the calendar year (panel)
in which they are eligible.16

As with other noninterview adjustments discussed in this appendix, each noninterview cell is
checked for small sample sizes and extreme noninterview adjustments. Any noninterview cell
with fewer than 30 sampled persons with complete interview strings, or with a calendar year
(panel) noninterview adjustment greater than 2, is collapsed with a neighboring cell for that
calendar year and panel weight. If necessary, this process can be iterative: a cell may be
collapsed into another cell, and then the combined cell may be collapsed further with other cells.
A set of scale values determines how cells are collapsed when collapsing is necessary. Table C-2
presents the major groupings of noninterview cells (i.e., the noninterview cells with similar scale
values). The noninterview cells within these groupings would be collapsed together among
themselves before any collapsing would be done outside of these groupings.

                       Table C-2. Major Groupings of Calendar Year (Panel)
                                       Noninterview Cells


                                                                        Number of
                      Person Characteristics                            Nonresponse Cells
                      Hispanic or nonwhite                                     50
                      White Non-Hispanic
                            Less than 12 years of education                      25
                            12 to 15 years of education
                                 In labor force                                 32
                                 Not in labor force                             18
                            16 or more years of education                       24
                      Total                                                    149


15
   People who entered the sample during or after the calendar year (panel) period (by entering a sampled household)
are excluded from these calculations (and receive calendar year [panel] weights of zero). Children who move
without their parents (into nonsampled households) during the period are also excluded from these computations and
receive calendar year (panel) weights of zero.
16
   In pre-1996 Panels, sample persons living in group quarters are not included in these noninterview adjustments,
and those people are given noninterview adjustments equal to 1 (when their calendar year and panel weights are
nonzero). In the 1996 Panel, sample persons living in group quarters are treated in the same way as other sample
persons.




Calendar Year and Panel Second-Stage Adjustments

The calendar year and panel weights that have been computed up to this point (called the pre-
second-stage weights) for each sampled person (with a complete set of interviews for their
eligible months) are equal to BW*DCF*NAF*LWNIA. The formula for the final calendar year
weights (FNLWGT) is BW*DCF*NAF*LWNIA*SSCA, where SSCA is the second-stage
calibration adjustment. The final panel weight follows the same formula: PNLWGT =
BW*DCF*NAF*LWNIA*SSCA, though LWNIA and SSCA are computed differently here. The
final weight is computed in both cases from the pre-second-stage weights
BW*DCF*NAF*LWNIA in accordance with the algorithm described below. As with the Wave
1 and Wave 2+ weights, the algorithm for second-stage adjustment for calendar year and panel
weights can be segmented into the following five major steps:

1. Calibration of Hispanic children weights;
2. Calibration of non-Hispanic children weights;
3. Initial calibration steps for all adults;
4. Calibration of Hispanic adults; and
5. Calibration of non-Hispanic adults.

However, the details within these five major steps differ for the calendar year (panel)
weights. The primary difference from the Wave 2+ second-stage calibration algorithm is that
no married-couple weighting equalization is done for the calendar year (panel) weights, and
married and unmarried persons are not split into separate calibration steps.
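
The weight formula given above can be sketched as a simple product. This is only an illustration: the function name, the argument names, and the numeric factor values are assumptions made for the example and are not taken from any SIPP file.

```python
def final_weight(bw, dcf, naf, lwnia, ssca):
    """Return BW*DCF*NAF*LWNIA*SSCA for one sampled person.

    bw    : base weight
    dcf   : duplication control factor
    naf   : noninterview adjustment factor
    lwnia : the LWNIA factor from the formula in the text
    ssca  : second-stage calibration adjustment
    """
    pre_second_stage = bw * dcf * naf * lwnia  # the pre-second-stage weight
    return pre_second_stage * ssca


# Hypothetical factor values for a single person (not real SIPP data).
example = final_weight(bw=2000.0, dcf=1.0, naf=1.25, lwnia=1.1, ssca=0.96)
```

The same product applies to the calendar year weight (FNLWGT) and the panel weight (PNLWGT); only the values of LWNIA and SSCA differ between the two.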

The independent estimates for the control month are the same CPS March supplement-type
estimates that were used for the Wave 2+ weights, except they are computed for different
second-stage cells when used for calendar year (panel) weights. The second-stage cells for
calendar year (panel) weights are given in Figures C-4, C-5, and C-6. The second-stage
calibration algorithm is run separately for each rotation group, with the control totals for each
rotation group equal to one-quarter of the CPS control totals.
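
As a rough sketch of the rotation-group rule described above, each rotation group's control totals can be derived by dividing the CPS control totals by four. The cell labels and totals below are made-up values for illustration only.

```python
def rotation_group_controls(cps_controls, n_rotation_groups=4):
    """Split each CPS control total evenly across the rotation groups."""
    return {cell: total / n_rotation_groups
            for cell, total in cps_controls.items()}


# Hypothetical CPS control totals for two second-stage cells.
cps_controls = {"male": 4_000_000.0, "female": 3_800_000.0}
per_group = rotation_group_controls(cps_controls)
```

Running the calibration separately in each of the four rotation groups against these quarter-size controls means the four groups together add back up to the full CPS totals.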




               Figure C-4. Calendar Year and Panel Weight Second-Stage Cells
                                       for Hispanics
                Second-Stage Cells for Hispanics (14 years and younger)


                Male          Female


                Second-Stage Cells for Hispanics (15+ years of age)17


                Male                                     Female
                15–24         25–44        45+           15–24          25–44      45+


               Figure C-5. Calendar Year and Panel Weight Second-Stage Cells
                                 for Non-Hispanic Children
                Cells for Children (14 years and younger)


                              Nonblack     Nonblack      Black          Black
                Age           Males        Females       Males          Females     SCALE
                Under 2                                                             15
                2 to 3                                                              17
                4 to 5                                                              25
                6 to 7                                                              27
                8 to 9                                                              45
                10 to 11                                                            47
                12 to 13                                                            55
                14                                                                  57


17 Hispanic adults in the military are not defined as Hispanics in the computation of control totals or in the
calculation of second-stage adjustments.




Figure C-6. Calendar Year and Panel Weight Second-Stage Cells for Non-Hispanic Adults
1996 Panel Second-Stage Cells for Nonblack Females (15+ years of age)

Columns, Householder:
   1. Female Householder No Spouse Present with Own Children
   2. Other Female Householder No Spouse Present
   3. Other Female Householder Living with Relative
   4. Female Householder Not Living with Relative

Columns, Not Householder:
   6. Spouse of Householder or Spouse of Related Subfamily
   7. Other Female Related to Householder
   9. Other Female Not Related to Householder

Age (years)     SCALE VALUE
15                   15
16–17                16
18–19                18
20–21                27
22–24                29
25–29                47
30–34                49
35–39                57
40–44                59
45–49                63
50–54                65
55–59                73
60–61                74
62–64                76
65–69                93
70–74                95
75–79               103
80–84               104
85+                 106
                                      (figure continues)


Figure C-6. Calendar Year and Panel Weight Second-Stage Cells for Non-Hispanic Adults (continued)
1996 Panel Second-Stage Cells for Black Females (15+ years of age)

Columns, Householder:
   2. Female Householder No Spouse Present
   3. Other Female Householder Living with Relative
   4. Female Householder Not Living with Relative

Columns, Not Householder:
   6. Spouse of Householder or Spouse of Related Subfamily
   7. Other Female Related to Householder
   9. Other Female Not Related to Householder

Age (years)     SCALE VALUE
15                   15
16–17                16
18–19                18
20–21                27
22–24                29
25–29                47
30–34                49
35–39                57
40–44                59
45–49                63
50–54                65
55–59                73
60–61                74
62–64                76
65–69                93
70–74                94
75+                  96
                                      (figure continues)


Details of the Calendar Year and Panel Second-Stage Calibration Steps

The individual steps in the calendar year (panel) second-stage calibration algorithm are generally
the same as the corresponding steps in the Wave 1 and Wave 2+ second-stage calibration
algorithm.18 The differences in the two calibration algorithms are primarily the second-stage
cells, with some other minor differences, as described in this section.

The first step (for Hispanic children) is a ratio adjustment to CPS control totals that uses only the
two cells defined by sex (this step is identical to the Wave 1 and Wave 2+ algorithm step for
Hispanic children). The second step (for non-Hispanic children) is a ratio adjustment step to
derived controls that uses as cells the second-stage cells given in Figure C-5.
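
A ratio adjustment of the kind described for Hispanic children can be sketched as follows. This is a minimal illustration, not the Census Bureau's production code: the function name, the cell labels, the weights, and the control totals are all assumptions, and cell collapsing is not handled.

```python
from collections import defaultdict


def ratio_adjust(weights, cells, controls):
    """Scale weights so each cell's weighted total equals its control total."""
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    # One adjustment factor per cell: control total / current weighted total.
    factors = {c: controls[c] / s for c, s in cell_sums.items()}
    return [w * factors[c] for w, c in zip(weights, cells)]


# Hypothetical pre-second-stage weights and the two cells defined by sex.
weights = [100.0, 150.0, 250.0, 200.0]
cells = ["male", "female", "male", "female"]
controls = {"male": 420.0, "female": 315.0}  # made-up control totals
adjusted = ratio_adjust(weights, cells, controls)
```

After the adjustment, the weights in each cell sum to that cell's control total, while the relative sizes of the weights within a cell are unchanged.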


18 The cell-collapsing procedures described for the Wave 1 and Wave 2+ weights are also used as stated in that
section for the calendar year and panel weights, except for the column dimension collapsing for non-Hispanic adults.
For calendar year and panel weights, and for any of the four race/sex groups given in Figure C-6, columns 1 and 2
(see Figure C-6 for the numbering of the columns) are collapsed if either does not meet the criterion (which is the
same as described in the earlier section on ratio adjustment, raking, and cell collapsing), column 4 is collapsed with
column 2 if it does not meet the criterion, column 7 is collapsed with column 9 if either does not meet the criterion,
and column 8 is collapsed with column 10. Collapsing of columns 3, 5, and 6 and further collapsing of the other
columns should never be necessary in practice.




Figure C-6. Calendar Year and Panel Weight Second-Stage Cells for Non-Hispanic Adults (continued)
1996 Panel Second-Stage Cells for Nonblack Males (15+ years of age)

Columns, Householder:
   3. Male Householder Living with Relative
   5. Male Householder Not Living with Relative

Columns, Not Householder:
   6. Spouse of Householder or Spouse of Related Subfamily
   8. Other Male Related to Householder
   10. Other Male Not Related to Householder

Age (years)     SCALE VALUE
15                  215
16–17               216
18–19               218
20–21               227
22–24               229
25–29               247
30–34               249
35–39               257
40–44               259
45–49               263
50–54               265
55–59               273
60–61               274
62–64               276
65–69               293
70–74               295
75–79               303
80–84               304
85+                 306
                                      (figure continues)


Figure C-6. Calendar Year and Panel Weight Second-Stage Cells for Non-Hispanic Adults (continued)
1996 Panel Second-Stage Cells for Black Males (15+ years of age)

Columns, Householder:
   3. Male Householder Living with Relative
   5. Male Householder Not Living with Relative

Columns, Not Householder:
   6. Spouse of Householder or Spouse of Related Subfamily
   8. Other Male Related to Householder
   10. Other Male Not Related to Householder

Age (years)     SCALE VALUE
15                  215
16–17               216
18–19               218
20–21               227
22–24               229
25–29               247
30–34               249
35–39               257
40–44               259
45–49               263
50–54               265
55–59               273
60–61               274
62–64               276
65–69               293
70+                 295


Following these steps for children (which complete all second-stage adjustments for the
children’s weights) are the initial calibration steps for adults. Those steps are as follows:

1. A raking adjustment to CPS control totals that uses the Figure C-6 second-stage cells; the
   input weights are the pre-second-stage weights of all sampled adults.
2. A direct ratio adjustment to CPS control totals for sampled Hispanic adults; the input weights
   are the adjusted weights from step 1, and the second-stage cells are the cells given in Figure
   C-4 (for adults).
3. A second raking adjustment identical to step 1 except that the input weights are the adjusted
   weights after steps 1 and 2 are completed.
4. A second Hispanic adult ratio adjustment identical to step 2 except that the input weights are
   the Hispanic adult adjusted weights from step 3.

At this point, the weights are completed for Hispanic adults. The final step is a raking adjustment
to derived control totals that uses the Figure C-6 second-stage cells. The derived control totals
are the CPS control totals for all adults for the second-stage cells minus the adjusted weights of
Hispanic adults within those cells. The input weights are the current adjusted weights for non-
Hispanic adults.
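
The raking and derived-control steps can be sketched in a few lines. The sketch assumes that raking means alternating ratio adjustments to the row (age) and column (household status) totals of the Figure C-6 cells, which the text does not spell out here. All function names, cell labels, weights, and control totals are hypothetical, and cell collapsing is omitted.

```python
from collections import defaultdict


def ratio_adjust(weights, cells, controls):
    """Scale weights so each cell's weighted total equals its control total."""
    sums = defaultdict(float)
    for w, c in zip(weights, cells):
        sums[c] += w
    return [w * controls[c] / sums[c] for w, c in zip(weights, cells)]


def rake(weights, rows, cols, row_controls, col_controls, n_iter=50):
    """Alternate row and column ratio adjustments (assumed form of raking)."""
    w = list(weights)
    for _ in range(n_iter):
        w = ratio_adjust(w, rows, row_controls)
        w = ratio_adjust(w, cols, col_controls)
    return w


def derived_controls(cps_controls, hispanic_weights, hispanic_cells):
    """CPS control totals minus the adjusted Hispanic adult weights per cell."""
    remaining = dict(cps_controls)
    for w, c in zip(hispanic_weights, hispanic_cells):
        remaining[c] -= w
    return remaining


# Hypothetical adults classified by an age row and a household-status column.
weights = [10.0, 20.0, 30.0, 40.0]
rows = ["young", "young", "old", "old"]
cols = ["hh", "not_hh", "hh", "not_hh"]
row_controls = {"young": 40.0, "old": 60.0}   # made-up control totals
col_controls = {"hh": 45.0, "not_hh": 55.0}   # made-up control totals
raked = rake(weights, rows, cols, row_controls, col_controls)

# Derived row controls for the final non-Hispanic raking step.
remaining = derived_controls(row_controls, [5.0, 8.0], ["young", "old"])
```

In the final step, the non-Hispanic adult weights would be raked to the derived totals rather than to the full CPS totals, so that the Hispanic and non-Hispanic weights together still add up to the CPS controls.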


D. Acronyms
ADL      =   Activities of Daily Living

AFDC     =   Aid to Families with Dependent Children

ASA      =   American Statistical Association

BLS      =   Bureau of Labor Statistics

BW       =   base weight

CAI      =   computer-assisted interviewing

CAPI     =   computer-assisted personal interviewing

CMSA     =   Consolidated Metropolitan Statistical Area

CPS      =   Current Population Survey

DADS     =   Data Access and Dissemination System

DCF      =   duplication control factor

DES      =   Data Extraction System

EDs      =   enumeration districts

FERRET   =   Federal Electronic Research Review and Extraction Tool

FHNSP    =   female with no spouse present living with relatives

GA       =   General Assistance

GVFs     =   generalized variance functions

ICPSR    =   Inter-university Consortium for Political and Social Research

ISDP     =   Income Survey Development Program

MSA      =   Metropolitan Statistical Area

NAF      =   noninterview adjustment factor




NCF        =    new-construction noninterview adjustment factor

NCHS       =    National Center for Health Statistics

NLS        =    National Longitudinal Surveys

NSR PSUs   =    non-self-representing PSUs

OASDI      =    Old-Age, Survivors, and Disability Insurance

OMB        =    Office of Management and Budget

PRWORA     =    Personal Responsibility and Work Opportunity Reconciliation Act

PSID       =    Panel Study of Income Dynamics

PSU        =    primary sampling unit

SIPP       =    Survey of Income and Program Participation

SPD        =    Survey of Program Dynamics

SRS        =    simple random sample

SSCA       =    second-stage calibration adjustment

SSI        =    Supplemental Security Income

TANF       =    Temporary Assistance for Needy Families

WIC        =    Women, Infants, and Children nutrition program


E. Glossary

A
address unit

This collection unit is a person or group of persons living at the same address at the time of the
interview. The address unit may consist of one person living by himself or herself, a group of
unrelated individuals, or one or more families.

allocation flag

See imputation flag.


B

C
CAI (computer-assisted interviewing)

A method of interviewing in which a computer is used as the data collection instrument.

CAPI (computer-assisted personal interviewing)

A method of interviewing in which field representatives use a laptop computer to collect data
during in-person interviews. In SIPP, the field representatives also periodically use the laptop
computers during telephone interviews conducted from their homes.

cold-deck matrix

The matrix of starting values that constitutes the first step in the hot-deck imputation procedure.
The matrix values can be determined a priori from information external to the current file being
processed or can be determined from reported information from the current file.




control card

In the paper instrument for SIPP, a mechanism for carrying demographic and case management
information forward from one wave to the next for each sample member.

core content

Questions asked at every SIPP interview. They cover demographic characteristics, work
experience, earnings, program participation, transfer income, and asset income.

core wave files

Files containing the core data from one wave of interviews.

cross-sectional

Pertaining to data collected for a single time period from a representative sample. In SIPP hot-
deck imputation procedures, cross-sectional refers to current-wave data.

Current Population Survey (CPS)

A labor force survey sponsored jointly by the Census Bureau and the Bureau of Labor Statistics
that is used to compute the government’s official monthly unemployment statistics along with
other estimates of labor force characteristics.


D
data dictionary

Contains information about the file structure and the names, locations, and contents of all
variables in a microdata file.

data editing

The use of related information to replace missing or inconsistent data in the survey.

departure noninterview

This type of noninterview occurs when someone was a member of a SIPP interviewed household
during the 4-month reference period but was no longer a household member on the date of the
interview.




E

F
family

Two or more people who are living together and are related by blood, marriage, or adoption.

FERRET

An on-line data access tool available on the SIPP Web site. SIPP data are available on FERRET
beginning with the 1992 longitudinal panel.

following rules

SIPP rules that guide which original sample members continue to be interviewed should they
move.

full panel files

Files containing all data for every person who was a member of a SIPP panel at any time during
the life of that panel.


G
general income

Any type of income except earnings and asset income.

geographic (GRIN) codes

Codes that identify where each sample household is located and permit linkage to a file that
contains a full set of geographic codes for different kinds of areas. This level of geography is not
available on the public use files.

group quarters

Noninstitutional living quarters, such as rooming and boarding houses, college dormitories,
convents, and monasteries. These do not constitute households and are often treated differently
from households.




H
hot-deck matrix

The matrix used in all but the first stage of hot-deck imputation. As cold-deck values are
replaced with information from the current wave, the resulting array of cells constitutes the hot-
deck matrix.

hot-deck procedure

The statistical method used to impute items missing from the core questionnaire and topical
modules. This procedure replaces missing item data in a wave with nonmissing values from
similar interviewed cases. The imputation method can be a purely cross-sectional procedure of
locating donors from the current file on the basis of characteristics reported in this wave, or it can
be a longitudinal procedure of locating donors from the prior wave on the basis of characteristics
reported at that earlier time for items missing in the current wave.
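
A cross-sectional, sequential version of this procedure can be sketched as follows. It is only an illustration built from the glossary definitions of the cold-deck and hot-deck matrices: the function name, the record layout, the cell definition, and all values are assumptions, not the actual SIPP processing rules.

```python
def sequential_hot_deck(records, cold_deck, cell_key, item):
    """Impute missing values of `item` from the most recent donor in each cell.

    The deck starts as the cold-deck matrix; every reported value replaces the
    stored value for its cell, so the deck becomes the hot-deck matrix as the
    file is processed.
    """
    deck = dict(cold_deck)
    out = []
    for rec in records:
        rec = dict(rec)                 # leave the input records untouched
        cell = cell_key(rec)
        if rec[item] is None:
            rec[item] = deck[cell]      # take the value currently in the deck
            rec["imputed"] = True       # plays the role of an imputation flag
        else:
            deck[cell] = rec[item]      # this record becomes the donor
            rec["imputed"] = False
        out.append(rec)
    return out


# Hypothetical records; None marks a missing item.
records = [
    {"age_group": "25-44", "income": 2100},
    {"age_group": "25-44", "income": None},
    {"age_group": "45+", "income": None},
    {"age_group": "45+", "income": 1800},
]
cold_deck = {"25-44": 2000, "45+": 1500}  # made-up starting values
result = sequential_hot_deck(records, cold_deck,
                             cell_key=lambda r: r["age_group"], item="income")
```

In this example the second record receives the value reported by the first record, while the third record falls back on the cold-deck starting value because no donor in its cell has been seen yet.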

household

People living in a housing unit at the time of the interview. SIPP infers households from the
interviews conducted at each address.

household-level noninterviews

See household nonresponse.

household nonresponse

Nonresponse that occurs when the interviewer either cannot locate a household or cannot
interview any of its adult members. See Type A, Type B, Type C, and Type D noninterviews.

household reference person

See reference person.

housing unit

Living quarters with their own entrance and cooking facilities.




I
imputation

The most common method for handling missing data in SIPP. Imputation replaces missing
values with statistical estimates that are based on the best relevant information available.

imputation flag

An imputation flag is associated with each core questionnaire item subject to statistical
imputation and indicates whether information has been imputed.

in-sample variables

See monthly interview status variables.

in scope

Being part of the survey universe.

interview month

The month during which the interview takes place.

item nonresponse

A source of missing data that occurs when a respondent does not answer one or more questions,
even though most of the questionnaire is completed.

J

K

L
logical imputation

See data editing.




longitudinal

Pertaining to data collected at different times over an extended period from a representative
sample. In SIPP hot-deck imputation procedures, longitudinal refers to previous-wave data.


M
merged households

Households created either when two separate sampling units, each containing original sample
members, are merged together, perhaps because of a marriage, or when a household splits into
two new households and later the households recombine.

microdata files

Data files containing information at the person, family, or household level. For SIPP, they
include the core wave files, topical module files, and full panel files.

missing item data

Data that are missing for one or more individual questions or variables in an observation that
otherwise has sufficient reported information to be classified as interviewed.

missing waves

Waves in which a respondent has no data, although data are present for other waves.

monthly interview status variables

Variables that indicate whether a person was in sample in a particular month, and whether a
person was in sample in the interview month. They are known as the PP-MIS variables.

mover

An original sample person who moves during the life of the panel.




N
National Longitudinal Survey (NLS)

Collects data on current labor force and employment status, work history, and characteristics of
the current or last job.

non-self-representing (NSR) primary sampling units (PSUs)

Smaller PSUs that must be grouped with similar PSUs from the same region in order to form
strata for sampling. This level of geography is not available on the public use files.


O
original sample members

All people who were interviewed in the first wave of the panel and any children subsequently
born to or adopted by them.

oversampling

Sampling that involves selecting certain groups or units with higher probabilities than others,
resulting in the oversampled group having greater representation than occurs in the population
from which it was drawn.


P
P-70 reports

Primary source for published estimates from the SIPP. These reports can be obtained from the
SIPP Web site or from the Census Bureau.

panel

Refers both to a new sample that is introduced periodically in the SIPP and to the full collection
of information for that sample. For example, the 1996 Panel refers to both the sample introduced
in 1996 and the 12 waves of interviews conducted with that sample.




panel nonrespondents

Persons for whom an interview is missing for a wave.

Panel Study of Income Dynamics (PSID)

A nationally representative, longitudinal survey of the U.S. population, conducted by the
University of Michigan. The focus of the survey is economics and demographics, especially
income sources and amounts, employment, family composition changes, and residential location.

partial panel files

Longitudinal files to be released by the Census Bureau prior to the conclusion of the 1996 Panel
because of the 4-year duration of the 1996 Panel.

person-level noninterviews

This type of noninterview occurs when data are collected for at least one member of a household,
but are missing for one or more other sample persons within that household.

person-month files

Microdata files containing a record for each person in a wave, for each month of the reference
period the person was in the sample.

person nonresponse

Nonresponse that occurs when at least one person in the household is interviewed, while at least
one other person is not. See Type Z noninterview.

primary family

Family containing the household reference person and related individuals.

primary individual

A household reference person who lives alone or lives with only nonrelatives.

primary sample members

See original sample members.

primary sampling units (PSUs)

Geographic units based on Census data and used in developing the SIPP sample. This level of
geography is not available on the public use files.




program units

The group of individuals that constitutes one case, as defined by a particular benefit program.
In SIPP, program units apply to health insurance and transfer programs and are identified for
programs in which a case can consist of more than one person.

proxy interviews

Interviews taken on behalf of a sample member who is unable to answer.

public use microdata files

Data files that have been prepared by the Census Bureau for public use. These files have already
been processed to impute missing data, to edit data for confidentiality, and to provide weights.
Microdata files are available from the Census Bureau or on-line from the SIPP Web site.


Q

R
random carryover method

Longitudinal imputation procedure used to impute missing wave data.

1996 Redesign

A revamping of SIPP in order to improve the quality of estimates and to make the data more
useful to analysts.

reference months

The months that constitute the reference period for a wave. The months vary for different
rotation groups.

reference period

The 4 calendar months preceding the month of interview. The reference period is a different
calendar period for each rotation group.
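
The definition above can be turned into a small calculation. The function name and the interview dates used below are illustrative assumptions.

```python
def reference_period(year, month):
    """Return the 4 calendar months preceding the interview month.

    Months are returned oldest first as (year, month) pairs.
    """
    months = []
    for back in range(4, 0, -1):
        index = year * 12 + (month - 1) - back  # months counted from year 0
        months.append((index // 12, index % 12 + 1))
    return months


# Hypothetical interview in April 1996.
period = reference_period(1996, 4)
# period == [(1995, 12), (1996, 1), (1996, 2), (1996, 3)]
```

Because each rotation group is interviewed in a different month of the wave, applying the function to each group's interview month gives four overlapping but distinct reference periods.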




reference person

An owner or renter of record who can reasonably be expected to answer questions about the
household in general and about other household members should they be unavailable for
interview. All people in the household are listed according to their relationship to the reference
person.

related subfamily

A married couple and dependents or parent-child family related to the reference person but not
including him or her. An example would be the reference person’s daughter and son-in-law.

rotation group

A subsample containing roughly one-quarter of the sample members. One rotation group is
interviewed each month of a 4-month wave.


S
sample attrition

Loss of sample members. Sample attrition rates decline over time, but total attrition numbers
increase.

seam effect

The tendency of respondents to report a disproportionate number of changes as occurring at the
“seam” between the fourth month of one wave and the first month of the following wave.

secondary families

Two or more people living in the same household who are related to each other but not to the
household reference person.

secondary individual

An individual who is neither a household reference person nor a relative of any other people in
the household.

secondary sample members

People living with original sample members.




self-representing (SR) primary sampling units (PSUs)

Larger PSUs that do not have to be combined with other PSUs in order to form strata for
sampling. This level of geography is not available on the public use files.

sequential hot-deck procedure

See hot-deck procedure.

short waves

Waves that contain three rotation groups instead of the standard four.

skip patterns

Mechanisms embedded in the survey that allow the interviewer to skip over irrelevant questions
and call up the next relevant question.

source and accuracy statement

A statement included with the technical documentation that accompanies public use files; it
contains detailed information about weights on the files, when and how to make adjustments to
the weights, and how to use generalized variance procedures to compute standard errors for some
common types of estimates. It also includes cautions for users about sources of nonsampling
error.

Survey of Program Dynamics (SPD)

An offshoot of SIPP that began recontacting members of the 1992 and 1993 Panels, with data
collection to continue through 2001 in order to collect 10 years of data.

Surveys-on-Call

An on-line data access tool available on the SIPP Web site. Surveys-on-Call allows users to
define microdata extracts from SIPP public use files through the 1993 Panel.


T
technical documentation

Information that accompanies microdata files and that includes a description of file contents, a
glossary, codes, a data dictionary, a source and accuracy statement, and a copy of the core
questions for the panel in question.


time-in-sample effect

Tendency of sample members to “learn” the survey over time, possibly resulting in altered
responses.

topcoding

Practice of recoding income variables to protect against the possibility that a user might
recognize the identity of a SIPP respondent with very high income. Incomes exceeding a
maximum value are recoded to that maximum value or to a mean of responses in excess of that
value.
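
The two recoding options described above can be sketched as follows. This is only an illustration of the idea: the function name `topcode`, the `use_mean` switch, and the cap used in the example are assumptions, and actual SIPP topcode values and rules differ by variable and panel.

```python
def topcode(incomes: list[float], cap: float, use_mean: bool = False) -> list[float]:
    """Recode values above `cap` to the cap, or to the mean of the values above the cap."""
    above = [x for x in incomes if x > cap]
    if not above:
        # Nothing exceeds the cap, so the values are returned unchanged.
        return list(incomes)
    replacement = sum(above) / len(above) if use_mean else cap
    return [replacement if x > cap else x for x in incomes]


if __name__ == "__main__":
    amounts = [1000, 5000, 20000, 40000]   # made-up monthly amounts
    print(topcode(amounts, cap=10000))                 # recode to the maximum value
    print(topcode(amounts, cap=10000, use_mean=True))  # recode to the mean above the cap
```

Recoding to the mean of the values above the cap keeps the total of the recoded amounts equal to the original total, which recoding to the cap itself does not.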

topical content

Questions that are not repeated in every wave. They cover a wide range of topics and can occur
once or more than once in a panel. The questions are grouped into modules by topic.

topical module files

Files containing all topical module data from the wave in question.

topical modules

Collections of questions asked periodically, but not at every interview, about various topics that
might be outside the range of the core content.

topical module imputation procedure

Missing data in topical modules are imputed using the same hot-deck procedure used to impute
missing data in the core questionnaire.

Type A noninterview

Households that are occupied by people eligible for interview but for which no interview is
obtained.

Type B noninterview

A household noninterview that occurs when the address unit is vacant or in some way unfit for
residence.


Type C noninterview

In Wave 1, a household noninterview that occurs when the housing unit has been demolished or
converted to some other use; in subsequent waves, a household noninterview that occurs when
all sample members in a household are outside the scope of the survey, for example, deceased,
living abroad, living in institutions, or living in armed forces barracks.

Type D noninterview

Households or people who have moved to an unknown address, or who have moved more than
100 miles from the nearest field representative and for whom no telephone interview is
conducted. This type of noninterview applies only to Wave 2 and beyond.

Type Z imputation

Procedures used to impute missing data for Type Z noninterviews and for situations when a
person was in sample early in the wave but not in sample by the month of interview.

Type Z noninterview

An eligible person in an interviewed household for whom the field representative could obtain
neither an interview with the person nor a proxy interview. A Type Z noninterview also occurs
when a person who was part of the household for a portion of the reference period moves and is
no longer a household member on the date of the interview. If the person is an original sample
member, an effort will be made to locate and follow the person.


U
undercoverage

Underrepresentation of demographic subgroups within the surveyed population.

unrelated subfamily

A family (a group of two or more related individuals) living at a sample address unit, none of
whose members is the reference person or related to the reference person.

User Notes

Issued periodically by the Census Bureau, these contain updated information for specific
microdata files.


usual place of residence

Place where a person normally lives and sleeps; specific living quarters held for the person, to
which he or she is free to return at any time.


V
variable metadata

Information that provides a complete characterization of a variable’s content. Variable metadata
are available on the SIPP Web site.


W
wave

One round of interviewing, which takes 4 months to complete; one fourth of the sample (i.e., a
rotation group) is interviewed each month.

wave files

See core wave files.

weights

Estimates of the number of units in the target population that a given unit represents.
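
In practice, this means that population totals are estimated by summing weights and that averages are computed with each unit's value multiplied by its weight. The sketch below uses made-up weights and amounts; the function names are hypothetical and do not correspond to any SIPP file variable.

```python
def weighted_total(weights: list[float]) -> float:
    """Estimate the number of population units represented by the sample units."""
    return sum(weights)


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Estimate a population mean, letting each unit count for as many units as its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


if __name__ == "__main__":
    weights = [1500, 2200, 1800]   # made-up person weights
    incomes = [1200, 800, 2000]    # made-up monthly incomes
    print(weighted_total(weights))
    print(round(weighted_mean(incomes, weights), 2))
```

In this made-up case, three sample people stand in for 5,500 people in the target population, and the person with the largest weight has the most influence on the estimated mean.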


X

Y

Z


References
Allen, T. M., Petroni, R. J., and Singh, R. P. (1993). The effectiveness of oversampling low-
      income households in the Survey of Income and Program Participation, U.S. Bureau of the
      Census, Washington, DC. Proceedings of the American Statistical Association.
      Alexandria, VA: American Statistical Association.
Brick, J. M., and Kalton, G. (1996). Handling missing data in survey research. Statistical
      Methods in Medical Research 5, 215–238.
Bye, B., and Gallicchio, S. (1989). Two Notes on Sampling Variance Estimates from the 1984
     SIPP Public-Use Files. SIPP Working Paper No. 8902. Washington, DC: U.S. Bureau of
     the Census.
Citro, C. F., Hernandez, D., and Herriot, R. (1986). Longitudinal household concepts in SIPP:
      Preliminary results. Proceedings of the Bureau of the Census Second Annual Research
      Conference, Washington, DC: U.S. Department of Commerce, pp. 598-619. (Also
      available as SIPP Working Paper No. 8611, Washington, DC: U.S. Bureau of the Census.)
Citro, C. F., and Kalton, G. (1993). The Future of the Survey of Income and Program
      Participation. Washington, DC: National Academy Press.
Citro, C. F., Michael, R. T., and Maritano, N. (eds.) (1995). Measuring Poverty: A New
      Approach. Washington, DC: National Academy Press, Appendix B.
Coder, J., and Scoon-Rogers, L. S. (1996). Evaluating the Quality of Income Data Collection in
     the Annual Supplement to the March Current Population Survey and the Survey of Income
     and Program Participation. SIPP Working Paper No. 9604. Washington, DC: U.S. Census
     Bureau.
Doyle, P., and Dalrymple, R. (1987). The impact of imputation procedures on distribution
     characteristics of the low income population. Proceedings of the Bureau of the Census
     Third Annual Research Conference. Washington, DC: U.S. Department of Commerce, pp.
     483–508. (Also available as SIPP Working Paper No. 8710, Washington, DC: U.S. Census
     Bureau.)
Duncan, G., and Hill, M. (1985). Conceptions of longitudinal households: Fertile or futile?
     Journal of Economic and Social Measurement 13, 361–376.
Eargle, J. (1990). Household Wealth and Asset Ownership: 1988. Current Population Reports
      P70-22. Washington, DC: U.S. Census Bureau.
Guo, G. (1993). Event-history analysis for left-truncated data. Sociological Methodology 23,
     217–243.
Huggins, V. J., and King, K. E. (1997). Evaluation of oversampling the low-income population
     in the 1996 Survey of Income and Program Participation (SIPP), U.S. Bureau of the
     Census, Washington, DC. Proceedings of the American Statistical Association, Survey
     Research Methods Section. Anaheim, CA: American Statistical Association.
Jabine, T., King, K., and Petroni, R. (1990). SIPP Quality Profile, 2nd Ed. Washington, DC:
      U.S. Census Bureau.
Jinn, J. H., and Sedransk, J. (1987). Effect on secondary data analysis of different imputation
      methods. Proceedings of the Bureau of the Census Third Annual Research Conference.
      Washington, DC: U.S. Department of Commerce, pp. 509–530.
Kalbfleisch, J. D., and Prentice, R. L. (1980). The Analysis of Failure Time Data. New York:
     John Wiley & Sons.
Kalton, G., and Brick, J. M. (1995). Weighting schemes for household panel surveys. Survey
     Methodology 21, 33–44.
Kalton, G., and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology
     12(1), 1–16.
Kalton, G., Lepkowski, J., Heeringa, S., Lin, T., and Miller, M. E. (1987). The Treatment of
     Person-Wave Nonresponse in Longitudinal Surveys. SIPP Working Paper No. 8704.
     Washington, DC: U.S. Census Bureau.
Kalton, G., Miller, D. P., and Lepkowski, J. (1992). Analyzing Spells of Program Participation in
     the SIPP. SIPP Working Paper No. 9210 (171). Washington, DC: U.S. Census Bureau.
Kalton, G., Winglee, M., and Jabine, T. (1998). SIPP Quality Profile, 3rd Ed. Washington, DC:
     U.S. Census Bureau.
King, K., Petroni, R., and Singh, R. P. (1987). SIPP Quality Profile. Washington, DC: U.S.
     Census Bureau.
Lepkowski, J., and Bowles, J. (1996). Sampling error software for personal computers. Survey
     Statistician 35, 10–17.
Lepkowski, J. M., Landis, R. L., and Stehouwer, S. A. (1987). Strategies for the analysis of
     imputed data from a sample survey. Medical Care 25(8), 705–716.
Little, R. J. A. (1986). Missing data in Census Bureau surveys. Proceedings of the Bureau of the
       Census Second Annual Research Conference. Washington, DC: U.S. Department of
       Commerce, pp. 442–454.
Little, R. J. A., and Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York:
       John Wiley & Sons, pp.129–139.
Marquis, K. H., and Moore, J. C. (1989a). Response errors in SIPP: Preliminary results.
    Proceedings of the Bureau of the Census Fifth Annual Research Conference. Washington,
    DC: U.S. Department of Commerce, pp. 515–536.

Marquis, K. H., and Moore, J. C. (1989b). Some response errors in SIPP—with thoughts about
    their effects and remedies. Proceedings of the American Statistical Association, Survey
    Research Methods Section. Anaheim, CA: American Statistical Association, pp. 381–386.
Marquis, K. H., and Moore, J. C. (1990). Measurement errors in SIPP program reports.
    Proceedings of the U.S. Bureau of the Census’ 1990 Annual Research Conference.
    Washington, DC: U.S. Department of Commerce, pp. 721–745.
Marquis, K. H., Moore, J. C., and Huggins, V. J. (1990). Implications of SIPP Record Check
    results for measurement principles and practice. Proceedings of the American Statistical
    Association, Survey Research Methods Section. Anaheim, CA: American Statistical
    Association, pp. 564–569.
McCormick, M. K., Butler, D. M., and Singh, R. P. (1992). Investigating time in sample effect
    for the Survey of Income and Program Participation. Paper prepared for the American
    Statistical Association Annual Meeting. Washington, DC: U.S. Census Bureau.
McMillen, D., and Herriot, R. (1985). Toward a longitudinal definition of households. Journal of
    Economic and Social Measurement 13, 504–509. (Also available as SIPP Working Paper
    No. 8402. Washington, DC: U.S. Census Bureau.)
McNeil, J. (1988). CPS and SIPP Estimates of Health Insurance Coverage Status. Census Bureau
    Internal Memorandum, May 3.
Moore, J. C. (1988). Self/proxy Response Status and Survey Response Quality—A Review of the
    Literature. Journal of Official Statistics 4, 155–172.
Pennell, S. G. (1993). Cross-Sectional Imputation and Longitudinal Editing Procedures in the
     Survey of Income and Program Participation. Prepared by the University of Michigan
     Survey Research Center, Ann Arbor. Washington, DC: U.S. Census Bureau.
Pennell, S. G., and Lepkowski, J. M. (1992). Panel Conditioning Effects in the Survey of Income
     and Program Participation. Proceedings of the American Statistical Association, Survey
     Research Methods Section. Alexandria, VA: American Statistical Association,
     pp. 566–571.
Ruggles, P., and Williams, R. (1989). Measuring the Duration of Poverty Spells. SIPP Working
     Paper No. 8909. Washington, DC: U.S. Census Bureau.
Rust, K. (1985). Variance estimation for complex estimators in sample surveys. Journal of
      Official Statistics 1, 381–397.
Sedransk, J. (1985). The objectives and practice of imputation. Proceedings of the Bureau of the
     Census First Annual Research Conference. Washington, DC: U.S. Census Bureau, pp.
     445–452.
Shapiro, G. M., Diffendal, G., and Cantor, D. (1993). Survey Undercoverage: Major Causes and
     New Estimates of Magnitude. Census Bureau Internal Memorandum.

Shea, M. (1995a). Dynamics of Economic Well-Being: Poverty 1990–1992. Current Population
      Reports P70-112. Washington, DC: U.S. Census Bureau.
Shea, M. (1995b). Dynamics of Economic Well-Being: Program Participation, 1990 to 1992.
      Current Population Reports P70-41. Washington, DC: U.S. Census Bureau.
Skinner, C. J., Holt, D., and Smith, T. M. F. (1989). Analysis of Complex Surveys. New York:
     John Wiley & Sons.
Tuma, N. B., and Hannan, M. T. (1984). Social Dynamics, Models and Methods. Orlando, FL:
    Academic Press.
U.S. Census Bureau (1991). Survey of Income and Program Participation Users’ Guide, 2nd Ed.
      Washington, DC: U.S. Census Bureau.
U.S. Census Bureau (1993). Survey of Income and Program Participation Initial Training Guide.
      Washington, DC: U.S. Census Bureau.
U.S. Census Bureau (1994). SIPP Information Booklet: 1990 and 1991 Panels. Form SIPP-
     7004A. Washington, DC: U.S. Census Bureau.
U.S. Census Bureau (1998a). Survey of Income and Program Participation Quality Profile, 3rd
     Ed. Washington, DC: U.S. Census Bureau.
U.S. Census Bureau (1998b). The Current Population Survey: Design and Methodology.
     Technical Paper 63. Washington, DC: U.S. Census Bureau.
Waite, P. J. (1996). SIPP (1996) Specifications for Interview Mode Flag. Internal Census Bureau
     Memorandum to Chester Bowie, May 17th.
Williams, T., and Bailey, L. (1996). Compensating for Missing Wave Data in the SIPP. SIPP
      Working Paper No. 9605. Washington, DC: U.S. Census Bureau.

Index
Accessing SIPP information. See also Information resources
  published estimates, 1-5–1-6, 5-1, 5-2–5-3
Activities of Daily Living (ADL) instrument, 3-10, 3-11
Additional household members. See also Household composition
  births, 2-14, 8-5, 8-7, 8-17, 9-5, 9-8, 10-25, 13-16, 13-17
  defined, C-13
  following rules, 1-4, 2-1, 2-9, C-13
  identification, 9-3, 10-8, 10-25, 11-13, 11-14, 12-14, 12-24–12-25
  imputation of records, 4-6–4-7, 10-36
  interview procedures, 2-16, 2-17
  movers, 4-6–4-7, 8-6, 10-8, 10-20, 11-24, 12-24–12-25
  weighting adjustment, 8-5, 8-7, 8-17, 9-5, 9-8
Address. See also Current Address IDs; Entry Address IDs
  clusters, 2-6, 8-4, 8-5, 10-8, 11-13, C-2
  enumeration districts frame. See Unit frame
  screening, 2-6
  subsampling, 2-6
  units, 2-6, 2-10, 2-18, 12-14, E-1
Adjustment cells, 4-8–4-9, 4-12
Administrative records, responses compared to, 6-3–6-4
Age
  core wave file structure, 13-7
  following rules, 2-9, 2-12, 10-25, 11-24, 12-26, 13-15
  imputation, 10-37
  job or business started, B-5
  population status based on, 11-12
  at receipt of Social Security Disability benefits, B-5
  respondents, 1-2, 2-7, 2-16, 3-1, 3-6, 3-7, 3-9, 3-10, 11-6, 11-10
  topcoding, 4-17, B-4–B-5
  variable name, 11-11, 11-12
  weighting, 8-5, C-3–C-4, C-6–C-8
Aging population, 5-16
Aid to Families with Dependent Children (AFDC)
  authorized recipient, 10-28, 12-30, 12-31
  coverage, 12-30, 12-31
  history, 3-15
  ID variables, 9-7, 9-14, 10-27, 10-28, 10-29, 10-30–10-31, 12-29, 12-30, 12-31
  misinterpretation of questions on, 6-3
  replacement with TANF, 1-3, 9-7, 10-27
  weights, 8-2
Algorithms
  calendar-year and panel weight generation, C-17
  family identification variables, 12-17, 12-18
  monthly program income variables, 12-30, 12-36
  reference months aligned to calendar months, 12-9, 12-10
  second-stage calibration, C-4–C-12, C-16, C-23
  topcoding, 10-33–10-34
Alimony payments, 3-3, 3-6
Allocation flags, 4-11, 4-13–4-14, 4-15, 10-36–10-37, 11-28, 12-37, 13-8, 13-22
American Statistical Association (ASA), 1-14, 5-15
Area enumeration districts frame. See Area frame
Area frame, 2-5–2-6
Asset ownership
  comparison of surveys, 1-9, 1-10
  core questions, 3-3–3-4, 3-5, 3-6, 3-8
  errors in estimates, 6-4, 13-12
  household, C-15
  imputation, 4-4, 4-7, 4-9
  income, 3-3–3-4, 3-5, 3-6, 3-13, 10-29, 10-32
  information resources, 5-2, 5-3, 5-16, 13-12
  joint, 3-4, 3-8
  municipal/corporate bonds, 10-29
  nonresponse, 6-2, C-18
  topcoding, 11-28, B-6–B-7
  topical modules, 3-6, 3-8, 3-13, 3-14
Associated sample persons, C-13, C-14
Attrition
  bias, 1-6, 1-7, 2-2, 6-3
  confounding with time-in-sample bias, 6-3
  defined, E-9
  and merging files or data, 13-16, 13-17, 13-20–13-21
  by panel, 2-19
  spell construction, 8-19
  total sample, 2-17–2-18
  weighting adjustments, 8-4, 8-19, 13-22
Balanced repeated replications, 7-2, 7-3
Basic needs information, 3-8, 3-10, 5-3
Benefits
  electronic transfer of, 3-15
  employer-provided, 3-4, 3-8, 3-9–3-10
  offered solely to children, 10-27, 10-28, 12-29
  topical modules, 3-8
Bias
  attrition, 1-6, 1-7, 2-2, 6-3, 13-20–13-21
  in imputation of missing data, 13-20–13-21
  linking families or households, 13-1–13-2
  multivariate statistics, 13-20–13-21
  nonmetropolitan samples, 10-39
  nonresponse, 2-17, 4-2, 6-1
  sampling error estimation, 1-7, 2-5
  selection, 13-21
  standard error estimates, 2-5, 13-21
  systematic, 6-3
  time-in-sample, 1-7, 2-2, 6-3, 8-19
  undercoverage of subpopulations, C-17
  unweighted analyses, 8-1, 8-2, 9-8
Bibliography, online, 1-13, 5-15
Birth year, bottomcoding, B-4, B-7
Births
  errors in estimates, 6-4
  ID variables, 10-25, 11-24, 12-26
  order of, 3-10
  to original sample members, 2-14, 10-25, 11-24, 13-16, 13-17
  to single mothers, 8-19
  weighting adjustments, 8-5, 8-7, 8-17, 9-5, 9-8
Boarding houses, 2-6, 10-17, 12-15
Bottomcoding, 4-17, B-4
Building permits, 2-6
Bureau of Labor Statistics (BLS), 1-9, 5-13
Business. See also Employers; Self-employment
  characteristics, 4-14
  ownership, 3-3, 3-8
Calendar month
  alignment of data by, 8-19, 12-7, 12-9, 12-10, 12-11–12-12, 13-4
  estimates, 8-12, 8-14–8-16, 8-19, 9-8, 9-9, 10-7
  format, 10-7
  interview month correspondence, 13-13
  topcodes, 10-36, 12-37
  weights, 8-12, 8-14–8-15, 8-19, 9-8, 12-7, 13-1, 13-8
Calendar year
  estimates, 8-18, 9-8, 11-21
  weights, 8-3, 8-7–8-8, 8-16–8-17, 8-18, 9-5, 9-8, 12-37–12-38, 13-21, C-17–C-25
  calendar year estimates, 8-18, C-17–C-25
Callbacks, 2-17, 2-21
Census Region, 8-5
Censuses of the Population
  Decennial, 2-6, 2-8
CHAMPUS, 9-14, 10-27, 12-29
CHAMPVA, 9-14
Child care
  foster care, 9-14, 10-27, 12-29
  ID variables, 9-14, 10-27
  information resources, 5-2, 5-3, 5-16
  topical modules, 3-7, 3-8–3-9
Child support
  agreements, 3-9
  income, 3-3
  paid, 1-10, 3-9, 3-15, 12-37
  pass-through payments, 3-5, 3-9
  topcoded payments, 12-37
  topical modules, 3-7, 3-9, 3-15
Children. See also Births; Infants
  benefits offered solely to, 10-27, 10-28, 12-29
  core wave file records, 10-6
  custodial arrangements, 3-9, 3-14
  disability, 10-28, 10-29, 10-30–10-31, 12-30
  following rules, 1-4, 2-9
  foster, 9-14, 10-16, 10-17, 10-27, 11-20
  health status, 3-11
  imputation of program participation, 10-28, 12-28
  income, 3-6
  interview procedures, 2-17, 3-1
  living arrangements, 5-2
  moves without parents, C-19
  of original sample members, 10-6
  P-70 publications, 5-2
  parents linked to, 10-7, 11-13, 11-16, 12-13
  paternity establishment status, 3-9
  program units, coverage, and recipiency, 10-29, 10-30–10-31, 12-29
  relationship to reference person, 10-16, 10-17, 10-18, 11-20
  special education services, 3-11
  topical modules, 3-9, 3-10–3-11
  weighting adjustments, 8-17, C-4, C-7, C-10, C-19, C-24–C-25
  well-being, 3-7, 3-9, 5-16, 11-21
Clustering of addresses, 2-6, 8-4, 8-5, 10-8, 11-13, 12-14, C-2
Cold-deck values, 4-8, 4-11–4-12, E-1
College students, 2-16
Computer-assisted interviewing (CAI)
  advantages over paper instrument, 3-1, 4-15, 8-6
  case management features, 3-1, 3-2, 3-3, 13-13
  data editing, 1-3, 1-5, 2-17, 4-6, 4-15
  defined, E-1
  mode of interviewing, 6-2
  quality of data, 1-3, 3-1, 6-2, 8-16
  questionnaire documentation, 5-14, 11-2, 12-2
  skip patterns, 2-17, 3-2, 10-2, 10-6, 11-2
  variable name changes, 10-6
Computer-assisted personal interviewing (CAPI), 6-2, E-1
Confidentiality. See also Topcoding
  bottomcoding, 4-17
  core wave files, 10-38–10-39
  employment information, 4-17
  geographic information, 4-17, 5-1, 10-8, 10-38–10-39, 11-13, 12-14
  procedures for public use files, 1-5, 4-4, 4-5, 4-17–4-18, 7-2, 10-6, 10-8, 11-13, 12-14
  telephone interviews, 2-17
Consolidated Metropolitan Statistical Areas (CMSAs), 10-39
Control cards, 3-2, 4-6, 8-6, E-2
Control date, 8-7, 8-16
Control file, 4-15
Core content
  asset ownership, 3-3–3-4, 3-5, 3-6, 3-8
  defined, 3-1, E-2
  earnings, 3-3, 3-4, 3-5
  income amounts, 1-8, 3-6
  labor force status, 3-3, 3-4
  1996 and subsequent panels, 3-3–3-4
  overview, 3-2
  pre-1996 panels, 3-2, 3-4–3-6
  program participation, 1-8, 3-3, 3-4, 3-5, 3-6
  topics, 1-4, 3-3–3-6
  unearned income, 3-3–3-4
Core data, 2-3, 4-5, 9-7, 9-9, 11-8
Core items
  coverage, 1-4
  defined, 3-1
  full panel files, 1-8, 12-6, 13-1
  imputation, 4-6–4-7, 4-13, 11-9
  topical module files, 1-8, 11-10
Core questionnaire, 2-3, 3-1, 3-2–3-6
Core wave files
  allocation flags, 4-13–4-14, 10-36–10-37
  calendar month estimation, 8-12, 8-14, 8-19, 9-8, 10-7
  confidentiality procedures, 10-38–10-39
  content, 1-8, 5-4
  creation, 4-3, 4-4
  cross-wave consistency, 4-15
  data dictionary, 9-11, 10-2–10-4, 10-5, 10-35, 12-3, 13-18, 13-19
  defined, E-2
  edits, 4-4, 4-15, 8-16, 10-37, 12-37, 13-6–13-7, 13-14
  family characteristics, 9-12
  family composition variables, 9-13, 9-15, 10-15–10-20
  family identification, 9-6, 9-12, 10-11–10-14, 10-21, 12-17
  full panel files compared, 9-11–9-15, 10-37, 12-6, 12-10, 12-17, 12-30, 12-37, 13-1, 13-14
  household composition variables, 9-11, 9-12, 9-13, 9-15, 10-8, 10-15–10-20, 10-23–10-24,
      11-19
  household identification, 9-11, 10-9–10-11
  ID variables, 9-3, 9-12, 10-6–10-14, 10-20–10-28, 10-29–10-30, 11-11–11-12, 11-13, 11-23,
      13-9, 13-23
  imputation procedures, 4-2, 4-4, 4-6–4-7, 4-13, 8-16, 9-15, 10-6, 10-25, 10-36–10-37, 11-9,
      12-10, 12-17, 12-37, 13-6–13-7, 13-14
  income variables, 9-12, 10-19–10-20, 10-21, 10-27, 10-37
  linking between two or more, 4-5, 5-4, 13-4, 13-6–13-8
  linking with full panel files, 1-9, 12-28, 13-8–13-11
  linking with topical module files, 1-9, 13-12–13-14
  longitudinal analysis of data from, 13-6–13-7, 13-8
  merging data within, 1-9, 12-13, 13-3–13-4, 13-5–13-6
  merging with full panel files, 10-6, 12-1, 12-6, 12-17, 12-20, 12-28, 12-30, 13-1, 13-3, 13-4
  merging with topical module files, 1-8, 3-10, 9-6, 9-9, 10-6, 11-1, 11-7, 11-8, 11-10, 11-11,
      11-13, 11-17, 11-19, 12-6, 12-13, 13-1, 13-3, 13-4, 13-12, 13-13, 13-14, 13-15
  merging two or more, 10-1, 10-6, 12-13
  metropolitan area identification, 9-15, 10-38–10-39
  monthly interview status variable, 9-4, 9-5, 9-11, 11-9, 11-12
  mover identification, 10-8, 10-20, 10-22–10-26, 11-23, 13-23
  overview, 1-8
  person identification, 9-11, 9-15, 10-6–10-9, 11-11, 13-9, 13-23
  person-month format, 1-8, 5-4, 5-5, 8-8, 9-1, 9-3, 9-5, 9-6, 9-11, 10-6, 10-7, 10-25, 11-7,
      13-2, 13-3–13-4, 13-5–13-6, 13-7, 13-9, 13-13, 13-15
  person nonresponse in, 4-2, 13-22
  person-record format, 9-4, 9-5, 9-7, 9-11, 10-6, 10-7, 13-3–13-4, 13-5–13-6
  previous wave variables, 11-27, 13-23
  program unit identification, 9-14, 10-26–10-29, 10-30–10-31


  public use version, 4-4, 9-1–9-2, 9-3, 10-1–10-39          movers, 10-20, 10-22, 10-23–10-24, 10-25, 11-22,
  quarterly estimates, 8-14–8-16                                11-23, 11-24, 12-23, 12-24–12-25, 12-26,
  questionnaire correspondence to variables on,                 12-27
     10-4–10-6                                               newborns, 10-25, 12-26
  reference period, 9-2, 10-7, 11-8, 13-4, 13-7              split households, 9-3, 11-22, 12-28
  reformatting, 13-3–13-4, 13-5–13-6                         topical module files, 9-3, 11-7, 11-10, 11-11,
  sort order, 13-3, 13-4, 13-6                                  11-14, 11-15, 11-16, 11-17, 11-18, 11-22,
  state variable, 9-15, 10-38                                   11-26
  structure, 5-4, 5-5, 8-8, 9-1–9-2, 9-11, 10-6, 10-7,       transfer program unit composition, 9-8
     11-7, 12-6, 13-6–13-7                                   variable names, 9-3, 10-10, 11-11, 12-15
  technical documentation, 10-2–10-4                       Current Population Reports, 1-13
  topcoding, 9-15, 10-6, 10-29, 10-32–10-36, 11-28         Current Population Survey (CPS), 1-1, 1-9,
  topical module files compared, 9-11–9-15, 11-7,            1-10, 6-4, C-3–C-4, C-8, C-9, C-16, C-20, C-24,
     11-8, 11-11–11-12, 13-13                                C-25, E-2
  uses, 5-4
  variable names, 9-1, 9-13, 10-1, 10-4, 10-8, 10-11,
     11-11–11-12, A-1–A-34                                 Data Access and Dissemination System
  variance estimation variables, 7-3                         (DADS), 5-12
  weighting procedures, 5-4, 8-8–8-16, 10-37               Data collection procedures, 5-16, 6-2
  weights, 5-4, 8-3, 8-4–8-5, 8-7, 8-8–8-13, 9-8,          Data dictionary
     9-15, 10-1, 10-2, 13-8, 13-22, C-1–C-25                   accuracy of definitions, 11-6, 12-3
  wide-record format, 13-7                                     contents, 4-13, 5-14, 10-2, 11-2, 12-2–12-3
Coverage                                                       core wave files, 9-11, 10-2–10-4, 10-5, 10-35,
  core items, 1-4                                                 12-3, 13-18, 13-19
  CPS, 1-9                                                     corrections to, 5-14
  housing units, 2-6                                           defined, E-2
  improvement frame, 2-6                                       differences by file types, 9-11, 12-3
  ratio, 1-6, 6-1                                              excerpts from, 10-3–10-4, 11-3–11-4, 12-4, 13-18,
  transfer program unit, 4-16, 9-14, 10-26–10-28,                 13-19
      10-29, 10-30–10-31, 12-28, 12-30–12-31                   exiting sample member variables, 13-18–13-19
Cross-sectional analyses                                       format, 10-2–10-4, 11-3–11-4, 12-3–12-5
  core wave files, 5-4                                         full panel files, 9-11, 12-2–12-5, 12-31, 13-19
  defined, E-2                                                 machine-readable version, 10-2, 11-2, 12-3
  editing and imputation, 4-1, 4-8, 4-9                        questionnaire correspondence to, 10-4–10-6, 11-6,
  full panel files, 12-7                                          12-5–12-6
  quarterly estimates, 8-16                                    SAS and FORTRAN syntax, 10-4, 10-5, 11-4,
  sample size and, 2-2                                            11-5, 12-3, 12-5
  seam effect and, 6-3                                         topcodes, 10-35, 12-31
  weights, 8-3, 8-4, 8-16, C-12–C-13                           topical module files, 9-11, 11-2–11-5, 11-6, 12-3
Cross-walks                                                    universe definitions, 10-3, 10-6, 11-4, 11-6, 12-3
  reference periods, 10-2, 11-2, 12-2                          variable metadata, 5-15
  variable names for core wave files, A-1–A-34                 variable name–content correspondence, 10-6
Current Address IDs                                        Data editing
  components, 9-3–9-4, 10-20, 11-22                            advantages over imputation, 4-3
  core wave files, 9-3, 10-7, 10-10, 10-13–10-14,              allocation flags, 4-13, 10-37
     10-20, 10-22, 10-23–10-24, 11-11, 11-23                   CAI, 1-3, 1-5, 2-17, 4-6, 4-15
  family identification, 10-11, 10-13–10-14, 10-21,            confidentiality-related, 4-17
     11-17, 11-18, 12-18, 12-20                                core wave files, 4-4, 4-15, 8-16, 10-37, 12-37,
  family-level income, 12-23                                      13-1, 13-6–13-7
  by file type, 9-3                                            cross-sectional, 4-1
  full panel files, 12-15, 12-16, 12-18, 12-20, 12-23,         defined, E-2
     12-24–12-25, 12-26, 12-27                                 effect on analyses, 4-15, 8-16, 13-1, 13-6–13-7,
  household composition, 9-6, 10-10, 10-23–10-24,                 13-8, 13-12
     11-14, 11-16, 11-25–11-26, 12-15, 12-16,                  full panel files, 1-5, 4-3, 4-5, 4-14, 4-15–4-16,
     12-27                                                        12-7, 12-37, 13-1, 13-8



  geographic information, 4-17–4-18                           Education and training
  for internal consistency, 4-4, 10-37                          financial assistance, 3-4, 3-5, 3-14, 5-2
  item nonresponse from, 2-21                                   history, 3-4, 3-9, 3-14, 11-12, 11-28
  longitudinal, 1-5, 4-1, 4-4, 4-5, 4-14, 4-15–4-16             household characteristics, 8-6
  paper questionnaires, 2-17, 4-6                               information resources, 5-2, 5-16
  procedures, 4-1, 4-4, 4-8, 4-15–4-16                          noninterview adjustments, C-18
  topcoding, 1-5, 4-17                                          topical modules, 3-7, 3-9, 3-10, 3-14, 11-12
  topical modules, 4-4, 13-12                                 Eligibility, program, 3-8, 3-15, 10-38, 11-29,
  uses, 2-21, 4-1, 4-3                                          12-38
Data entry, 4-2, 4-6                                          E-M algorithm, 13-21
Data Extraction System (DES), 5-12                            Emigration, 8-5
Data processing. See also Data editing;                       Employers
  Imputation                                                      characteristics, 3-3, 10-36, 10-37
  overview, 4-3–4-5                                               health benefits provided by, 3-4, 3-8, 3-9–3-10
  phase 1, 4-3, 4-4–4-5, 4-6–4-14                                 maternity leave policies, 3-10
  phase 2, 4-3, 4-5, 4-15–4-16                                    variables, 10-5
Deaths, 8-4, 8-5, 8-7, 9-5, 9-8, 11-11, 12-13, 13-16,         Employment. See also Labor force status;
  13-17, 13-19                                                  Unemployment; Work
Department of Health, Education, and                            confidentiality procedures, 4-17
  Welfare, 1-1                                                  core questions, 3-3, 3-4
Dependent care, 3-8                                             gender differences, 5-2
Design of SIPP. See also Redesign (1996) of                     history, 3-10
  SIPP; Sample design                                           home-based, 3-6, 3-16
                                                                income, 10-32–10-36
   comparison with other surveys, 1-9–1-11                      information resources, 5-2, 5-16
   evolution, 1-1–1-2                                           job offers for unemployed respondents, 3-12
   features, 1-2–1-3                                            number in second business, 10-6
   information resources, 5-16                                  pregnancy and, 3-10
   organizing principles, 2-1–2-5                               starting dates, 4-17
   topics, 1-4–1-5, 2-1                                         topical modules, 3-7, 3-10, 3-12, 3-15–3-16
Disability                                                      variables, 10-5
  children, 3-11, 10-28, 10-29, 10-30–10-31, 12-30            Energy assistance, 3-4, 3-6
  functional limitations, 3-10–3-11, 5-2
  history, 3-15
                                                              Energy usage, 3-12
  income, 3-3, 3-5, 12-30                                     Entry Address IDs
  long-term care needs, 3-12                                      changes in, 10-26, 11-13, 11-27, 12-14
  medical expenses, 3-12                                          components, 9-4, 10-8, 11-14, 12-14
  P-70 publications, 5-2, 5-3                                     core wave files, 9-3, 10-7, 10-8, 10-9, 10-20,
  topical modules, 3-7, 3-10, 3-11                                   10-22, 10-23–10-24, 11-23, 13-3, 13-7
  work-related, 3-11, 3-12, 3-15                                  family-level income, 12-23
Divorces, 6-4                                                     full panel files, 9-3, 12-7, 12-8, 12-11–12-12,
                                                                     12-13, 12-14, 12-15, 12-16, 12-21, 12-23–
                                                                     12-27
Earnings. See also Income, earned; Wages                          household identification, 12-16
  and salaries                                                    movers, 10-8, 10-20, 10-22, 10-23–10-24, 11-14,
   annual, 3-8                                                       11-22, 11-23, 11-24, 11-25–11-26, 12-23–
   core questions, 3-3, 3-4, 3-5                                     12-27
   information resources, 5-16                                    newborns, 10-25, 12-26
   misinterpretation of questions about, 6-3                      purpose, 9-3, 9-4, 11-14
   self-employed, 10-32                                           redesign of 1996 and, 9-4, 10-7, 10-8, 10-9, 11-13,
   topcoding, 10-32–10-35, 12-37, B-1–B-4, B-7                       12-13, 13-3
   topical modules, 3-8                                           sorting files for linking, 13-3, 13-4, 13-9, 13-14,
Edits. See Data editing                                              13-15
                                                                  spouses, parents, and guardians, 12-21, 12-22



  topical module files, 9-3, 11-7, 11-10, 11-12,                 income, 9-12, 10-19–10-20, 10-21, 10-35, 10-36,
     11-13, 11-14, 11-15, 11-22, 11-24, 11-25–                      12-23, 12-37, C-18
     11-26, 11-27                                                merging files to obtain, 9-6, 11-13, 11-17, 12-17,
  values, 10-8                                                      12-20
  variable names, 9-3, 11-12                                     support networks, 5-2
  by wave, 10-9                                                  topical modules, 3-7, 3-11, 9-12
EPDJBTHN variable, 4-14                                          transfer program income recipient, 10-7, 10-27,
EPPFLAG imputation, 4-10, 4-13, 4-14, 10-36–                        10-28
  10-37                                                      Family composition
EPPINTVW field, 4-13–4-14, 10-36                               background information, 3-10
                                                               core wave files, 9-13, 9-15, 10-15–10-20
Errors. See also Nonsampling errors;                           determining, 9-6–9-7
  Sampling errors; Standard errors                             excluding related subfamily members, 10-12,
  imputation-related, 12-7, 13-7, 13-8, 13-12, 13-14              10-13–10-14, 10-15, 11-12, 11-17, 11-18,
  information sources on, 1-13                                    12-19, 12-20
  keying/recording, 4-2                                        full panel files, 9-13, 9-15, 12-19–12-22
  measurement, 6-2–6-3, 13-12                                  households, 8-12, 8-13
  in microdata files, 5-14                                     ID variables, 9-6–9-7, 9-12, 9-13, 10-11, 10-12,
  respondent recall, 2-3, 6-2                                     10-19, 11-17, 11-18, 12-18, 12-20
Evaluation studies, 6-4                                        including related subfamily, 10-19–10-20, 10-21,
Event-history analysis, 8-18, 13-20                               10-13–10-14, 11-18, 12-19, 12-20, 12-23
Expenditure data                                               interrelationships, 10-15, 10-16, 12-21, C-3–C-4,
  comparison of surveys, 1-10                                     C-6–C-8
  medical, 3-12                                                monthly, 9-6–9-7, 9-8, 12-17–12-18, 12-20
  work-related, 3-15                                           multigenerational household, 9-7, 10-12, 10-18,
                                                                  10-19, 11-21, 11-22, 12-19, 12-22
                                                               one-person, 9-6, 11-17
Family(ies). See also Subfamily                                restrictions on analyses, 12-15, 12-16
  defined, 3-11, 8-11, 9-6, 10-11, 10-12, 11-16,               topical module files, 9-6, 9-12, 9-13, 9-15, 11-16–
     11-17, 12-16, 12-17, 12-18, E-3                              11-18, 11-19–11-21, 11-22
  disruption, 5-2                                              variables, 9-13, 9-15, 10-15–10-20, 11-16–11-18,
  grouping of, 10-12                                              11-19–11-21, 11-22, 12-19–12-22
  grouping people into, 12-19                                Fathers, 10-15
  head of, 10-15
  identification, 3-11, 9-6, 9-7, 9-12, 10-11–10-14,         Fay’s method for variance estimation, 7-3
     10-21, 11-12, 11-16–11-18, 12-16–12-19,                 Federal Reserve Board, 6-4
     12-20, 12-23                                            FERRET, 1-6, 5-12, 5-13, 7-3, E-3
  information resources, 5-2, 5-16                           Fertility history, 3-10, 5-16
  methods for distinguishing, 10-12–10-14, 11-17–            Financial data, topical modules, 3-7
     11-18, 12-17–12-18
  number in household, 10-15
                                                             Following rules. See also Moves/movers
  primary, 3-11, 8-11, 8-12, 9-6, 9-12, 10-11, 10-12,            additional household members, 1-4, 2-1, 2-9
     10-19, 10-20, 10-21, 11-16, 11-17–11-18,                    age and, 2-9, 2-12, 10-25, 11-24, 12-26, 13-15
     12-16, 12-19, 12-20, 12-23, E-8                             children, 1-4, 2-9
  reference person, 3-11, 8-11–8-12, 9-6, 10-11,                 defined, E-3
     10-12, 10-15, 10-16                                         example, 2-10–2-14
  secondary, 9-6, 10-11, 11-16, 12-17, 12-19, E-9                excluded individuals, 2-9
  types, 8-11, 9-12, 10-11, 10-13–10-14, 10-15,                  original sample members, 1-4, 2-7, 2-9–2-15,
     11-16–11-17, 12-16–12-17, 12-20, 12-21, C-3                    10-25, 11-24
  weights, 8-4–8-5, 8-6, 8-8, 8-11–8-12, 8-13, 9-15,             temporarily absent members, 2-15–2-16
     C-3                                                     Food stamps
Family characteristics                                           history, 3-15
  assigning to individuals, 13-2                                 ID variables, 9-14, 10-27, 10-28, 12-29, 12-30,
  constructing, 9-8, 12-17, 12-18                                   12-31
  core wave files, 9-12                                          income, 3-3, 4-16, 10-32, 12-30, 12-34–12-36
                                                                 members of a common unit, 10-28



  program units, coverage, and recipiency, 9-7,                 12-11–12-12, 12-13, 12-15, 12-16, 12-18,
     10-29, 10-30–10-31, 12-28, 12-29, 12-30,                   12-20, 12-23, 12-29
     12-31                                                   mover identification, 12-23–12-27, 13-23
  quarterly estimates, 8-15–8-16                             1996, 4-16, 9-3, 9-11–9-15, 13-8, 13-14
  spell estimation, 8-18                                     overview, 1-8
  user-created monthly variables, 12-30, 12-34–              person identification, 8-17, 9-11, 9-15, 12-13–
     12-36                                                      12-15, 13-23
  weights, 8-2                                               person records, 8-17, 9-2, 9-11, 9-15, 13-2
FORTRAN approach for file format change,                     pre-1996, 4-15–4-16, 7-3, 9-3, 9-11–9-15, 12-1–
  13-3                                                          12-38
FORTRAN syntax, 10-4, 10-5, 11-4, 11-5, 12-5                 program unit identification, 9-14, 12-28–12-30
                                                             public use version, 4-5, 5-12, 9-2, 9-3, 12-1–12-38
Foster children, 9-14, 10-16, 10-17, 10-27, 11-20,           quarterly estimates, 8-16
  12-29                                                      questionnaire correspondence with, 12-5–12-6
Frames, non-overlapping, 2-6                                 release of, 9-9
Full panel files                                             single files, 12-1
  allocation flags, 4-14, 4-15, 12-37                        spell estimations, 8-18–8-19
  attrition adjustments, 13-22                               state identification, 9-15, 12-38
  calendar month alignment of data, 8-19, 12-7,              structure, 5-12, 9-2, 9-11, 11-8, 12-6–12-7, 12-8,
      12-9, 12-10, 12-11–12-12                                  12-26, 12-27, 13-2
  calendar year estimates, 8-18, 9-8, 11-21                  technical documentation, 12-2–12-5, 12-9
  content, 1-8, 5-12, 12-6                                   topical module files compared, 9-11–9-15, 11-8
  core wave files compared, 9-11–9-15, 10-37, 12-6,          variable name changes, 9-3, 9-15
      12-10, 12-17, 12-30, 12-37, 13-1, 13-14                variance estimation variables, 7-3
  creation, 1-5, 4-3, 4-4, 4-5, 4-15, 5-12                   weights, 8-3, 8-7–8-8, 8-16–8-19, 9-8, 9-15, 12-1,
  data dictionary, 9-11, 12-2–12-5, 12-31, 13-19                12-2, 12-13, 12-37–12-38, 13-14, 13-22, C-1–
  data editing procedures, 1-5, 4-3, 4-5, 4-14, 4-15–           C-25
      4-16, 12-7, 12-37, 13-8, 13-14                       Functional limitations, 3-10–3-11
  defined, E-3
  family composition variables, 9-13, 9-15, 12-19–
      12-22
                                                           Gender
  family identification, 9-6, 9-7, 9-12, 12-16–12-19,        imputation, 10-37
      12-20                                                  and income topcoding, 10-32, 10-33, B-2, B-4
  format change, 5-12, 13-9–13-10                            variable name, 11-12
  household composition variables, 9-12, 9-13,               weighting adjustments, C-3–C-4, C-5, C-6–C-8
      9-15, 12-19, 12-21–12-22, 12-25, 12-26               General Assistance (GA), 9-7
  household identification, 9-11, 12-15–12-16                ID variables, 9-14, 10-27, 12-29
  ID variables, 9-3, 9-12, 9-14, 12-6, 12-23–12-28,          misinterpretation of questions on, 6-3
      13-9, 13-15, 13-23                                   General (G1) sources and amounts, 12-30,
  imputation, 1-8, 4-3, 4-5, 4-14, 8-17, 9-15, 10-37,        12-31, E-3
      12-7, 12-10, 12-17, 12-37, 13-8, 13-11, 13-14,       General income questions, 3-3
      13-22                                                Generalized variance functions (GVFs), 5-14,
  income topcoding, 5-1, 9-15, 12-31, 12-36–12-37            7-1
  income variables, 9-12, 12-23, 12-30–12-31,                accuracy of estimates from, 7-4
      12-32–12-36                                            derivation, 7-4
  linking with core wave files, 1-9, 12-28, 13-8–            standard error of a mean, 7-5–7-6
      13-11                                                  standard error of estimated number from, 7-4–7-5
  linking with topical module files, 1-9, 13-14–
      13-15
                                                           Geographic (GRIN) codes, E-3
  metropolitan area identification, 12-38                  Geographic information
  missing waves, 12-10, 13-22                                sort variables for imputation, 4-11
  merging with core wave files, 10-6, 12-1, 12-6,            state-level, 4-17–4-18, 10-38, 11-29
      12-17, 12-20, 12-28, 12-30, 13-1, 13-3, 13-4           suppression, 4-17, 5-1, 10-8, 10-38–10-39, 11-13
  monthly interview status variable, 1-8, 9-4, 9-5,        Group quarters, 8-6, 8-12, 9-6, 10-10, 11-14,
      9-11, 11-11, 12-6, 12-7, 12-8, 12-9–12-10,             11-15, 11-18, 12-15, 12-19, 12-20, C-19, E-3



Group quarters frame, 2-6                                       imputation, 10-37, C-16
Guardians, 10-15, 10-19, 11-12, 11-19, 11-21,                   interview status of members, 9-6, 11-9, 12-15,
  11-22, 12-21, 12-22                                              12-16
                                                                longitudinal analysis, 13-2
                                                                merging files to obtain, 9-6, 12-28, 13-22–13-23
Head of household, 2-8, 8-2                                     program unit identification, 9-7, 10-28
Health care                                                     reference person, 8-10–8-11, 8-12, 10-11, 10-12,
  costs/expenditures, 3-9, 3-12                                    10-15, 10-16–10-19, 11-6, 11-12, 11-16, 11-17,
  long-term, 3-10, 3-12                                            11-19–11-21, 12-17, 12-21, C-15
  utilization, 3-11, 3-12                                       size considerations, 8-5, 8-6, 9-5, 12-13, C-15
Health insurance coverage. See also                             tenure, 8-5, 8-6, C-2, C-16
  Medicaid; Medicare                                            topical modules, 3-7, 3-12
  child support arrangements, 3-15                              weighting adjustments, 12-13, C-2–C-3, C-15
  characteristics of, 10-26                                 Household composition. See also Additional
  data edits, 4-16                                            household members; Family
  errors in estimates, 6-4                                    calendar year weight and, 9-5
  ID variables, 9-14, 10-27, 10-29                            changes in, 2-10–2-14, 8-5, 8-10, 10-11, 10-20,
  information resources, 5-2, 5-3, 5-16                          10-23–10-24, 11-14, 11-22, 11-24–11-27,
  time-specific data, 2-4                                        12-16
  topical modules, 3-4, 3-8, 3-9–3-10, 3-11, 3-12,            core questions, 3-11
     3-13                                                     core wave files, 9-11, 9-12, 9-13, 9-15, 10-8,
  variables, 12-29                                               10-15–10-20, 10-23–10-24
Health status                                                 determining, 9-6
  children, 3-11                                              full panel files, 9-12, 9-13, 9-15, 12-15, 12-19,
  disability, 3-11, 3-15                                         12-21–12-22, 12-25, 12-26
  topical modules, 3-7, 3-9, 3-11                             ID variables, 9-6, 10-23–10-24, 12-15, 12-16,
Home-based employment, 3-6                                       12-25
                                                              identifying members, 2-6–2-7, 9-3, 9-6, 10-19,
Home health care, 3-11                                           11-12
Hospitalized persons, 2-16                                    interrelationships, 3-11–3-12, 9-6, 10-15, 10-16
Hot-deck matrix, 4-9–4-10, 4-11, 4-12, E-4                    and linking topical module files, 13-11–13-12
Hotel rooms, 2-6                                              longitudinal edits, 4-16
Household(s). See also Family                                 monthly, 9-6, 9-8
  defined, 2-6, 8-10, 9-6, 10-9, 12-15, E-4                   multigenerational family, 9-7, 10-12, 10-18,
  enhanced, C-14                                                 10-19, 11-21, 11-22, 12-22
  grouping of related primary families, 10-12                 number of families, 10-15
  identification, 9-6, 9-11, 10-9–10-11, 11-11,               reference period for, 11-14
     11-14, 11-15, 12-15–12-16                                relationship to reference person, 11-12, 12-21
  merged, 9-11, 9-12, 10-25, 10-26, 11-27, 12-28,             restrictions on analyses, 12-15
     13-16, 13-22–13-23, C-14, C-15, E-6                      rostering, 2-7, 2-16, 3-2
  number, by panel, 1-2, 2-2, 2-8, 8-20, 12-7                 temporarily absent members, 2-15–2-16
  recombined, 10-26, 11-27, 12-28, 13-22–13-23                topical modules, 3-11, 9-6, 10-15
  split, 2-11, 2-12, 2-14, 9-3, 10-12, 10-13–10-14,           variables, 4-16, 8-10, 9-11, 9-12, 9-13, 9-15, 10-8,
     10-20, 10-26, 11-18, 11-22, 11-24, 11-27,                   10-10, 10-15–10-20, 10-23–10-24, 11-19–
     12-23, 12-24–12-25, 12-28, 13-22                            11-21, 11-22, 12-15–12-16, 12-19, 12-21–
  types, 8-12, 10-15, C-3, C-6–C-8                               12-22
  weights, 8-2, 8-4–8-5, 8-6, 8-8, 8-10–8-12, 8-13,           weighting adjustments, 8-10–8-11, 8-18, 9-5,
     9-5, 9-8, 9-15                                              12-13, C-6
Household characteristics                                   Household Economic Studies, 1-13–1-14
  assigning to individuals, 13-2                            Household noninterview. See Household
  caregiver members, 3-11, 3-12                               nonresponse
  constructing, 9-8                                         Household nonresponse
  economic, 3-8, 5-2, 5-3, 7-5, 8-6, 9-5, 10-36,                adjustment factors, 8-5, C-2–C-3
     10-37, 11-28, 12-13, 12-37, 13-12, B-7, C-15,              defined, E-4
     C-16                                                       errors, 6-1–6-2



  interview attempts at subsequent waves, 2-18                   cross-sectional, 4-4, 4-8–4-9
  rate calculations, 2-20                                        defined, E-5
  refusals, 11-8, C-15                                           dependent, 4-13
  sources of, 2-18, C-15                                         disadvantages, 4-3
  topical module files, 11-8                                     effect on analyses, 4-3, 4-11, 4-16, 7-6, 8-17,
  Type A, 2-18–2-20, C-2–C-3, E-13                                   13-6–13-7, 13-8, 13-12
  Type B, 2-18, E-13                                             EPPFLAG, 4-10, 4-13, 4-14, 10-36–10-37
  Type C, 2-18, E-13                                             error, 12-7, 13-7, 13-12, 13-14
  Type D, 2-18, 2-19, 2-20, E-12                                 exiting sample members, 13-17, 13-19–13-20
  by wave and panel, 2-19                                        flags, 4-11, 4-13–4-14, 4-15, 10-36–10-37, 11-28,
  weights, 2-20, 8-5, 8-6                                            12-37, 13-8, 13-12, 13-22, E-5
Housemates/roommates, 10-17, 11-20                               full panel files, 1-8, 4-3, 4-5, 4-14, 8-17, 9-15,
Housing                                                              10-37, 12-7, 12-10, 12-17, 12-37, 13-1, 13-8,
  conditions, 3-12                                                   13-11
  costs, 3-7, 3-8, 3-12, 3-14                                    goals of, 4-2–4-3, 4-11
  subsidized, 3-6                                                income, 4-4, 4-7, 4-9, 4-10, 4-15, 4-16, 10-37,
  units, 1-9, 2-6, 2-8–2-9, 2-16, 2-18, 9-3, 10-8,                   11-28, 12-37
     10-9–10-10, 11-13, 12-15, E-4                               item nonresponse, 2-21, 4-1, 4-2, 4-4, 4-7, 4-12,
                                                                     4-14, 6-1, 6-2, 7-6
                                                                 little Type Z, 4-10, 4-13, 10-37
ID variables. See also specific variables                        logical. See Data editing
  additional household members, 9-3, 10-8, 10-25                 longitudinal, 4-8, 4-16
  core wave files, 9-3, 9-12, 10-6–10-14, 10-20–                 and linking files, 4-5, 13-7, 13-8, 13-22
     10-28, 10-29–10-30, 11-11–11-12, 11-13,                     missing data, 1-8, 2-20, 2-21, 4-4, 7-6, 9-15,
     11-27, 13-9, 13-14                                              11-24, 13-20
  description, 9-2–9-4                                           missing wave, 4-5, 4-16, 8-7, 8-17, 9-5, 9-15,
  family, 9-12, 10-11–10-14, 11-17, 11-18, 12-18                     10-36, 12-7, 12-10, 12-17, 13-11, 13-16, 13-22
  family composition from, 9-6–9-7, 9-13, 10-11,                 nonmatches and, 13-17, 13-22
     10-12, 10-19, 11-17, 11-18                                  nonresponse adjustments, 2-20, 4-5, 8-17, 10-36,
  full panel files, 9-3, 9-12, 9-14, 12-6, 12-23–                    C-18
     12-28, 13-9, 13-15                                          person nonresponse adjustments, 1-8, 2-20, 4-1–
  household composition from, 9-6, 10-23–10-24                       4-2, 4-6–4-7, 7-6, 10-36, 11-11, 12-7, 12-13
  monthly characteristics from, 9-8                              personal demographic characteristics, 4-4, 4-6,
  mover identification, 9-3, 9-12, 10-8, 10-20,                      4-12, 4-16, 8-6, 11-11
     10-22–10-26, 11-13, 11-14, 11-21–11-27,                     program participation, 4-7, 10-28
     12-14, 12-23–12-28                                          redesign of 1996, 4-1, 4-5, 4-6, 4-7, 4-13, 4-15,
  names by file type, 9-2, 9-3                                       8-17, 12-37, 13-1
  person, 8-17, 9-4–9-8, 9-11, 10-7–10-9, 11-11,                 sample unit characteristics, 4-4, 4-6, 8-6
     11-13–11-15, 12-13–12-15, 13-23                             statistical, 4-1, 4-4, 4-8, 4-13
  purpose, 9-2–9-4                                               steps, 4-4
  topical module files, 9-3, 9-6, 11-7, 11-11–11-27,             topical modules, 4-2, 4-5, 4-14, 9-15, 11-11, 13-12
     13-11, 13-14, 13-15                                         Type Z, 1-8, 2-20, 4-2, 4-6–4-7, 4-13, 4-14, 7-6,
  transfer program unit composition from, 9-7                        8-5, 9-5, 12-7, 12-10, 12-13, 12-17, 13-8,
Immigration, 3-12–3-13, 8-5, C-8                                     13-12, E-13
Imputation. See also Sequential hot-deck                         variance estimation, 4-3, 4-11, 4-12, 4-16, 7-6
  imputation procedure                                           weighting adjustments, 8-4, 8-5
  additional household members’ records, 4-6–4-7,                whole record procedure, 13-11
     10-36                                                       within-wave, 13-11
  age, race, and gender, 10-37                               Income. See also Program income
  carryover procedures, 4-5, 4-10, 4-13, 4-16, 10-37,            amounts, 1-8, 3-6, 12-30
     E-9                                                         annual, 3-8, 8-18, 11-21
  core wave files, 4-2, 4-4, 4-6–4-7, 4-13, 8-16,                asset, 3-13, 4-7, 10-29, 12-37
     9-15, 10-6, 10-25, 10-36–10-37, 11-9, 12-10,                children’s, 3-6
     12-37, 13-1, 13-6–13-7                                      core questions, 1-8, 3-3–3-4, 3-6
  cross-observation, 12-37                                       core wave file structure, 13-7



  core wave file variables, 9-12, 10-19–10-20,                Inter-university Consortium for Political and
     10-21, 10-27, 10-37                                         Social Research (ICPSR), 1-5–1-6, 5-12
  CPS data, 1-1, 1-9, 1-10
  earned, 10-32–10-35, 12-37, B-1–B-4, B-7
                                                              Interview. See also Computer-assisted
  errors in estimates, 6-4                                       interviewing; Monthly interview status
  exiting sample members, 13-19, 13-20                           variable; Telephone interviews/
  family, 9-12, 10-19–10-20, 10-21, 10-35, 10-36,                interviewing
     12-23, 12-36, 12-37, C-18                                    additional household members, 2-16, 2-17
  full panel file variables, 9-12, 12-23, 12-30–12-31,            consistency checks, 2-17, 3-1
     12-32–12-37                                                  core questions, 3-1, 3-2–3-6, 6-2
  household, 7-5, 9-5, 10-35, 10-36, 10-37, 11-28,                dates, by panel, 2-2
     12-13, 12-36, 12-37, C-15                                    face-to-face, 2-17, 6-2
  imputation, 4-4, 4-7, 4-9, 4-10, 4-15, 4-16, 10-37,             household status code, 11-12
     11-28, 12-37, 13-19                                          identifying household members, 2-6–2-7, 2-16
  information resources, 5-2, 5-3, 5-16                           intervals, 1-4, 2-1, 2-9, 8-8
  monthly, 12-31, 12-36                                           mode, by wave, 6-2
  nonresponse, 6-2                                                month, E-5
  property, 3-12, 6-4                                             probes, 3-3
  PSID data, 1-10–1-11                                            procedures, 1-4, 2-16–2-17, 2-21, 3-1–3-2, 6-2,
  subfamily, 12-23                                                   8-19
  subpopulation variables, 11-28                                  skip patterns, 2-17, 3-2, 10-2, 10-6, 11-2, 11-6,
  summary variables, 10-29, 10-35–10-36, 12-36                       12-2, 12-3, 12-6, E-11
  taxes, 3-8, 3-14                                                telephone. See Telephone interviews/interviewing
  topcoding, 4-17, 9-15, 10-29, 10-32–10-36, 11-28,               topical questions, 3-1, 3-6–3-16
     12-31, 12-36–12-37, B-1–B-4, B-6–B-7                     Interview month weights
  topical modules, 3-8, 3-12                                      calendar month estimation, 8-14, 8-15
  types recorded in SIPP, 3-3–3-4, 3-5, 11-21                     core wave file, 8-8–8-11, 8-14, 8-15
  unearned, 3-3–3-4, 3-5, 3-6, 10-29, 10-32, 11-28,               construction, 8-4–8-5, 8-6
     12-30, 12-32–12-36, 12-37, B-6–B-7                           format, 8-8–8-9
  unreported, 13-19                                               household-level analyses, 8-10–8-11
  variables, 9-12, 12-23, 12-30–12-31, 12-32–12-36                person-level analyses, 8-9–8-10, 8-16, 11-28
  weighting adjustments, 13-19                                    population represented by, 8-9, 8-10, 8-14
Income Survey Development Program                                 topical module file, 8-16, 9-8, 11-28
   (ISDP), 1-1–1-2, 1-13                                          by type of file, 8-3
Infants, 8-17, 9-5, 9-8, 10-25, 11-24, 12-26, 13-16,              uses, 8-8–8-11
  13-17                                                       Interviewer
Information resources. See also Microdata                       discretion in identifying reference person, 10-18,
   files; Technical documentation; Web sites                       11-20
  bibliography (online), 1-13, 5-15                             errors, 4-2
  directory of data and publications, 5-15                      experience, 8-19
  P-70 series, 1-13–1-14, 5-1, 5-2–5-3                        INTVW field, 4-13–4-14
  Quality Profile, 1-13, 5-1, 5-13                            Item nonresponse
  telephone numbers, 5-16                                        data editing, 4-1
  User Notes, 5-12, 5-14, 10-2, 11-2, 12-2                       defined, E-5
  variable metadata, 5-15                                        errors, 6-1, 6-2
  working papers, 1-14, 5-13, 5-14, 5-15                         imputation, 2-21, 4-1, 4-2, 4-4, 4-7, 4-12, 4-14,
Institutionalized individuals, 2-6, 2-9, 2-15,                      6-2, 7-6
  2-16, 8-7, 8-18, 11-11, 13-16, 13-17, 13-20                    rates, 6-2
Instrumental Activities of Daily Living                          sources, 2-20–2-21, 4-2
   (IADL) battery, 3-10                                       Iterative proportional fitting, C-5
Interest income, 10-29
Internal data files, 1-5, 5-1                                 Jackknife repeated replications, 7-2



Labor force status. See also Employment;                   Loss of sample. See also Attrition
  Unemployment; Work                                        reasons for, 13-16, 13-17, 13-18–13-19, C-15
  core questions, 3-3, 3-4                                  rates, 2-17–2-18, 2-19
  errors in estimates, 6-4                                 Marital history, 3-12, 8-18, 8-19
  imputation, 4-4, 4-7, 4-8–4-10, 4-14, 10-36–10-37        Marital status, 11-11, 11-12, 11-19
  information resources, 5-3, 5-16                         Marriages, 2-11, 5-16, 6-4, 11-24, 11-27, 12-26
  noninterview adjustments, C-18
  spell estimation, 8-18
                                                           Mean, defined, 7-5
  and topcoding, 10-32, 10-33, B-3, B-4                    Measurement errors, 6-2–6-3, 13-12
  weekly data, 2-3                                         Medicaid, 3-4, 9-7, 9-14, 10-27, 10-29, 10-30–
Liabilities                                                 10-31, 12-29, 12-30, 12-31
  errors in estimates, 6-4                                 Medical expenses, 3-12
  topical questions, 3-6, 3-8                              Medicare, 3-4, 9-7, 9-14, 10-27, 10-28, 12-29,
Linking files or data. See also Merging files               12-30, 12-31
  or data                                                  Merging files or data. See also Linking files
  across waves, 13-7, 13-12, 13-16                          or data
  bias in analyses from, 13-1–13-2                          aggregate records, 13-13
  conceptual issues, 1-9                                    attrition and, 13-16, 13-17, 13-20–13-21
  core data from all waves, 4-3                             calendar month estimates, 8-14–8-16, 8-19
  core wave file reformatting, 13-3–13-4, 13-5–13-6         core wave with full panel, 10-6, 12-1, 12-6, 12-17,
  core wave to full panel, 1-9, 12-28, 13-8–13-11               12-20, 12-28, 12-30, 13-1, 13-3, 13-4
  editing/imputation effects, 4-5, 13-7, 13-8               core wave with topical module, 1-8, 3-10, 9-6,
  format changes for, 13-3–13-4, 13-5–13-6                      9-9, 10-6, 11-1, 11-7, 11-8, 11-10, 11-11,
  households or families, 13-1–13-2, 13-11–13-12                11-13, 11-17, 11-19, 12-6, 12-13, 13-1, 13-3,
  husbands and wives, 10-6, 12-13                               13-4, 13-12, 13-13, 13-14, 13-15
  multiple core wave files, 4-5, 5-4, 13-4, 13-6–13-8       duplicated records, 13-23
  multiple topical module files, 13-1, 13-11–13-12          for family membership identification, 9-6, 11-13,
  overview, 1-9                                                 11-17, 12-17, 12-20
  parents and children, 10-6, 12-13                         format of output, 13-2, 13-3
  procedures, 13-2–13-15                                    households in pre-1996 panels, 9-6, 12-28, 13-22–
  reasons for, 5-4, 9-9, 12-13, 13-1, 13-4                      13-23
  topical module to core wave, 1-9, 13-12–13-14             imputation and, 1-8
  topical module to full panel, 1-9, 13-14–13-15            multiple core wave files, 10-1, 10-6, 12-13
  unit composition changes and, 13-1–13-2                   multiple topical module files, 11-13
  within waves, 13-7, 13-16                                 nonmatches in, 1-8, 13-12, 13-14, 13-15–13-23
Linking records across microdata files, 9-4,                people exiting or entering the population and,
  10-7, 11-13, 11-16, 12-13                                     13-17–13-20
Living conditions, topical modules, 3-7                     person identification and, 10-6–10-7, 12-13
Longitudinal analyses                                       procedures, 10-1, 11-1
  of core wave data, 13-6–13-7, 13-8                        program coverage, 12-30
  defined, E-6                                              quarterly estimates, 8-14–8-16
  editing, 4-1                                              reasons for, 8-14–8-16, 9-9, 13-1
  household or family characteristics, 13-2                 redesign of SIPP and, 13-22
  imputation effects, 7-6, 8-17, 13-6–13-7                  topical module with full panel, 9-6, 10-6, 11-1,
  quarterly estimates, 8-16                                     11-7, 11-13, 11-19, 12-1, 12-6, 13-12
  restrictions on, 9-5, 12-9–12-10, 12-15, 12-16,           types, 13-2–13-3
     13-2, 13-6–13-7                                        variables from different files, 11-11, 11-19, 13-4
  seam effect and, 6-3                                      weights, 5-4, 13-1, 13-12
  weights, 8-3, 8-4, 8-16, 12-7                             within core wave files, 1-9, 12-13, 13-3–13-4,
                                                                13-5–13-6, 13-7
Longitudinal research files. See Full panel                Methodology, information resources, 5-16, 6-3
  files                                                    Metropolitan area identification, 4-17–4-18,
Long record format, 13-2                                    9-15, 10-38–10-39, 12-38
Long-term care, 3-9, 3-12                                  Metropolitan Statistical Areas (MSAs), 10-39



Microdata files. See also Core wave files;                 Monthly
 Full panel files; Topical module files                        cross-sectional weights, 5-4
 confidentiality procedures, 1-5, 4-4, 4-5, 4-17–              employment income, 10-32–10-35
    4-18, 7-2, 10-6, 10-8, 11-13, 12-14                        family composition, 9-6–9-7, 9-8, 12-17–12-18,
 construction of variables, 9-8                                   12-20
 contents, 5-3–5-4, 5-6–5-11                                   household composition, 9-6, 9-8
 creation, 4-4, 4-5                                            program income variables, 12-30, 12-36, 12-37
 defined, E-6                                                  transfer program unit composition, 9-7, 9-8
 differences among types, 9-10, 9-11–9-15, 11-8,               variables, 9-3–9-4, 9-8
    11-11–11-12                                            Monthly interview status variable
 extracts from, 5-13                                           core wave files, 9-4, 9-5, 9-11, 11-9, 11-11, 11-12
 formats, 5-3–5-5, 5-11, 5-12                                  defined, E-6
 ID variables, 9-2–9-4                                         full panel files, 1-8, 9-4, 9-5, 9-11, 11-11, 12-6,
 monthly family composition, 9-6–9-7                              12-7, 12-8, 12-9–12-10, 12-11–12-12, 12-13,
 monthly household composition, 9-6                               12-15, 12-16, 12-18, 12-20, 12-23, 12-29
 monthly interview status variable, 9-4–9-5                    name, by file type, 9-4, 11-11, 12-15
 monthly transfer program unit composition, 9-7                noninterview code, 9-5
 multiple file usage, 9-9                                      number of occurrences, 12-6, 12-9
 person identification, 9-4–9-8                                person-level, 11-9–11-11, 11-12, 12-16
 sources for obtaining, 5-1, 5-3, 5-4, 5-12–5-13               program participation, 12-29
 technical documentation, 1-14, 5-12, 5-14                     purpose, 9-4, 9-11, 11-9, 12-9
 types, 1-8, 5-3, 9-1–9-2, 9-11                                realigned by calendar month, 12-11–12-12
 User Notes, 5-12, 5-14, 12-2                                  restrictions on use, 9-5, 12-9–12-10
 variable metadata, 5-15                                       topical module files, 9-4–9-5, 9-11, 11-9–11-11,
 website, 1-6                                                     11-12
 weight selection, 9-8                                         values, 9-5, 11-9, 11-10, 12-9–12-10
Migration history, 3-12–3-13, 5-16                         Mothers, 10-15
Military barracks                                          Moves/movers. See also Following rules
  original sample members in, 2-9, 2-10, 2-11, 2-15,           abroad, 2-9, 2-15, 10-25, 11-24, 12-26, 13-16,
     10-25, 11-24, 12-25–12-26, 13-16, 13-17                      13-17, 13-20
Missing data                                                   additional household members, 4-6–4-7, 8-6, 10-8,
  adjustments for, see Data editing; Sequential                   10-20, 11-24, 12-24–12-25
     hot-deck procedures                                       defined, E-6
  code for linking files, 13-3, 13-4                           distance considerations, 2-15, 2-20, C-15
  defined, E-6                                                 identification, 9-3, 9-12, 10-8, 10-20, 10-22–
  flagging, 11-9, 12-10                                           10-26, 11-13, 11-14, 11-21–11-27, 12-14,
  imputation, 1-8, 2-20, 2-21, 4-4, 7-6, 9-15, 11-24,             12-23–12-28
     13-20                                                     interview procedures, 1-4, 2-17
  model-based approaches, 13-22                                nonmatches in merged files, 13-16, 13-17, 13-20
  panel weights, 8-17, 13-22                                   nonresponse, 2-17, 2-20
  problems caused by, 4-2                                      patterns of, 5-3
  selection of replacement values, 4-8, 4-13, 4-15             person identification and, 9-11, 9-12, 10-6, 11-14,
  statistical packages, 13-21                                     12-14, 13-23
  substituting the mean for, 13-20–13-21                       temporarily absent members distinguished from,
  topical modules, 4-5, 5-4                                       2-15–2-16
  types of, 4-1–4-2                                            tracing, 2-9, 2-15, 2-16
  weighting adjustments, 13-21, 13-22                          weighting adjustments, 8-4, 8-5, 8-6, 13-20,
Missing waves                                                     C-13–C-15, C-16, C-19
  defined, E-6                                             MSA-Place Status, 8-5
  full panel files, 12-10, 13-22                           Multiple files
  imputation, 4-5, 4-16, 8-7, 8-17, 9-5, 9-15, 10-36,          reasons for working with, 9-9
     12-7, 12-10, 12-17, 13-11, 13-16, 13-17, 13-22        Multivariate statistics, 13-20–13-21
  weighting adjustments, 8-7, 13-22



National Center for Health Statistics                             marriage, 2-11
  (NCHS), 6-4                                                     merged households, 10-25
                                                                  in military barracks, 2-9, 2-10, 2-11, 2-15, 10-25
National Longitudinal Survey (NLS), E-7                           moves, 9-3, 10-22, C-13
National Research Council, Committee on                           noninterview rates, 6-2
  National Statistics, 1-2                                        number, by panel, 2-2
New-construction frame, 2-6                                       person numbers, 10-8, 10-9, 10-20, 11-14, 12-14
New construction noninterview adjustment                          reentering sample universe, 13-16, 13-17
  factor, C-1, C-12                                               separation/divorce, 2-14
                                                                  temporarily absent, 2-15–2-16
Noninterviews. See also Household                                 weights for, 8-6, 8-7
  noninterviews; Person nonresponse                           Oversampling
  adjustment factors, C-1, C-2–C-3, C-12, C-13,                   defined, 2-8, E-7
     C-18–C-19                                                    1990 panel, 2-8, 8-2
  departure, E-2                                                  1996 panel, 1-3, 2-8–2-9
  monthly interview status variable code, 9-5                     rate, 2-9
  person-level, 1-8, 4-6–4-7, 9-5, 11-11
  Type D, 2-15
  Type Z, 4-1–4-2, 4-14, 11-9, 12-13, 13-8, 13-11,            P-70 series reports, 1-13–1-14, 5-1, 5-2–5-3, E-7
     13-12                                                    Panel files. See Full-panel files;
Nonresponse. See also Household                                 Partial-panel files
  nonresponse; Item nonresponse; Person                       Panel Study of Income Dynamics (PSID),
  nonresponse                                                     1-10–1-11, E-8
  bias, 2-17, 4-2, 6-1                                        Panel weights, 8-16–8-17, 8-18–8-19
  movers, 2-17, 2-20                                          Panels
  imputation adjustments, 2-20, 4-5, 8-17, 10-36                attrition by, 2-19
  nonsampling error, 6-1–6-2                                    composition, 2-8–2-9
  and quality of data, 2-18                                     core content differences, 3-3–3-6
  rates, 2-17–2-18, 2-20, 4-3, 6-2                              date of interview by, 2-2
  refusals, 2-17, 2-18, 2-20, 4-2, 4-7, 10-36, 12-13            defined, 2-1, E-7
  subpopulations, 6-4                                           followup to 1992 and 1993, 1-11, 2-2
  unit, 4-1, 4-3, 4-4                                           household number by, 1-2, 2-2, 2-8, 8-20, 12-7
  wave, 4-5, 7-6                                                length of, 2-1–2-2, 8-16, 8-19
  weighting adjustments, 2-17, 2-18, 4-1, 6-2, 6-4,             nonresponse by, 2-19, E-8
     8-4, 8-5, 8-6, 8-8, C-3                                    number of waves by, 2-2, 12-6, 12-7
Nonsampling errors                                              organizing principles, 2-1–2-3
  effects on survey estimates, 6-3–6-4, 8-19                    original sample members in Wave 1 by, 2-2
  information resources, 5-13, 5-16                             overlapping, 1-3, 2-1, 8-19, 8-20, 9-9
  measurement errors, 6-2–6-3                                   oversampling, 1-3, 2-8–2-9
  nonresponse, 6-1–6-2                                          pooling data from, 8-19–8-21
  and pooling data, 8-19                                        structure, 1-2, 1-3, 2-1, 12-6, 12-7
  recall period and, 8-18                                       topical modules by, 3-7, 3-8–3-15, 5-4, 5-6–5-11,
  sources, 1-6–1-7, 6-1                                             11-6
  undercoverage of subpopulations, 1-6, 6-1                     variance units and strata by, 7-2–7-3
Nursing homes, 2-16, 3-14, 8-18, 13-20                          weights, 8-16–8-17, 8-18–8-19, C-17–C-25
                                                              Parents, 10-7, 10-15, 10-17, 10-18, 10-19, 11-12,
                                                                11-13, 11-16, 11-19, 11-20, 11-21, 11-22, 12-13,
Old-Age, Survivors, and Disability                              12-21, 12-22
  Insurance (OASDI), 7-4                                      Partial panel files, 5-12, 9-3, E-8
Original sample members                                       Person. See also Reference person
  age, 2-7                                                        associated sample, C-13, C-14
  births to, 2-14                                                 monthly interview status variable, 11-9–11-11,
  defined, E-7                                                       11-12, 12-16
  following rules, 1-4, 2-7, 2-9–2-15, 10-25, 11-24,              noninterview records, 1-8, 4-6–4-7, 9-5, 11-11
     13-15                                                        out of scope, 12-13



Person identification. See also Person                          reference person, 10-16
  Number                                                        sorting files for linking, 13-3, 13-4, 13-9, 13-14,
  core wave files, 9-11, 9-15, 10-6–10-9, 11-11,                   13-15
     13-9, 13-23                                                spouses, parents, and guardians, 12-21, 12-22
  examples, 11-14, 11-15                                        topical module files, 11-7, 11-10, 11-11, 11-12,
  full panel file, 8-17, 9-11, 9-15, 12-13–12-15,                  11-13, 11-14, 11-15, 11-16, 11-18, 11-19,
     13-23                                                         11-21, 11-22, 11-24, 11-25–11-26, 11-27
  and merging files or data, 10-6–10-7, 12-13, 13-23            transfer program recipient, 10-28
  moves and, 9-11, 9-12, 10-6, 11-14, 12-14, 13-23              variable names, 9-3
  reasons for, 10-6–10-7, 12-13                                 by wave, 10-8–10-9, 12-14
  topical module files, 9-11, 9-15, 11-11, 11-13–           Person-record
     11-15, 13-23                                               duplicates, 13-23
  variables, 8-17, 9-4–9-8, 9-11, 10-7–10-9, 11-11,             format, 9-4, 9-5, 9-7, 9-11, 10-6, 10-7, 13-2, 13-3–
     11-13–11-15, 12-13–12-15, 13-23                               13-4, 13-5–13-6, 13-7, 13-9, 13-13
Person-month                                                Person weights
  format, 1-8, 5-4, 5-5, 8-8, 9-1, 9-3, 9-5, 9-6, 9-11,       adjustments, C-5
     10-6, 10-7, 10-25, 11-7, 13-2, 13-3–13-4, 13-5–          base, C-2
     13-6, 13-7, 13-9, 13-13, 13-15, E-8                      construction, 8-4–8-5
  record, 8-8, 8-15                                           cross-sectional, 8-16, 11-28
Person nonresponse (Type Z)                                   final, 8-2, 8-3, 8-4
  core questions, 4-2, 13-22                                  full panel file, 8-3, 8-17
  defined, E-8, E-12                                          household, family, subfamily weights from, 8-6,
  errors, 6-1, 6-2                                                8-10, 8-11, 8-12
  forms of, 2-20                                              husbands and wives, 8-10
  imputation adjustments, 1-8, 2-20, 4-1–4-2, 4-6–            initial, 8-5
     4-7, 7-6, 10-36, 11-11, 12-7, 12-13, 13-22               interview month, 8-8, 8-9–8-10, 8-16, 11-28
  rates, 6-2                                                  population represented by, 8-16
  sources of, 2-15, 2-18, 2-20, 4-1–4-2, 12-13                reference month, 8-8–8-12, 8-16
                                                              topical module files, 11-11, 11-12, 11-28
Person Number                                                 by type of file, 8-3, 9-15, 11-11, 11-12
  additional household members, 10-25, 11-14,                 variable name, 11-12
     11-24                                                    zero, 9-5, 9-8
  changes in, 10-26, 11-27, 12-14, 12-26, 13-22
  core wave files, 1-8, 9-3, 10-6, 10-7, 10-8, 10-9,        Personal demographic characteristics, 3-2
     10-10, 10-13–10-14, 10-15, 10-21, 10-22,                 editing, 13-8
     10-28, 11-11, 11-12, 11-23, 13-3, 13-7                   imputation, 4-4, 4-6, 4-12, 4-16, 8-6, 11-11
  components, 9-4, 10-6, 11-14, 12-14                       Personal history topical module, 3-6, 3-7, 3-15
  family identification, 10-13–10-14, 10-21, 11-18,         Personal Responsibility and Work
     12-20, 12-23                                             Opportunity Reconciliation Act
  family-level income, 12-23                                  (PRWORA), 1-3, 9-7, 10-27
  full panel files, 1-8, 12-7, 12-8, 12-11–12-12,           Perturbation factors, 7-3
     12-14, 12-15, 12-16, 12-20, 12-23–12-27,
     12-37                                                  Pooling data
  household composition, 10-10, 10-15, 10-16,                   family-level income, 10-20
     10-19, 10-23–10-24, 11-16, 11-19, 11-21,                   from multiple panels, 8-19–8-21
     11-22, 12-16                                               from multiple waves, 8-15
  income topcodes, 10-36, 12-37                                 nonsampling errors and, 8-19
  merged households, 10-25, 13-22                               reasons for, 9-9
  movers, 10-20, 10-22, 10-23–10-24, 10-25, 11-14,          Population control adjustments, 1-6, 6-1, C-3–
     11-22, 11-23, 11-25–11-26, 12-23–12-27                     C-4
  multigeneration household members, 11-21, 11-22           Population mean, 7-5
  newborns, 11-24, 12-26                                    Population variance, 7-5
  original sample members, 10-8, 10-9, 10-20,
     10-25, 11-14, 12-14
                                                            Post Enumeration Surveys, 2-6
  purpose, 9-4, 11-14                                       Poststratification adjustment, 8-4
  recombined households, 10-26



Poverty status                                                 coverage, 4-16, 9-14, 10-26–10-28, 10-29, 10-30–
  CPS estimates, 1-9, 6-4                                         10-31, 12-28, 12-30–12-31
  determining, 2-8–2-9                                         defined, E-9
  errors in estimates, 6-4                                     examples, 10-30–10-31
  information resources, 5-2, 5-3, 5-16                        full panel files, 9-14, 12-28–12-30
  SPD estimates, 1-11                                          identification, 9-14, 12-28–12-30
  weights, 8-5, 8-6, C-2, C-18                                 longitudinal household problem, 13-2
Primary individuals, 8-11, 8-12, 9-4, 9-6, 10-11,          Property. See also Real estate ownership;
  11-17, 11-18, 12-17, 12-19, 12-20, E-8                     Vehicle ownership
Primary recipient ID, 9-8, 9-14                              income, 3-13, 6-4
Primary sampling units (PSUs)                                taxes, 3-12, 3-13
  address selection, 2-6                                     topcoding, 11-28, B-6
  defined, E-8                                             Proxy respondents, 2-10, 2-16, 3-1, 6-2, 10-6,
  imputation role, 4-11                                      10-25, 11-24, E-9
  moves 100+ miles from, 2-15                              Pseudo-families, 9-6, 10-11, 10-15, 11-17, 12-17
  non-self-representing, 2-5, C-12, E-7                    Public use files, E-9. See also Microdata files
  person identification, 10-8, 11-13, 12-14
  selection of, 2-6, 7-2
  self-representing, 2-5, E-11
                                                           Quality Profile, 1-6, 1-13, 2-5, 2-8, 2-18, 5-1,
  variance estimation role, 7-1, 7-2                           5-13, 6-3
  with-replacement assumption, 7-2                         Quality of data
Program income                                               accuracy of definitions in data definitions, 11-6
  authorized recipient, 10-7, 10-27, 10-28, 12-29            CAI and, 1-3, 3-1, 6-2, 8-16
  core questions, 3-3, 3-5                                   interview consistency checks, 2-17, 3-1
  errors in, 6-4                                             matched records containing imputed data, 1-9
  monthly, 12-30, 12-36, 12-37                               nonresponse and, 2-18
  person-level amount, 9-14                                Quarterly estimates, 8-14–8-16
  recipient for family, 10-7, 10-27, 10-28, 12-13          Questionnaires. See also Computer-assisted
  topcodes, 10-36                                            interviewing
  variables, 9-14, 10-27, 12-30, 12-31, 12-32–12-36,           core items, 2-3, 3-1, 3-2–3-6
     12-37                                                     correspondence of variables to items on, 10-4–
  weighting adjustments, C-18                                     10-6, 11-6, 12-5–12-6
Program participation                                          data dictionary correspondence to, 10-4–10-6,
  administrative records compared to responses, 6-3               11-6, 12-5–12-6
  core questions, 1-8, 3-3, 3-4, 3-5, 3-6                      design, 5-16, 8-19
  CPS data, 1-9                                                documentation, 5-14, 11-2
  disability and, 3-10                                         edits, 2-17, 4-6
  economics of, 5-3; see also Program income                   paper instrument, 2-17, 3-1, 3-2, 4-6, 4-15, 8-6,
  eligibility, 3-9, 3-15, 10-38, 11-29, 12-38                     10-2, 10-6, 11-2, 12-2
  imputation, 4-7, 10-28                                       rostering, 2-7, 3-2
  primary recipient ID, 9-8, 9-14                              screens, 5-14
  P-70 publications, 5-2, 5-3
  recipiency history, 3-13, 3-15, 8-18, 10-26, 10-27       Race/ethnic origin
  recipient characteristics, 5-2                             imputation, 10-37
  SPD data, 1-11                                             income topcoding, 10-32, 10-33, B-2–B-3, B-4
  spell estimation, 8-18, 12-7                               reference person, 8-5, C-2
  variables describing, 9-14, 10-27, 12-29, 12-31–           variable name, 11-12
     12-36                                                   weighting, 8-5, 8-6, C-3–C-4
  weights, 9-5, 12-13
                                                           Railroad Retirement, 3-5, 6-4, 9-7, 9-14, 10-27,
Program units                                                10-28, 12-29
  composition, 9-7, 9-8
  constructing characteristics of, 9-8
                                                           Raking procedure, 8-5, C-4, C-5, C-10, C-11,
  core wave files, 9-14, 10-26–10-29, 10-30–10-31            C-12, C-24
                                                           Real estate ownership, 3-3, 3-8, 3-12, 11-28



Recall, 1-6, 1-9, 2-3, 6-2, 8-18                               length of, 1-2, 2-3, 2-4–2-5
Record Check Studies, 6-3–6-4                                  organizing principles, 2-3–2-4
                                                               by panel, 12-7
Redesign (1996) of SIPP                                        and recall errors, 2-3
  address clusters, 2-6                                        by rotation group, 2-4–2-5, 10-2, 11-2, 11-10,
  confidentiality procedures, 4-17–4-18, 10-6, 10-38              12-9, 12-10, 12-11–12-12
  core content, 3-3–3-4                                        topical modules, 3-7, 11-8, 11-10, 11-11, 11-19,
  data dictionaries, 12-3                                         11-21, 13-13
  defined, E-9                                                 weighting adjustments for pooled data by, 8-21
  editing and imputation procedures, 4-1, 4-5, 4-6,
     4-7, 4-13, 4-15, 8-17, 12-37, 13-1
                                                           Reference person
  entry address ID, 9-4, 10-7, 10-8, 10-9, 11-13,            changes in, 8-10, 10-18, 12-21
     12-13, 13-3                                             defined, 3-11, 10-16, 11-20, E-9
  full panel files, 4-16, 9-3, 9-11–9-15, 13-1               family, 3-11, 8-11–8-12, 9-6, 10-11, 10-12, 10-15,
  household characteristics, 8-6, 10-10, 11-14,                 10-16
     11-16                                                   group quarters, 8-12
  interview procedures, 2-17, 3-1, 8-6, 8-16                 household, 8-10–8-11, 8-12, 10-11, 10-12, 10-15,
  and merging files, 13-22                                      10-16–10-19, 11-6, 11-12, 11-16, 11-17,
  monthly interview status code, 9-5                            11-19–11-21, 12-17, 12-21
  overview, 1-2–1-3                                          identification of, 2-16, 10-16
  panel structure, 1-2, 2-1, 2-2, 8-16                       interviewer discretion in identifying, 10-18, 11-20
  program unit IDs, 10-28                                    nonfamily household, 8-12
  questionnaires, 10-5                                       primary individual, 10-11, 11-17
  rotation groups, 2-4–2-5                                   proxy interviews with, 2-16, 3-1
  state identification, 11-29                                race, 8-5, C-2, C-15
  topcoding, 10-29, 10-32–10-35, 12-31, B-1–B-2              relationships of household members to, 8-10–
  topical module files, 3-10, 5-4, 9-5, 11-6, 11-7,             8-11, 10-11, 10-15, 10-16–10-19, 11-12,
     11-8, 11-9, 11-11, 11-17, 11-29                            11-19–11-21, 12-17, 12-21, 12-22
  variable names, 8-1, 9-1, 9-3, 10-1, 10-5, 10-6,           topical questions, 3-7, 3-8
     11-1, 13-1, 13-2, A-10–A-17                             two people designated as, 11-21
  weights, 7-3, 8-1, 8-3, 8-5, 8-6, 8-9, 8-16, 12-37,        unmarried partner of, 10-17, 11-20
     C-1, C-2–C-3                                            variable name, 10-16
                                                             weights, 8-6, 8-10, 8-11, C-2, C-15, C-16
Reference month weights
  calendar month estimation, 8-14, 8-15
                                                           Replicability of published estimates, 5-1
  construction, 8-4–8-6                                    Reservation wage, 3-13
  core wave files, 8-3, 8-4–8-5, 8-6, 8-8–8-13, 8-14,      Respondents. See also Reference person
     8-15, 10-37                                               absent for consecutive waves, 4-5, 4-16, 7-6
  family-level analyses, 8-11–8-12, 8-13                       age, 1-2, 2-7, 2-16, 3-1, 3-6, 3-7, 3-9, 3-10, 11-6,
  format, 8-8–8-9                                                 11-10
  household-level analyses, 8-10–8-11                          burden on, 2-3
  number per person, 8-8                                       “donors,” 1-5, 2-20, 4-1, 4-3, 4-7, 4-9, 4-10, 4-13,
  person-level analyses, 8-8, 8-9–8-10                            10-37
  population represented by, 8-10                              misinterpretation of questions, 6-3
  second-stage calibration adjustment, 8-6, C-16–              proxy, 2-10, 2-16, 3-1, 6-2, 10-6, 10-25, 11-24
     C-17                                                      referral to records, 3-3, 3-14, 6-3
  subfamily-level analyses, 8-11–8-12, 8-13                    in scope, 8-5, 8-7, 8-16, 9-8, 11-9, E-5
  variable, 8-8–8-9                                            topical modules, 3-7, 11-6, 11-10
Reference period                                           Responses
  aligned to calendar months, 12-7, 12-9, 12-10,             administrative records compared to, 6-3–6-4
     12-11–12-12                                             error sources, 1-6–1-7, 6-3
  core wave files, 9-2, 10-7, 11-8, 13-4, 13-7             Retirement expectations, 3-13
  CPS, 1-9                                                 Retirement/pension accounts, 3-3, 3-5, 3-7, 3-8,
  cross-walk, 10-2, 11-2, 12-2                               3-13–3-14, 5-2, 5-16, 11-21
  defined, 2-1, 2-3, E-9
  for household composition, 11-14
                                                           Roomers/boarders, 10-17, 11-20
  interview month used in estimates with, 8-9              Rostering, 2-7, 2-16, 3-2



Rotation group, 1-2                                               topical module files, 9-3, 11-7, 11-10, 11-11,
  calendar month estimation by, 8-12, 8-14, 8-15,                    11-12, 11-13, 11-14, 11-15, 11-17, 11-18,
     9-9                                                             11-25–11-26, 11-27
  defined, 2-1, 2-3, E-9                                          transfer program unit composition, 9-8, 10-28
  format, 2-3, 8-8, 10-7                                          variable names, 8-1, 9-1, 9-3, 10-1, 10-10, 11-1,
  and nonsampling errors, 6-2, 6-3                                   11-11, 12-15, 13-2
  quarterly estimates by, 8-15                                    by wave, 10-9
  reference period by, 2-4–2-5, 10-2, 11-2, 11-10,            Sample units. See also Primary sampling
     12-9, 12-10, 12-11–12-12                                   units
  skipped, 2-3                                                    imputation of characteristics, 4-4, 4-6, 8-6
  variable, 11-10, 11-11, 11-12                                   merged, 10-25, 10-26, 11-27, 12-26
  weights, 8-5, 8-8, 8-12, 8-14, 8-16, C-16                       selection of, 2-5–2-7
Rural addresses, 2-6                                          Sampling errors
                                                                bias in estimates of, 1-7, 2-5
Sample design                                                   direct variance estimation, 7-1–7-3
  comparison of surveys, 1-10                                   GVFs, 7-4–7-6
  oversampling, 2-8–2-9                                         imputation and, 7-6
  selection of sampling units, 2-5–2-7                          information resources, 5-13, 5-16
  and variance estimates, 7-1                                   magnitude of, 7-4
Sample population                                               nonresponse and, 6-2
  comparison with other surveys, 1-9, 1-10                      survey design considerations, 7-1
  entries and exits, 13-17–13-20. See also Attrition          SAS reformatting code, 13-3–13-4, 13-5–13-6,
  size considerations, 1-2, 1-3, 2-2, 6-2, 8-5, 9-9,            13-9, 13-10
     12-7, C-19                                               SAS syntax, 10-4, 10-5, 11-4, 11-5, 12-3, 12-5
  universe, 13-17                                             School. See also Education and training
Sample Unit IDs                                                 enrollment, 3-4, 3-14
  additional household members, 9-3, 10-8, 10-9,                lunch program participation, 3-4, 3-6
     11-13, 12-14                                             Seam effect, 1-6–1-7, 4-16, 6-3, 6-4, 8-16, 8-19,
  changes in, 10-26, 11-13, 11-27, 12-14, 12-26                 E-9
  components, 9-2, 11-13                                      Secondary individuals, 8-11–8-12, 9-6, 10-11,
  core wave files, 9-3, 10-7, 10-8, 10-9, 10-10,                11-17, 11-18, 12-17, E-9
     10-11, 10-13–10-14, 10-21, 10-22, 10-23–
     10-24, 11-11, 11-12, 11-13, 11-23, 13-3, 13-7,
                                                              Secondary sample members, 9-3, 9-4, 11-10,
     13-9                                                       13-15–13-16, 13-17, E-9
  family identification, 10-11, 10-13–10-14, 10-21,           Security, of telephone interviews, 2-17
     11-17, 11-18, 12-18, 12-20, 12-23                        Self-employment, 3-3, 3-4, 3-6, 4-7, 10-32, C-18
  family-level income, 12-23                                  Sequential hot-deck imputation procedure
  full panel files, 9-3, 12-7, 12-8, 12-11–12-12,                 allocation flags, 4-11, 4-13–4-14
     12-14, 12-15, 12-16, 12-18, 12-20, 12-23–                    classes/adjustment cells, 4-8, 4-9–4-10, 4-12
     12-28, 12-29, 13-9                                           cold-deck values, 4-8, 4-11–4-12
  household composition, 9-6, 10-10, 10-23–10-24,                 core wave data, 4-4, 11-9
     11-14, 11-16, 11-25–11-26, 12-15, 12-16,                     cross-sectional, 4-8, 4-9
     12-25, 12-26                                                 data editing compared, 4-8
  merged households, 12-28                                        donors, 4-1, 4-8, 4-9, 4-10
  movers, 9-3, 10-8, 10-20, 10-22, 10-23–10-24,                   geographic sort variables, 4-8, 4-11
     11-13, 11-22, 11-23, 12-14, 12-23–12-28                      identifying records with no item nonresponse, 4-8
  newborns, 10-25                                                 longitudinal, 4-8, 4-9, 4-10
  parents and spouses, 12-22                                      overview, 1-5, 4-8–4-11
  program participation, 12-29                                    preprocessing sample file, 4-11–4-12
  purpose, 9-2–9-3, 9-4, 10-8, 11-13, 11-14, 12-14                redesign, 4-5, 4-7
  secondary sample persons, 9-3                                   selecting replacement values, 4-8, 4-13
  sorting files for linking, 13-3, 13-4, 13-7, 13-9,              steps, 4-8, 4-11–4-14
     13-14, 13-15                                                 topical module data, 4-5, 4-14
                                                                  types, 4-8–4-9



  updating hot-deck values, 4-13                              income topcoding, 11-28
Severance pay, 3-3, 3-5                                       nonresponse, 6-4
Shelter. See Housing                                          oversampling, 8-2
                                                              poverty status, 2-8–2-9
Simple random sample (SRS), 1-7, 2-5, 7-1                     PSID coverage, 1-11
Single parents, 8-19, C-22–C-25                               undercoverage, 1-6, 6-1, 6-4, C-17
Social Security, 3-3, 6-4, 9-7, 9-14, 10-27, 10-28,           weighting, 8-2, C-1, C-8–C-9
  10-29, 10-30–10-31, 10-36, 12-29, B-5                     Subsampling, address, 2-6, C-2
Sorting operations, 4-11                                    Supplemental Security Income (SSI)
Source and accuracy statement, 5-14, 7-4, 7-5,                program, 6-4, 9-14
  10-2, 10-37, 11-2, 11-29, 12-2, 12-38, 13-21,               definition of qualifying disabling conditions,
  E-11                                                           10-28, 12-30
Special places. See Group quarters frame                      federal/state administration, 10-28
Spell durations, 6-4                                          history, 3-15
                                                              income variables, 12-30, 12-34–12-36
Spell estimations, 6-4, 8-18–8-19, 12-7, 13-20                program units, coverage, and recipiency, 10-29,
Spouses, 8-10, 10-15, 10-17, 10-19, 11-12, 11-13,                10-30–10-31, 12-29, 12-30, 12-31
  11-16, 11-19, 11-20, 11-21, 11-22, 12-13, 12-21,            user-created monthly variables, 12-30, 12-34–
  12-22, C-3, C-6, C-10, C-11, C-12, C-20, C-22–                 12-36
  C-25                                                        variables describing participation, 10-27, 10-28,
Standard errors                                                  12-29
  bias in estimates of, 2-5, 13-21                            variance functions, 7-4
  computation of, 5-14, 10-1, 10-2, 11-1, 11-2, 12-2,       Supplemental unemployment benefits, 3-5
     13-21                                                  Support. See also Child support
  of estimated numbers, 7-4–7-5                                 nonhousehold members, 3-14
  of mean, 7-5–7-6
  overlapping panel structure and, 2-2
                                                            Survey of Program Dynamics (SPD), 1-10,
  tables of, 7-4                                                1-11, 2-2, E-11
Standard of living, 3-8, 3-10                               Surveys-on-Call, 1-6, 5-12–5-13, E-11
State identification, 4-17–4-18, 9-15, 10-38,               Survival analysis, 8-18
  11-11, 11-12, 11-29, 12-38                                Survivors’ income, 3-3
State-level estimates, 10-38, 11-29, 12-38                  Systematic bias, 6-3
State variable, 9-15, 10-38, 11-11, 11-29, 12-38
Subfamily(ies)                                              Tax returns, 1-10, 3-14
  analyzing people in, 10-12                                Taxes
  defined, 8-11, 10-11, 12-17                                   income, 3-8, 3-13, 3-14
  as distinct family unit, 10-12, 12-19                         property, 3-13
  edited relationships, 10-15                               Taylor-series approximation, 7-2
  excluding for analysis purposes, 10-12, 10-13–            Technical documentation
     10-14, 10-15, 11-17, 12-19, 12-20                          core wave files, 10-2–10-4
  ID variables, 10-11–10-14, 10-21, 11-17, 12-18,               defined, E-11
     12-20, 12-23                                               description of, 1-14, 5-12, 5-14
  including with primary family, 10-13–10-14,                   full panel files, 12-2–12-5, 12-9
     10-21, 12-19, 12-20                                        instrument screens and program code, 10-2, 11-2
  income variables, 10-19–10-20, 10-21, 12-23                   source, 3-1
  number in household, 10-15, 10-21, 11-17                      topical module files, 3-7, 11-2–11-5
  related, 3-11, 8-4–8-5, 8-11–8-12, 8-13, 9-7, 9-12,
     10-11, 10-13–10-14, 10-15, 10-19–10-20,
                                                            Telephone interviews/interviewing
     10-21, 11-16, 11-17, 12-17, 12-20, 12-23, E-9            callbacks, 2-17, 2-21
  type, 10-13–10-14                                           movers, 2-15, C-15
  unrelated, 3-11, 8-11, 9-6, 9-7, 10-11, 10-12,              procedures, 2-17
     11-16, 12-17, 12-19, 12-20, E-13                         quality of data, 6-2
  weights, 8-4–8-5, 8-6, 8-8, 8-11–8-12, 8-13                 security/confidentiality of, 2-17
Subpopulations. See also Race/ethnicity                     Telephone numbers, 5-16



Temporary Assistance for Needy Families                     ID variables, 9-3, 9-6, 11-7, 11-11–11-27, 13-11,
  (TANF), 1-3, 3-5, 3-15, 9-7, 9-14, 10-27, 10-30               13-14, 13-15, 13-23
                                                            imputed data, 4-14, 9-15, 11-11
Time-in-sample bias, 1-7, 2-2, 6-3, 8-19, E-12              linking family members, 11-13
Topcoding                                                   linking two or more, 13-1, 13-11–13-12
  adjustments for inflation and real growth, 10-32,         linking with core wave files, 1-9, 13-12–13-14
     10-34, B-1                                             linking with full panel files, 1-9, 13-14–13-15
  age, 4-17, B-4–B-5                                        merging two or more, 11-13
  algorithms, 10-33–10-34                                   merging with core wave files, 1-8, 3-10, 9-6, 9-9,
  computations, B-1, B-2–B-3                                    10-6, 11-1, 11-7, 11-8, 11-10, 11-11, 11-13,
  core wave files, 9-15, 10-6, 10-29, 10-32–10-36,              11-17, 11-19, 12-6, 12-13, 13-1, 13-3, 13-4,
     11-28                                                      13-12, 13-13, 13-14, 13-15
  creating means for, B-3–B-4                               merging with full panel files, 9-6, 10-6, 11-1,
  defined, E-12                                                 11-7, 11-13, 11-19, 12-1, 12-6
  earned income, 10-32–10-35, B-1–B-4, B-7                  metropolitan area identification, 11-29
  examples, 10-34–10-35, B-2                                monthly interview status variable, 9-4, 9-5, 9-11,
  full panel files, 9-15, 12-31, 12-36–12-37                    11-9–11-11
  gender and, 10-32, 10-33, B-2, B-4                        mover identification, 11-13, 11-14, 11-21–11-27,
  income, 4-17, 9-15, 10-29, 10-32–10-36, 11-28,                13-23
     12-31, 12-36–12-37, B-1–B-4, B-6–B-7                   overview, 1-8
  internal files, 5-2                                       person identification, 9-11, 9-15, 11-11, 11-13–
  labor force status and, 10-32, 10-33, B-3, B-4                11-15, 13-23
  matrix, B-1, B-2–B-3                                      pre-1996, 11-9–11-11
  1996 Panel, 10-29, 10-32–10-35, 12-31, B-1–B-2            public use version, 9-2, 9-3, 11-1–11-29
  pre-1996, 10-35–10-36, 12-31                              questionnaire correspondence to, 11-6
  purpose, 10-29, 11-27–11-28, 12-31                        redesign of 1996, 3-9–3-10, 5-4, 9-5, 11-6, 11-8,
  property-related, 11-28, B-6                                  11-9, 11-11, 11-17, 11-29
  race and, 10-32, 10-33, B-2–B-3, B-4                      state identification, 9-15, 11-11, 11-29
  specifications, B-1–B-7                                   structure, 5-4, 5-11, 9-2, 9-11, 11-7–11-8, 13-11,
  topical module files, 9-15, 11-27–11-28                       13-13
  unearned income, 10-29, 10-32, 11-28, B-6–B-7             technical documentation, 11-2–11-5
  universe of cases, 11-28                                  topcoding, 9-15, 11-27–11-28
  variables required, B-1, B-6–B-7                          variable names, 9-3, 9-15, 11-1, 11-6, 11-11–
  worker characteristics and, 10-32                             11-12, 11-13, 13-11
Topical content, 3-1, 3-6–3-7, E-12                         weights, 8-3, 8-16, 9-8, 9-15, 11-1, 11-2, 11-28–
Topical data, for skipped rotation groups, 2-3                  11-29, 13-12, 13-22
Topical items, 3-1                                        Topical modules, 1-4
Topical module files                                        categories, 3-7
  allocation flags, 11-28                                   core data merged with, 1-8, 3-10, 9-9, 11-8, 11-10
  content, 1-4–1-5, 1-8, 5-4–5-11, 11-7, 11-10              data editing, 4-4, 13-12
  core wave files compared, 9-11–9-15, 11-7, 11-8,          defined, 3-1, 3-6
     11-11–11-12, 13-13                                     frequency and timing, 3-6
  creation, 4-5                                             “history” modules, 3-9, 3-15, 11-8
  data dictionary, 9-11, 11-2–11-5, 11-6, 12-3              household member relationships, 9-6, 11-11,
  defined, E-12                                                 11-19
  family composition variables, 9-6, 9-12, 9-13,            imputation procedures, 4-2, 4-5, 4-14, 9-15, 11-11,
     9-15, 11-16–11-18, 11-19–11-21, 11-22                      13-12, E-12
  full panel files compared, 9-11–9-15, 11-8                missing data, 4-5, 5-4
  full panel files linked with, 1-9, 9-6, 11-1, 11-7,       by panel and wave, 3-7, 3-8–3-16, 5-4, 5-6–5-11,
     11-8, 11-13, 12-1, 12-6, 13-14–13-15                       11-6
  household composition variables, 9-12, 9-13,              purpose of, 3-6
     11-16, 11-19–11-21, 11-22                              reference period for, 3-7, 11-8, 11-10, 11-11,
  household identification, 9-11, 9-15, 11-11, 11-14,           11-19, 11-21, 13-13
     11-15–11-16                                            respondents, 3-7, 11-6, 11-10
                                                            sample definitions, 11-8
                                                            title-content relationship, 3-7



  topics, 3-6, 3-7, 3-8–3-16, 5-6–5-11                         name changes, 8-1, 9-1, 9-3, 9-15, 10-1, 10-6,
Transfer programs, 9-7. See also Program                          11-1, 11-11, 13-1, 13-2, 13-11, A-1–A-34. See
  participation; Program units; individual                        also ID variables
  programs                                                     name–content correspondence, 10-6, 11-6, 12-5
                                                               number of occurrences, 12-3, 12-6
                                                               previous wave, 11-27, 13-23
Undercoverage, 1-6, 6-1, 6-4, C-17, E-13                       program income, 9-14, 10-27, 12-30, 12-31,
Unemployment                                                      12-32–12-36, 12-37
  compensation, 3-3, 3-5, 6-4                                  program participation, 9-14, 10-27, 12-29, 12-31–
  CPS computations, 1-9                                           12-36
  length of, 3-15                                              questionnaire item correspondence, 10-4–10-5,
  insurance, 3-3                                                  11-6, 12-5–12-6
  P-70 publications, 5-2                                       reference month weights, 8-8–8-9
  reasons for, 3-8, 3-13, 3-15                                 reference person, 10-16
  spell duration, 8-18, 13-20                                  rotation group, 11-10, 11-11, 11-12
Unit frame, 2-6                                                subfamily, 8-11
                                                               summary, 5-15, 10-29, 10-35–10-36
University of Michigan, 1-10                                   for topcoding, B-1, B-6–B-7
U.S. Government Printing Office, 5-1                           topical module files, 8-16, 9-13, 11-4, 11-6,
User Notes, 5-12, 5-14, 10-2, 11-2, E-13                          11-11–11-12, 11-13–11-15
Uses of SIPP, 1-3–1-4                                          unearned income, 12-30, 12-32–12-36
Usual place of residence, E-14                                 values, 10-5, 10-12, 11-4, 11-9
                                                               variance estimation, 7-3
                                                               weight, 9-15
Variable metadata, 5-15, E-14                              Variance estimation. See also Generalized
Variables. See also ID variables                             variance functions (GVFs)
  auxiliary, 4-11, 4-12                                      approximation methods, 7-4–7-6
  construction of, 9-8                                       core wave files, 7-3
  content, 5-15                                              degrees of freedom, 7-2
  core wave files, 9-1, 9-13, 10-1, 10-4, 10-8, 10-11,       direct methods, 7-1–7-3
      11-11–11-12, 13-9, A-1–A-34                            Fay’s formula, 7-3
  covariances among, 4-11, 4-13                              imputation and, 4-3, 4-11, 4-12, 4-16, 7-6
  crosswalk of 1993 and 1996 names, A-1–A-34                 1990–1993 panels, 7-2–7-3
  dash characters in names, 13-9                             1996 panel, 7-3
  description of, 10-2, 11-2; see also Data dictionary       OASDI, 7-4
  differences by file type, 9-10, 9-11–9-15                  replication methods, 7-2, 7-3
  duplicate names for different variables, 13-11             sample design and, 1-7, 7-1
  family composition, 9-13, 10-15–10-20, 11-16–              software, 7-2, 7-3, 7-5
      11-18, 11-19–11-21, 11-22, 12-21–12-22                 SRS formulas, 7-1
  family identification, 8-11, 10-11–10-14, 12-17–           SSI, 7-4
      12-18                                                  strata, 7-1, 7-2–7-3
  family-level income, 10-19–10-20, 10-21, 12-23             units, 7-2–7-3
  file position, 1993 and 1996, A-18–A-34                    variables, 7-3
  full panel files, 1-8, 8-16–8-17, 9-13, 12-5, 13-9
  geographic sort, 4-11
                                                           Vehicle ownership, 3-8, 3-12
  household composition, 4-16, 8-10, 9-11, 9-12,           Veteran’s benefits, 10-27, 12-29
      9-13, 9-15, 10-8, 10-10, 10-15–10-20, 10-23–         Veterans Compensation and Pensions, 6-4,
      10-24, 11-19–11-21, 11-22, 12-21–12-22                   9-7, 9-14
  household identification, 10-10                          VPLX software, 7-3
  imputed, 4-7, 4-11, 4-16, 12-37
  in-sample, 11-9, 12-9, E-5
  interview month weights, 8-9, 8-10
                                                           Wages and salaries. See also Earnings
  length of names, 13-4                                        gross pay, 4-9–4-10
  merging from other files, 11-11, 11-19, 13-4                 imputation, 4-7, 4-9
  monthly, 9-3–9-4, 9-8; see also Monthly interview            reservation wage, 3-13
      status variable                                          topcoded, 10-32–10-36, 12-37



Waves. See also Missing waves                                     population control adjustments, 1-6, 6-1, 6-4, 8-6,
  attrition rates by, 2-19                                            C-3–C-4
  bounded, 8-7                                                    pooled data from multiple panels, 8-19–8-21
  combining, 8-14–8-16                                            pre-1996 factors, C-1, C-12
  comparability of responses among, 8-19                          quarterly estimates, 8-15–8-16
  defined, 1-2, 2-1, 2-3, E-14                                    raking, 8-5, C-4, C-5, C-8, C-9, C-10, C-12,
  interviewing mode by, 6-2                                           C-23, C-24, C-25
  nonresponse by, 2-17–2-18, 2-19, 7-6                            ratio adjustments, C-4, C-5, C-8, C-9, C-10,
  number of, 1-3, 2-2, 2-3, 12-6, 12-7                                C-11, C-12, C-23, C-24, C-25
  organizing principles, 2-3                                      rotation group inflation, 8-14
  overlapping, 8-19, 8-21, 9-9                                    sample cut factor, C-13
  person identification by, 10-8–10-9, 11-14, 12-14               second-stage calibration adjustments
  short, 2-2, E-11                                                    (post-stratification), 8-4, 8-5, 8-6, 8-8, 13-21,
  size of sample, 1-2, 2-2                                            C-1, C-3–C-12, C-13, C-16–C-17, C-20–C-25
  topical modules by, 3-7, 3-8–3-16, 5-6–5-11                     spell estimations, 8-18–8-19
  variable name, 11-12                                            subsampling of housing unit clusters, 8-4, 8-5
Web sites                                                         topical module files, 8-16, 11-28–11-29
  Census Bureau, 1-6, 5-12                                        Wave 1, 8-5, 8-9, 8-10, 8-14, C-1–C-12, C-13,
  SIPP, 1-6, 1-13, 4-1, 5-1, 5-12, 5-13, 5-14, 5-15,                  C-14
     10-2, 11-2, 12-2                                             Wave 2+, 8-5–8-6, 8-8, C-12–C-17
  variance estimation software, 7-2                           Weights. See also Reference month weights;
Weighting procedures                                           Interview month weights; Person weights
  attrition adjustments, 8-4, 8-19, 13-22                         additional household members, 8-5, 8-7, 8-17, 9-5,
  calendar month estimation, 8-12, 8-14–8-15, 8-19,                   9-8
      9-8, 12-7, 13-1, 13-8                                       age-related, 8-5, C-3–C-4
  calendar year estimates, 8-3, 8-7–8-8, 8-16–8-17,               base, 8-4, 8-5, C-1–C-2, C-12, C-14
      8-18, 9-5, 9-8, 12-37–12-38, 13-21, C-17–C-25               choosing, 8-3–8-4, 9-8, 10-37, 13-12
  cell collapsing, C-2–C-3, C-4, C-5–C-6, C-8,                    components, 8-4
      C-16, C-19, C-23                                            construction of, 8-4–8-8
  children, 8-17, C-4, C-7, C-10, C-19, C-24–C-25                 core wave files, 5-4, 8-3, 8-4–8-5, 8-7, 8-8–8-13,
  control-total computation, C-4, C-8–C-9, C-16–                      9-8, 9-15, 10-1, 10-2, 13-8, 13-22, C-1–C-25
      C-17, C-20, C-23, C-25                                      cross-sectional, 5-4, 8-4, 8-7, 11-28, C-12–C-13,
  core wave files, 5-4, 8-8–8-16, 10-37                               C-17
  duplication control factor, 8-4, 8-5, 13-23, C-1,               defined, 8-1–8-2, E-14
      C-2                                                         effects on estimates, 1-6, 8-2
  first-stage ratio estimate factor, C-1, C-12, C-13              exiting sample members, 13-17, 13-19–13-20
  full panel files, 8-16–8-19, 12-1, 12-37–12-38,                 family, 8-4–8-5, 8-6, 8-8, 8-11–8-12, 8-13, 9-15
      13-22                                                       final, C-1
  household noninterview adjustment factor, C-1,                  full panel files, 8-3, 8-7–8-8, 8-16–8-19, 9-15,
      C-2–C-3, C-15                                                   12-1, 12-2, 12-13, 12-37–12-38, 13-14, 13-22,
  imputation adjustments, 8-4, 8-5                                    C-1–C-25
  information resources, 5-16                                     household, 8-2, 8-4–8-5, 8-6, 8-8, 8-10–8-12,
  later wave noninterview adjustments, C-12–C-13,                     8-13, 8-18, 9-5, 9-8, 9-15, C-2–C-3
      C-15–C-16, C-17                                             initial, 8-6, 8-7, C-12, C-13, C-15, C-17, C-18
  missing waves, 8-7, 13-22                                       longitudinal, 8-3, 8-4
  mover adjustment, 8-4, 8-5, 8-6, 13-20, C-13–                   merging, 5-4, 13-1, 13-12
      C-15, C-16, C-19                                            monthly cross-sectional, 5-4, 8-4
  new construction noninterview adjustment factor,                number per person record, 8-8
      C-1, C-12, C-13                                             panel, 8-16–8-17, 8-18–8-19
  noninterview adjustment factors, C-1, C-2–C-3,                  positive, 12-13
      C-12, C-13, C-18–C-19                                       program participation, 9-5, 12-13
  nonresponse adjustment factors, 2-17, 2-18, 4-1,                purpose, 8-1–8-2
      6-2, 6-4, 8-4, 8-5, 8-6, 8-8, C-3                           redesign of SIPP and, 7-3, 8-1, 8-3, 8-5, 8-6, 8-9,
  overview, 1-7                                                       8-16, 12-37, C-1, C-2–C-3
  panel, C-17–C-25                                                reference person, 8-6, 8-10, 8-11



  replication, 7-3                                          WIC program, 4-16, 9-7
  rotation group, 8-5, 8-8, 8-12, 8-14, 8-16                 authorized recipient, 10-28
  source and accuracy statements, 5-14, 10-2, 11-2,          ID variables, 9-14, 10-27, 10-28, 12-29, 12-30,
     11-28, 12-2, 12-38                                         12-31
  subfamily, 8-4, 8-6, 8-8, 8-11–8-12, 8-13, 9-15,           imputed coverage, 10-28, 12-28
     10-37                                                   infant population, 8-17
  topical module files, 8-3, 8-16, 9-15, 11-2, 13-12,        program units, coverage, and recipiency, 10-29,
     13-22                                                      10-30–10-31, 12-28, 12-29, 12-30, 12-31
  uses, 8-8–8-21, 9-8                                        unit totals, 10-29
  variable names by file type, 9-15                         Wide-record format, 13-2, 13-6, 13-7, 13-9
  zero, 9-5, 9-8, 12-13, C-19
                                                            Women, 5-16
Welfare. See also Program participation
  history, 3-15
                                                            Work. See also Employment; Labor force
  reform, 1-3, 2-2–2-3, 3-3, 3-7, 3-15, 5-11, 9-7,           status
     10-27                                                   disability, 3-11, 3-12, 3-15
Well-being                                                   expenses related to, 3-15
  adult, 3-8, 5-16, 11-21                                    history, 3-9, 3-15, 5-2
  children, 3-7, 3-9, 5-16, 11-21                            at home, 3-6, 3-16
  extended measures of, 3-8, 3-10, 5-2, 5-3                  moonlighting, 3-3
  information resources, 5-2, 5-3, 5-16                      part-time, 4-8
  topical modules, 3-7, 3-8, 11-21                           schedule, 3-4, 3-7, 3-16
                                                             time spent looking for, 3-3
What’s Available from the Survey of Income                  Working papers, 1-13, 5-13, 5-14, 5-15
 and Program Participation, 5-15

