* Note: all.sas and test.sas are not available. ;
/*
CPS Guide
This documentation provides an explanation of several programs written to
adapt the March CPS extract data prepared by Mare-Winship. These
programs are used for such applications as matching couple data
within the same year as well as matching household and individual data for
two consecutive years.
MATCHING HUSBANDS AND WIVES
The all.sas program organizes and merges the Mare-Winship CPS data to
create a new data set that matches husband and wife data. Only married
individuals, determined by a value of 1, 2 or 3 for variable x60- Marital
Status, are included in the new data set. This data set of records for
married individuals is then separated into two new data sets by variable
x76- Sex (1 for males, 2 for females). Each of the personal variables in
both data sets is renamed (e.g., variable x33- Age is renamed m33 for each
male record and f33 for each female record). The two data sets are then
merged by variables x1-
Household Number and either x2 or x12, both variables measuring the
Family Number Within a Household, to create one data set with matching
husband and wife records. (The use of x2 or x12 depends upon the year of
the data set. Both variables have missing values for certain years, but
together the two represent the entire time series.) Each new couple
record contains one value for each of the household and family variables
and two values, one for the male data and one for the female data, for
each of the personal variables.
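Because all.sas is not available, the steps above can only be sketched. The
following is a minimal illustration, not the original program. The data set
name work.cps1970 is hypothetical, only x33 and x63 are renamed as stand-ins
for the full list of personal variables, and x2 is assumed to be the family
key for the year in question (x12 would take its place in years where x2 is
missing).

```sas
* Hypothetical input work.cps1970 with one record per person ;
data males   (rename=(x33=m33 x63=m63))
     females (rename=(x33=f33 x63=f63));
  set work.cps1970 (keep=x1 x2 x33 x60 x63 x76);
  * keep married individuals only ;
  if x60 in (1, 2, 3);
  * split by sex, 1 for males and 2 for females ;
  if x76 = 1 then output males;
  else if x76 = 2 then output females;
  drop x60 x76;
run;

proc sort data=males;   by x1 x2; run;
proc sort data=females; by x1 x2; run;

* one record per couple with the household and family keys kept once ;
data couples;
  merge males (in=inm) females (in=inf);
  by x1 x2;
  if inm and inf;
run;
```

The sketch assumes at most one married male and one married female per
family. Families with more than one married person of the same sex would
need an additional key before the merge.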
The test.sas program provides a sample application of the all.sas matched
couple data set. This program runs a PROC FREQ on the couple records
using variable x63- Normal Full Time Work. The resulting table shows all
the working/non-working combinations among couples and the corresponding
percentages of each situation for the given year.
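Since test.sas is not available either, a cross-tabulation of this kind
might look like the sketch below. It assumes a matched couple data set named
work.couples (a hypothetical name) in which x63 has been renamed m63 for the
husband and f63 for the wife.

```sas
* Hypothetical input work.couples with m63 and f63 on each couple record ;
proc freq data=work.couples;
  tables m63*f63 / missing;
  title "Normal full time work, husband by wife";
run;
```

By default PROC FREQ prints the frequency and the cell, row and column
percentages for each combination. The MISSING option is an added assumption
that keeps couples with a missing value in the table.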
MARCH-MARCH HOUSEHOLD MATCHES
The match.sas program matches March CPS data by household for any two
consecutive years. Each of the household variables in the year-one records
is renamed by replacing x- with y1- (e.g., x1- Household Counter becomes
y11), with a similar replacement of x- with y2- in year two.
The two data sets are merged by variables x7- Random Cluster Code and
x10- Family Serial Number to create a new data set containing matched
household records for two consecutive years (the match is valid only from
1968 to 1975 because the merge variables are missing for the rest of the
time series).
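The household match described above could be sketched as follows. This is an
illustration under stated assumptions, not the contents of match.sas: the
data set names work.cps1969 and work.cps1970 are hypothetical, x1, x21 and
x23 stand in for the full list of household variables, and the household
variables are assumed to be constant across the person records of a
household so that keeping the first record per household loses nothing.

```sas
* Hypothetical inputs work.cps1969 and work.cps1970 ;
* NODUPKEY keeps one record per cluster and family serial number ;
proc sort data=work.cps1969 (keep=x7 x10 x1 x21 x23)
          out=hh1 (rename=(x1=y11 x21=y121 x23=y123)) nodupkey;
  by x7 x10;
run;

proc sort data=work.cps1970 (keep=x7 x10 x1 x21 x23)
          out=hh2 (rename=(x1=y21 x21=y221 x23=y223)) nodupkey;
  by x7 x10;
run;

* keep only households present in both years ;
data hhmatch;
  merge hh1 (in=in1) hh2 (in=in2);
  by x7 x10;
  if in1 and in2;
run;
```

The merge keys x7 and x10 are left unrenamed so that both years share the
same BY variables.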
MARCH-MARCH INDIVIDUAL MATCHES
The IndMatch.sas program matches individual data for heads of households and
spouses of heads, determined by a value of 1 or 2 for variable x53- Household
Recode III, by merging two data sets, representing two consecutive years,
by variables x7- Random Cluster Code, x10- Family Serial Number, x33- Age,
x72- Race, and x76- Sex (match valid from 1968 to 1975 due to missing
data). The personal variables were renamed to represent year one and
year two in the same manner as in creating the match.sas data set.
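One way to express an individual match of this kind is a PROC SQL join, as
in the sketch below. This is an assumption about how such a match could be
written, not the contents of IndMatch.sas. The data set names are
hypothetical, only age, industry and occupation are carried forward, and the
age condition follows the tolerance described under QUALITY OF MATCH TESTS:
the year-two age may be the year-one age plus 0, 1 or 2.

```sas
* Hypothetical inputs work.cps1969 and work.cps1970 ;
* Heads and spouses only, matched on cluster, serial number, race and sex ;
* Age in year two must be within one year of the expected age ;
proc sql;
  create table indmatch as
  select a.x7, a.x10, a.x72, a.x76,
         a.x33 as y133, b.x33 as y233,
         a.x38 as y138, b.x38 as y238,
         a.x39 as y139, b.x39 as y239
  from work.cps1969 as a
       inner join work.cps1970 as b
       on  a.x7  = b.x7
       and a.x10 = b.x10
       and a.x72 = b.x72
       and a.x76 = b.x76
  where a.x53 in (1, 2)
    and b.x53 in (1, 2)
    and (b.x33 - a.x33) in (0, 1, 2);
quit;
```

Writing the age condition as a difference tested against a list means that a
missing age in either year never satisfies it, so records with missing ages
are not matched.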
QUALITY OF MATCH TESTS
Several tests were run to determine the quality of the match of the
individual records. The following variables were chosen as possible
indicators of a poor match of individual records. Each of the chosen
variables has an expected value from one year to the next. An unexpected
change in year two in the values of any of these variables would be a
sign of a possible poor match among the individual records.
Code  Variable Name        Expected Value, Year 2
----  -------------        ----------------------
x33   Age                  Increase of one year
x38   Current Industry     No change or a logical move
                           to a similar industry
x39   Current Occupation   No change or a logical move
                           to a similar occupation
x49   High Grade Attended  No change or an increase of
                           one higher grade attended
x79   Veteran Status       No change, or a logical
                           change in status
PROC FREQ tables were generated for combinations of these chosen
variables. The resulting tables were analyzed to determine whether they
corresponded to the hypothesized results of the tests, and the findings are
reported below. The quality of match tests were run on the 1969-1970 pair
of years.
The individual PROC FREQ tables for the variables x38- Current Industry and
x39- Current Occupation show that nearly three-quarters of the individuals
in the data set maintained the same occupation or industry from 1969 to 1970.
The PROC FREQ table that compares changes in industry and occupation for
individuals of the expected age in 1970 (age1969+1) also gives a positive
indication of a good match, with nearly 69% of the individuals remaining in
the same industry and having the same occupation in both years. It is
important to note, however, that a change in occupation and/or
industry is not necessarily an indication of a bad match. Some changes
are logical moves to a similar occupation and/or industry. Each
applicable record would have to be analyzed in order to determine the
possibility of a poor match.
Only a small number of the matched individual records reported a change
in Veteran Status from 1969 to 1970, with 26601 of the 26847 total
matched records maintaining the same status. This result is plausible
when considering that the Vietnam War falls within the time period in
question. Again, each applicable record would have to be analyzed to
determine whether the change in Veteran Status is logical. Changes in the
highest grade attended were also few, with 25983 of the 26847 total
matched records reporting the same level of education in both years.
When matching individuals by age, the IndMatch.sas program allowed for a
match of plus-or-minus one year from the age that was expected in the
second year. Over 94% of the matched individuals in this time period
reported an age in the second year that was one year greater than the age
reported in the previous year. When the age variable was run in the PROC
FREQ tables with the other poor-match indicator variables, the highest
percentages among the possible combinations were always reported for the
situation representing the expected age (age1970=age1969+1) and no change
in the other chosen variable.
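Tables of this kind could be produced along the lines of the sketch below.
It assumes a matched individual data set, here given the hypothetical name
work.indmatch, in which year-one and year-two age, industry and occupation
are named y133/y233, y138/y238 and y139/y239. The derived variables agediff,
same_ind and same_occ are introduced only for this illustration.

```sas
* Hypothetical input work.indmatch with year-one and year-two variables ;
data quality;
  set work.indmatch;
  * expected value of agediff is 1 ;
  agediff  = y233 - y133;
  * 1 if the code is unchanged between years and 0 otherwise ;
  same_ind = (y138 = y238);
  same_occ = (y139 = y239);
run;

proc freq data=quality;
  tables agediff*same_ind agediff*same_occ same_ind*same_occ;
run;
```

Note that a flag such as same_ind only captures exact agreement of the
codes. A logical move to a similar industry or occupation would still show
up as a change and would need to be reviewed record by record.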
In an attempt to explain any change in the make-up of families or
households, the Indiv.sas program was written to report the combinations
of ages reported by matched couples for two consecutive years. The
table shows that, of all the matched couples, 57 percent reported an
increase of one year in age from 1969 to 1970 for both the head and the
spouse. In 21 percent of the couples, both members reported an age outside
the range allowed for a match, which can be explained by new families within
a household in 1970. The remaining 22 percent is made up of various
combinations in which at least one spouse is one year above or below the age
expected in 1970. A small share of these results shows one spouse within the
allowed range of age matches and the other spouse outside the range,
probably representing remarriages.
The results of the quality of match tests indicate that the matches of
records where individuals reported the expected age (year2=year1+1)
produce the best results. Only a small percentage of the total number of
matched records report an age in the second year equal to that of the
first year or two years greater than the age in the first year. The
results show that, of these individuals, a larger portion
has unexpected changes in the other variables (e.g., industry and
occupation). These results suggest that the matches involving
individuals with reported ages in year two other than what is expected
have a higher likelihood of being a poor match.
MISSING DATA
A PROC MEANS was run on all records for each year in the March CPS
extract files to determine the years in which each variable is missing
data:
Code  Variable Name                     Years Missing
----  -------------                     -------------
x2    Family-in-Household               83, 87, 88
x4    Year                              90
x7    Random Cluster                    77-92
x8    Keyfitz Cluster                   77-92
x9    Noninterview Cluster              64-67, 77-92
x10   Family Serial Number              64-67, 80-88
x11   Family Description                89-92
x12   Family Position in Household      68-75
x13   Family Type C-recipiency          64, 65, 89-92
x15   Number of Persons in Family       89-92
x20   Household Serial/Segment Number   64-67, 89-92
x21   Household Type                    64-67
x22   Household Status                  64-76
x23   Number of Families in Household   64-67
x26   SMSA                              89-92
x27   SMSA-I                            89-92
x32   ADC Recipiency                    64, 65
x34   Alimony Recipiency                64-68
x35   Any Reason Could Not Take Job     64-67
x37   Complete High Grade Attended      92
x42   Family (Secondary) Membership     89-92
x43   Family Number                     64-67, 89-92
x46   Farm/Self-Employed Income         76-79
x54   Last Work Full Time               64-67
x55   Last Work Full Time For Pay       64-67
x57   Look For Full or Part Time Work   64-67
x61   Nonfarm Self-Employment Income    76-79
x62   Normal Full Time Job              64-67, 89-92
x65   Parents Presence                  64-75
x66   Person Sequence Number            64-67
x70   Public Assistance Amount          64-67
x71   Public Assistance Recipiency      64, 65
x73   Reason Not At Work Last Week      64-67
x77   Subfamily Membership Key          64-67, 89-92
x78   Unemployment Recipiency           64-68
x82   Weeks Looking for Work Last Year  64-75, 89-92
x83   Weeks Looking for Work Last Year  64-67
x84   Weeks Looking/Laid Off Work       64-75
x86   Weeks Worked Last Year-I          64-75
x87   Weeks Worked Last Year-II         89-92
x89   Why Look For Work                 64-66
x91   Person Serial Number              64-67, 80-92
x92   Poverty Cutoff Dollars            64-67
x93   Poverty Level                     64-67
x95   Spanish Ethnicity                 64-70
x97   Main Reason For Part-Year Work    64-67, 89-92
x99   Stretches of Unemployment         64-67, 76-79
x100  Weeks in Labor Force              76-92
x101  Family A Weight                   64-67, 77-92
x102  Family P Weight                   64-67, 77-92
x103  Family Weight Basic               76
x104  Household Weight                  64-76
x105  Person A Weight                   64-67, 76-92
x106  Person P Weight                   64-67, 76-92
x108  Basic CPS Weight                  68-88
x109  Type-A-Income                     64-67, 80-92
x110  Type-B-Income                     64-67, 89-92
x111  Type-C-Income                     64-67, 89-92
x112  Type-D-Income                     64-67, 89-92
x113  Type-E-Income                     64-67, 89-92
x114  Dividends and Interest            64-75
x115  Rental Income                     64-75
x116  Public-Assistance Income          64-75
x117  Supplemental Security Income      64-75
x118  CPI-Index                         89-92
x119  Version Number Major I.D.         64-92
x120  Version Number Minor I.D.         64-67, 80-92
x121  Presence of Own Children          68-92
x122  Own Children Under 6 (in family)  68-79
x123  Own Children Under 18             68-79
x124  Related Children Under 18         68-92
x125  Family Members Under 18           68-79
x126  Family Members Over 18            68-92
x127  Female Family Members 18+         68-92
x128  Labor Force Status                68-92
x129  Household Flag                    68-92 */