* Note:  all.sas and test.sas are not available. ;
/*
 				CPS Guide			    
  							    
This documentation provides an explanation of several programs written to 
adapt the March CPS extract data prepared by Mare-Winship.  These 
programs are used for such applications as matching couple data 
within the same year as well as matching household and individual data for 
two consecutive years.

MATCHING HUSBANDS AND WIVES			

The all.sas program organizes and merges the Mare-Winship CPS data to 
create a new data set that matches husband and wife data.  Only married 
individuals, determined by a value of 1, 2 or 3 for variable x60- Marital 
Status, are included in the new data set.  This data set of records for
married individuals is then separated into two new data sets by variable
x76- Sex (1 for males and 2 for females).  Each of the personal variables
in both data sets is renamed (e.g. variable x33- Age is renamed m33 for
each male record and f33 for each female record).  The two data sets are
then merged by variables x1- 
Household Number and either x2 or x12, both variables measuring the 
Family Number Within a Household, to create one data set with matching 
husband and wife records.  (The use of x2 or x12 depends upon the year of 
the data set.  Both variables have missing values for certain years, but 
together the two represent the entire time series.)  Each new couple 
record contains one value for each of the household and family variables 
and two values, one for the male data and one for the female data, for 
each of the personal variables.
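
Since all.sas is not available, the steps above can only be sketched.  The
following is a minimal Python illustration, not the original SAS code.  The
function names (rename, match_couples), the dictionary record layout, and
the choice to keep only families in which both a husband and a wife record
exist are assumptions made for the example; x2 stands in for the family
number and would be x12 in the years where x2 is missing.

```python
MARRIED = {1, 2, 3}   # x60 values treated as married, per the text
KEYS = ("x1", "x2")   # household number, family number (x12 in some years)


def rename(record, prefix):
    """Rename x<n> to <prefix><n> (x33 -> m33), leaving the merge keys alone."""
    return {(k if k in KEYS else prefix + k[1:]): v for k, v in record.items()}


def match_couples(records):
    """Return one combined record per family that has a husband and a wife."""
    married = [r for r in records if r["x60"] in MARRIED]
    husbands = {tuple(r[k] for k in KEYS): rename(r, "m")
                for r in married if r["x76"] == 1}
    wives = {tuple(r[k] for k in KEYS): rename(r, "f")
             for r in married if r["x76"] == 2}
    # Assumption: families lacking either spouse record are dropped.
    both = sorted(husbands.keys() & wives.keys())
    return [{**husbands[key], **wives[key]} for key in both]


sample = [
    {"x1": 1, "x2": 1, "x60": 1, "x76": 1, "x33": 40},  # husband
    {"x1": 1, "x2": 1, "x60": 1, "x76": 2, "x33": 38},  # wife
    {"x1": 2, "x2": 1, "x60": 5, "x76": 1, "x33": 25},  # not married
]
couples = match_couples(sample)
```

In this sketch only the merge keys keep their x names; the original program
also keeps a single copy of the other household and family variables.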
  								    
The test.sas program provides a sample application of the all.sas matched 
couple data set.  This program runs a PROC FREQ on the couple records 
using variable x63- Normal Full Time Work.  The resulting table shows all 
the working/non-working combinations among couples and the corresponding 
percentages of each situation for the given year.
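
The kind of table described above can be approximated outside SAS with a
simple cross-tabulation.  This is a hedged Python sketch, not test.sas
itself: the function name crosstab_percent is hypothetical, and the column
names m63 and f63 assume that x63 was renamed for husbands and wives in the
same way as the other personal variables.

```python
from collections import Counter


def crosstab_percent(couples, row="m63", col="f63"):
    """Percentage of couples in each (husband value, wife value) cell."""
    counts = Counter((c[row], c[col]) for c in couples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Made-up values for illustration only.
table = crosstab_percent([
    {"m63": 1, "f63": 1},
    {"m63": 1, "f63": 2},
    {"m63": 1, "f63": 1},
    {"m63": 2, "f63": 2},
])
```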

MARCH-MARCH HOUSEHOLD MATCHES 						    

The match.sas program matches March CPS data by household for any two 
consecutive years.  Each of the household variables for all the records 
in year one is renamed by replacing x- with y1- (e.g. x1- Household 
Counter becomes y11), with a similar replacement of x- with y2- in year
two.  The two data sets are merged by variables x7- Random Cluster Code and 
x10- Family Serial Number to create a new data set containing matched 
household records for two consecutive years.  The match is valid only
from 1968 to 1975 because the merge variables are missing for the rest of
the time series.
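
The renaming and merge described above can be illustrated with a short
Python sketch.  It is not the match.sas code: the function names
(rename_year, match_households) are hypothetical, records are assumed to be
dictionaries, each key pair (x7, x10) is assumed to identify at most one
household per year, and only households found in both years are kept.

```python
def rename_year(record, year):
    """Rename x<n> to y<year><n>, so x1 becomes y11 in year one."""
    return {f"y{year}{name[1:]}": value for name, value in record.items()}


def match_households(year1, year2, keys=("x7", "x10")):
    """Pair household records from two consecutive years on the merge keys."""
    # Assumption: the key values are unique within each year.
    index2 = {tuple(r[k] for k in keys): r for r in year2}
    matched = []
    for r1 in year1:
        r2 = index2.get(tuple(r1[k] for k in keys))
        if r2 is not None:
            matched.append({**rename_year(r1, 1), **rename_year(r2, 2)})
    return matched


first = [{"x1": 5, "x7": 11, "x10": 100}, {"x1": 6, "x7": 12, "x10": 200}]
second = [{"x1": 9, "x7": 11, "x10": 100}]
households = match_households(first, second)
```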

MARCH-MARCH INDIVIDUAL MATCHES  					    

The IndMatch.sas program matches individual data for heads of households and 
spouses of heads, determined by a value of 1 or 2 for variable x53- Household 
Recode III, by merging two data sets, representing two consecutive years, 
by variables x7- Random Cluster Code, x10- Family Serial Number, x33- Age, 
x72- Race, and x76- Sex (the match is valid only for 1968-1975 due to
missing data).  The personal variables are renamed to represent year one
and year two in the same manner as in the match.sas data set.

QUALITY OF MATCH TESTS

Several tests were run to determine the quality of the match of the 
individual records.  The following variables were chosen as possible 
indicators of a poor match of individual records.  Each of the chosen 
variables has an expected value from one year to the next.  An unexpected 
change in year two in the values of any of these variables would be a 
sign of a possible poor match among the individual records.
 								    
  	Code	Variable Name		Expected Value, Year 2 	    
  	----	-------------		----------------------	    
 	x33	Age			Increase of one year	    
 	x38	Current Industry	No change or a logical move 
 					  to a similar industry	    
 	x39	Current Occupation	No change or a logical move 
  					  to a similar occupation   
 	x49	High Grade Attended	No change or an increase of 
 					  one higher grade attended 
 	x79	Veteran Status		No change, or a logical     
  					  change in status	      

PROC FREQ tables were generated for combinations of these chosen 
variables.  The resulting tables were analyzed to determine whether they
corresponded to the hypothesized results of the tests; the findings are
reported below.  The quality of match tests were done for the years
1969-1970.

In looking at the individual PROC FREQ tables for the variables x38- Current 
Industry and x39- Current Occupation, nearly three-quarters of the individuals 
in the data set maintained the same occupation or industry from 1969 to 1970.  
The PROC FREQ table that compares the change in industry and occupation for 
individuals of the expected age in 1970 (age1969+1) also indicates a good
match, with nearly 69% of the individuals remaining in the same industry
and having the same occupation for the two years.  It is important to
note, however, that a change in occupation and/or industry is not
necessarily an indication of a bad match.  Some changes are logical moves
to a similar occupation and/or industry.  Each applicable record would
have to be analyzed in order to determine the possibility of a poor match.

Only a small number of the matched individual records reported a change 
in Veteran Status from 1969 to 1970, with 26601 of the 26847 total 
matched records (about 99.1%) maintaining the same status.  Some change is
plausible, given that the Vietnam War falls within the time period in 
question.  Again, each applicable record would have to be analyzed to 
determine whether a change in Veteran Status is logical.  The change in
the highest grade attended also was small, with 25983 of the 26847 total
matched records (about 96.8%) reporting the same level of education for
the two years.

When matching individuals by age, the IndMatch.sas program allowed for a 
match of plus-or-minus one year from the age that was expected in the 
second year.  Over 94% of the matched individuals in this time period 
reported an age in the second year that was one year older than the age
reported in the previous year.  When the age variable was crossed in the
PROC FREQ tables with each of the other poor-match indicator variables,
the highest percentage among the possible combinations was always
reported for the expected age (age1970=age1969+1) with no change in the
other chosen variable.
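
The age tolerance can be made concrete with a small Python sketch.  It is
an illustration under stated assumptions, not the IndMatch.sas code: the
function name match_individuals and the dictionary record layout are
hypothetical.  Records are restricted to heads and spouses (x53 equal to 1
or 2), matched exactly on x7, x10, x72 and x76, and accepted when the
year-two age is within one year of the expected age, that is, when the
difference from the year-one age is 0, 1 or 2.

```python
EXACT_KEYS = ("x7", "x10", "x72", "x76")  # cluster, family serial, race, sex


def heads_and_spouses(records):
    """Keep records with x53 equal to 1 or 2, per the text."""
    return [r for r in records if r["x53"] in (1, 2)]


def match_individuals(year1, year2):
    """Return (year-one, year-two) record pairs that satisfy the match rule."""
    by_key = {}
    for r2 in heads_and_spouses(year2):
        by_key.setdefault(tuple(r2[k] for k in EXACT_KEYS), []).append(r2)
    pairs = []
    for r1 in heads_and_spouses(year1):
        for r2 in by_key.get(tuple(r1[k] for k in EXACT_KEYS), []):
            # Expected age is age + 1; allow plus or minus one year around it.
            if 0 <= r2["x33"] - r1["x33"] <= 2:
                pairs.append((r1, r2))
    return pairs
```

A candidate whose age rises by seven years, for example, is rejected even
when every other key agrees.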

In an attempt to explain any change in the make-up of families or 
households, the Indiv.sas program was written to report the combinations 
of ages reported by matched couples for two consecutive years.  The 
table shows that, of all the matched couples, 57 percent reported an 
increase of one year in age from 1969 to 1970 for both the head and the 
spouse.  In 21 percent of the couples, both members reported an age
outside of the range allowed for a match, which can be explained by new
families within a household in 1970.  The remaining 22 percent is made up
of various combinations of one spouse being plus-or-minus one year from
the expected age in 1970 crossed with the same possibilities for the other
spouse, plus a small number of cases with one spouse within the allowed
range of age matches and the other spouse outside the range, probably
representing remarriages.

The results of the quality of match tests indicate that the matches of 
records where individuals reported the expected age (year2=year1+1) 
produce the best results.  Only a small percentage of the total number of 
matched records report an age in the second year equal to that of the 
first year or two years greater than the age in the first year.  Among
these individuals, a larger share has unexpected changes in the other
variables (e.g. industry and occupation).  These results suggest that
matches involving individuals with reported ages in year two other than
the expected age have a higher likelihood of being poor matches.
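
The comparison summarized above can be sketched in Python as the share of
matched pairs whose value of another variable changed, grouped by the
change in reported age.  This is an illustration only: the function name
change_rate_by_age_diff is hypothetical, pairs are assumed to be
(year-one, year-two) dictionaries, and x38 (Current Industry) is used as
the default comparison variable.

```python
from collections import defaultdict


def change_rate_by_age_diff(pairs, var="x38"):
    """Map each age difference to the fraction of pairs where var changed."""
    totals = defaultdict(int)
    changed = defaultdict(int)
    for r1, r2 in pairs:
        diff = r2["x33"] - r1["x33"]
        totals[diff] += 1
        if r1[var] != r2[var]:
            changed[diff] += 1
    return {diff: changed[diff] / totals[diff] for diff in totals}
```

A higher fraction for differences of 0 or 2 than for a difference of 1
would be consistent with the conclusion drawn above.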
 								    
MISSING DATA
 								    
PROC MEANS was run on all records for each year in the March CPS 
extract files to determine the years in which each variable is missing 
data:
	    
 	Code	Variable Name			Years Missing	    
  	----	-------------			-------------	    	
  	x2	Family-in-Household		83, 87, 88	    
  	x4	Year				90		    
 	x7	Random Cluster			77-92		    
  	x8	Keyfitz Cluster			77-92		    
 	x9	Noninterview Cluster		64-67, 77-92	    
  	x10	Family Serial Number		64-67, 80-88	    
 	x11	Family Description		89-92		    
 	x12	Family Position in Household	68-75		    
  	x13	Family Type C-recipiency	64, 65, 89-92	    
 	x15	Number of Persons in Family	89-92		    
 	x20	Household Serial/Segment Number	64-67, 89-92	    
  	x21	Household Type			64-67		    
 	x22	Household Status		64-76		    
  	x23	Number of Families in Household	64-67		    
 	x26	SMSA				89-92		    
 	x27	SMSA-I				89-92		    
 	x32	ADC Recipiency			64, 65		    
 	x34 	Alimony Recipiency		64-68		    
 	x35	Any Reason Could Not Take Job	64-67		    
  	x37	Complete High Grade Attended	92		    
 	x42	Family (Secondary) Membership 	89-92		    
 	x43	Family Number			64-67, 89-92	    
 	x46	Farm/Self-Employed Income	76-79		    
 	x54	Last Work Full Time		64-67		    
 	x55	Last Work Full Time For Pay	64-67		    
 	x57	Look For Full or Part Time Work	64-67		    
  	x61	Nonfarm Self-Employment Income	76-79		    
 	x62	Normal Full Time Job		64-67, 89-92	    
  	x65	Parents Presence		64-75		    
 	x66	Person Sequence Number		64-67		    
 	x70	Public Assistance Amount	64-67		    
 	x71	Public Assistance Recipiency	64, 65		    
 	x73	Reason Not At Work Last Week	64-67		    
 	x77	Subfamily Membership Key	64-67, 89-92	    
 	x78	Unemployment Recipiency		64-68 		    
 	x82	Weeks Looking for Work Last Year	64-75, 89-92	    
 	x83	Weeks Looking for Work Last Year	64-67		    
 	x84	Weeks Looking/Layed Off Work	64-75		    
 	x86	Weeks Worked Last Year-I	64-75		    
 	x87	Weeks Worked Last Year-II	89-92		    
 	x89	Why Look For Work		64-66		    
 	x91	Person Serial Number		64-67, 80-92	    
 	x92	Poverty Cutoff Dollars		64-67		    
 	x93	Poverty Level			64-67		    
 	x95	Spanish Ethnicity		64-70		    
 	x97	Main Reason For Part-Year Work	64-67, 89-92	    
 	x99	Stretches of Unemployment	64-67, 76-79	    
 	x100	Weeks in Labor Force		76-92		    
	x101	Family A Weight			64-67, 77-92	    
 	x102	Family P Weight			64-67, 77-92	    
 	x103	Family Weight Basic		76		    
 	x104	Household Weight		64-76		    
 	x105	Person A Weight			64-67, 76-92	    
 	x106	Person P Weight			64-67, 76-92	    
  	x108	Basic CPS Weight		68-88		    
 	x109	Type-A-Income			64-67, 80-92	    
 	x110	Type-B-Income			64-67, 89-92	    
 	x111	Type-C-Income			64-67, 89-92	    
 	x112	Type-D-Income			64-67, 89-92	    
 	x113	Type-E-Income			64-67, 89-92	    
 	x114	Dividends and Interest		64-75		    
 	x115	Rental Income			64-75		    
 	x116	Public-Assistance Income	64-75		    
	x117	Supplemental Security Income	64-75		    
 	x118	CPI-Index			89-92		    
 	x119	Version Number Major I.D.	64-92		    
 	x120	Version Number Minor I.D.	64-67, 80-92	    
 	x121	Presence of Own Children	68-92	 	    
 	x122	Own Children Under 6 (in family)	68-79		    
 	x123	Own Children Under 18		68-79		    
 	x124	Related Children Under 18 	68-92		    
 	x125	Family Members Under 18		68-79		    
 	x126	Family Members Over 18		68-92		    
  	x127	Female Family Members 18+	68-92		    
 	x128	Labor Force Status		68-92		    
 	x129	Household Flag			68-92		     */