Data from
"Causal Effects in Non-Experimental Studies: Reevaluating
the Evaluation of Training Programs,"
Journal of the American Statistical Association, Vol. 94, No. 448 (December 1999), pp. 1053-1062.
and
"Propensity Score Matching Methods for Non-Experimental Causal Studies,"
Review of Economics and Statistics, Vol. 84 (February 2002), pp. 151-161.
The data are drawn from a paper by Robert Lalonde,
"Evaluating the Econometric Evaluations of Training Programs," American
Economic Review, Vol. 76, pp. 604-620. We are grateful to him for allowing
us to use these data, for his assistance in reading his original data tapes,
and for permission to publish them here.
NSW Data Files (Lalonde Sample)
These files contain the treated and control units from
the male sub-sample from the National Supported Work Demonstration as used
by Lalonde in his paper.
nsw_treated.txt (297 observations)
nsw_control.txt (425 observations)
These are text files. The order of the variables from left to right is: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE75 (earnings in 1975), and RE78 (earnings in 1978). The last variable is the outcome; other variables are pre-treatment.
nsw.dta NSW treated and control observations in Stata format
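The column order above can be turned into a small reader. The following is a minimal sketch, assuming the text files hold whitespace-separated numeric columns with no header row; the short column names and the `read_nsw` helper are illustrative labels introduced here, not part of the files themselves.

```python
import io

# Column order for the Lalonde-sample text files, as listed above.
# These short names are hypothetical labels; the files carry no header row.
NSW_COLUMNS = ["treat", "age", "education", "black", "hispanic",
               "married", "nodegree", "re75", "re78"]

def read_nsw(handle, columns=NSW_COLUMNS):
    """Parse whitespace-separated numeric rows into a list of dicts."""
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} fields, got {len(fields)}")
        rows.append(dict(zip(columns, map(float, fields))))
    return rows

# Two made-up rows for illustration only; they are NOT taken from the data.
sample = io.StringIO(
    "1 30 11 1 0 0 1 0.0 5000.0\n"
    "0 25 12 0 1 1 0 1500.0 3000.0\n"
)
rows = read_nsw(sample)
```

With a real file, the same function could be called as `read_nsw(open("nsw_treated.txt"))`, provided the whitespace-separated assumption holds for the downloaded copy.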
NSW Data Files (Dehejia-Wahba Sample)
Based on pre-intervention variables, we extract a further subset of Lalonde's NSW experimental data containing information on RE74 (earnings in 1974):
nswre74_control.txt (260 observations)
nswre74_treated.txt (185 observations)
The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).
nsw_dw.dta NSW treated and control observations (Dehejia-Wahba Sample) in Stata format
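Because the NSW treated and control units come from a randomized experiment, a simple difference in mean RE78 between the two groups is a natural first summary of the files. The sketch below assumes the ten-column layout described above and whitespace-separated values; the column names, the `parse_rows` and `mean_difference` helpers, and the four toy rows are all hypothetical and are not drawn from the data.

```python
from statistics import mean

# Ten-column layout of the RE74 subset, in the order listed above.
# The short names are illustrative labels, not headers in the files.
DW_COLUMNS = ["treat", "age", "education", "black", "hispanic",
              "married", "nodegree", "re74", "re75", "re78"]

def parse_rows(text, columns=DW_COLUMNS):
    """Turn whitespace-separated numeric text into a list of dicts."""
    return [dict(zip(columns, map(float, line.split())))
            for line in text.splitlines() if line.strip()]

def mean_difference(rows, outcome="re78", indicator="treat"):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [r[outcome] for r in rows if r[indicator] == 1.0]
    control = [r[outcome] for r in rows if r[indicator] == 0.0]
    return mean(treated) - mean(control)

# Made-up rows for illustration only; they are NOT taken from the data.
toy = parse_rows("""
1 30 11 1 0 0 1 0 0 6000
1 22  9 1 0 0 1 0 0 4000
0 25 12 0 1 1 0 0 0 3000
0 28 10 1 0 0 1 0 0 5000
""")
estimate = mean_difference(toy)
```

On the real files, the rows from nswre74_treated.txt and nswre74_control.txt would be concatenated before calling `mean_difference`.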
PSID and CPS Data Files
These six files contain the non-experimental comparison groups constructed by Lalonde from the Panel Study of Income Dynamics and the Current Population Survey, and the further subsets he created from the two basic comparison groups. CPS2 and CPS3 are very similar to, but not exactly the same as, Lalonde's subsets; for CPS, we were unable to re-create his subsets exactly.
The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).
PSID controls (2490 observations): psid_controls.txt (text format), psid_controls.dta (Stata format)
PSID2 controls (253 observations): psid2_controls.txt (text format), psid_controls2.dta (Stata format)
PSID3 controls (128 observations): psid3_controls.txt (text format), psid_controls3.dta (Stata format)
CPS controls (15,992 observations): cps_controls.txt (text format), cps_controls.dta (Stata format)
CPS2 controls (2,369 observations): cps2_controls.txt (text format), cps_controls2.dta (Stata format)
CPS3 controls (429 observations): cps3_controls.txt (text format), cps_controls3.dta (Stata format)
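The comparison-group files share the ten-column layout of the RE74 subset, so a non-experimental dataset can be assembled by stacking NSW treated units on top of one comparison group. The following is a rough sketch under two assumptions: the values are whitespace-separated, and every unit in a comparison file carries a treatment indicator of 0. The helper names and the toy rows are hypothetical and are not taken from the data.

```python
from statistics import mean

# Shared ten-column layout; the short names are illustrative labels.
COLUMNS = ["treat", "age", "education", "black", "hispanic",
           "married", "nodegree", "re74", "re75", "re78"]

def parse_rows(text, columns=COLUMNS):
    """Turn whitespace-separated numeric text into a list of dicts."""
    return [dict(zip(columns, map(float, line.split())))
            for line in text.splitlines() if line.strip()]

def stack(treated, comparison):
    """Concatenate treated units with a comparison group after sanity checks."""
    if not all(r["treat"] == 1.0 for r in treated):
        raise ValueError("treated rows must all have indicator 1")
    if not all(r["treat"] == 0.0 for r in comparison):
        raise ValueError("comparison rows must all have indicator 0")
    return treated + comparison

def group_means(rows, covariate):
    """Mean of one covariate within each treatment-indicator group."""
    by_group = {}
    for r in rows:
        by_group.setdefault(r["treat"], []).append(r[covariate])
    return {group: mean(values) for group, values in by_group.items()}

# Made-up rows for illustration only; they are NOT taken from the data.
treated = parse_rows("1 20 10 1 0 0 1 0 0 4000\n"
                     "1 30 11 1 0 0 1 0 0 6000\n")
comparison = parse_rows("0 40 12 0 0 1 0 20000 21000 22000\n"
                        "0 50 16 0 0 1 0 30000 31000 32000\n")
combined = stack(treated, comparison)
age_means = group_means(combined, "age")
```

Comparing `group_means` across covariates is a quick way to see how far a comparison group sits from the treated units before any matching or adjustment is attempted.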
Correction
Finally, note that in Table 1 of Dehejia and Wahba (1999) the mean of Hispanic for PSID3 is misstated. It should be 0.12, not 0.18.