The data is drawn from a paper by Robert Lalonde,
"Evaluating the Econometric Evaluations of Training Programs with Experimental
Data," American Economic Review, Vol. 76 (1986), pp. 604-620. We are grateful
to him for allowing us to use this data, for his assistance in reading his
original data tapes, and for permission to publish it here.
Data Files
NSW_TREATED.TXT (297 observations)
NSW_CONTROL.TXT (425 observations)
These files contain the treated and control units from the male sub-sample of the National Supported Work (NSW) Demonstration, as used by Lalonde in his paper. They are plain text files. The variables, from left to right, are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE75 (earnings in 1975), and RE78 (earnings in 1978). RE78 is the outcome; all other variables are pre-treatment.
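As a minimal sketch, assuming each row is a line of whitespace-separated numbers in the column order listed above (the exact spacing and number format of the files are not specified here), the NSW files could be read into named records in Python. The short column names and the helpers `parse_rows` and `read_nsw` are hypothetical choices of ours, not part of the data distribution.

```python
from pathlib import Path

# Column order as documented for NSW_TREATED.TXT and NSW_CONTROL.TXT.
# The short names are illustrative; the files themselves carry no header.
NSW_COLUMNS = ["treat", "age", "educ", "black", "hisp",
               "married", "nodegree", "re75", "re78"]


def parse_rows(lines, columns=NSW_COLUMNS):
    """Parse whitespace-separated numeric rows into dicts keyed by column name."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:  # skip blank lines, e.g. a trailing newline
            continue
        if len(fields) != len(columns):
            raise ValueError(
                f"line {lineno}: expected {len(columns)} fields, got {len(fields)}")
        # float() accepts both plain decimals and scientific notation
        records.append(dict(zip(columns, map(float, fields))))
    return records


def read_nsw(path):
    """Read one NSW file (hypothetical helper) into a list of records."""
    return parse_rows(Path(path).read_text().splitlines())
```

For example, `read_nsw("NSW_TREATED.TXT")` should return 297 records if the file matches the count listed above. Raising on a wrong field count, rather than silently truncating, makes a layout mismatch visible immediately.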
PSID_CONTROLS.TXT (2,490 observations)
PSID2_CONTROLS.TXT (253 observations)
PSID3_CONTROLS.TXT (128 observations)
CPS_CONTROLS.TXT (15,992 observations)
CPS2_CONTROLS.TXT (2,369 observations)
CPS3_CONTROLS.TXT (429 observations)
These six files contain the non-experimental comparison groups that Lalonde constructed from the Panel Study of Income Dynamics (PSID) and the Current Population Survey (CPS), together with the further subsets he created from these two basic comparison groups. CPS2 and CPS3 are very similar to, but not exactly the same as, Lalonde's subsets, because we were unable to re-create his CPS subsets exactly.
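These comparison groups are meant to stand in for the experimental controls. A minimal sketch, assuming the comparison files use the same nine-column, whitespace-separated layout as the NSW files (an assumption; their layout is not described here), pools the NSW treated units with one comparison group and computes the raw difference in mean RE78. The helpers `load`, `pool`, and `naive_difference` are hypothetical, and a raw difference in means is only an unadjusted starting point, not an estimator that corrects for differences in pre-treatment variables.

```python
from statistics import mean

# Assumed layout: the same nine columns as the NSW files.
COLUMNS = ["treat", "age", "educ", "black", "hisp",
           "married", "nodegree", "re75", "re78"]


def load(path, columns=COLUMNS):
    """Read whitespace-separated numeric rows into dicts, skipping blank lines."""
    with open(path) as f:
        return [dict(zip(columns, map(float, line.split())))
                for line in f if line.strip()]


def pool(treated, comparison):
    """Stack NSW treated units with a comparison group, setting treat = 0 for the latter."""
    # dict(r, treat=0.0) copies each record, so the inputs are not mutated
    return treated + [dict(r, treat=0.0) for r in comparison]


def naive_difference(treated, comparison, outcome="re78"):
    """Mean outcome of treated units minus mean outcome of comparison units, unadjusted."""
    return mean(r[outcome] for r in treated) - mean(r[outcome] for r in comparison)
```

For example, `naive_difference(load("NSW_TREATED.TXT"), load("PSID_CONTROLS.TXT"))` gives the unadjusted PSID-based contrast, while the same call with `NSW_CONTROL.TXT` gives the experimental difference in means against which non-experimental estimates can be compared.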
In addition, based on pre-intervention variables, we extract a further subset of Lalonde's NSW experimental data, one that contains information on RE74 (earnings in 1974):
NSWRE74_CONTROL.TXT (260 observations)
NSWRE74_TREATED.TXT (185 observations)
The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).
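Because the RE74 files carry one extra column (RE74, between nodegree and RE75), a reader can pick the column layout from the number of fields per row. A minimal sketch, again assuming whitespace-separated numeric rows; the short column names and the names `LAYOUTS` and `parse_any` are hypothetical.

```python
# Column orders as documented; short names are illustrative only.
NSW_COLUMNS = ["treat", "age", "educ", "black", "hisp",
               "married", "nodegree", "re75", "re78"]
NSWRE74_COLUMNS = ["treat", "age", "educ", "black", "hisp",
                   "married", "nodegree", "re74", "re75", "re78"]

# Map field count -> column names, so one reader handles both file families.
LAYOUTS = {len(NSW_COLUMNS): NSW_COLUMNS, len(NSWRE74_COLUMNS): NSWRE74_COLUMNS}


def parse_any(lines):
    """Parse rows, choosing the 9- or 10-column layout from the first non-blank row."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return []
    columns = LAYOUTS.get(len(rows[0]))
    if columns is None:
        raise ValueError(f"unexpected number of fields: {len(rows[0])}")
    # Every row must match the layout chosen from the first row.
    if any(len(r) != len(columns) for r in rows):
        raise ValueError("rows have inconsistent numbers of fields")
    return [dict(zip(columns, map(float, r))) for r in rows]
```

Keying the layout on field count works here only because the two documented layouts differ in length; if other files with nine or ten columns but a different order were added, an explicit layout argument would be safer.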
Finally, note that in Table 1 of the published paper the mean of Hispanic for PSID3 is misstated: it should be 0.12, not 0.18.
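The corrected figure can be checked directly: Hispanic is a 0/1 indicator, so its mean is the share of Hispanic units in the group. A minimal sketch, assuming PSID3_CONTROLS.TXT uses the same nine-column, whitespace-separated layout as the NSW files, so that Hispanic is the fifth field (index 4); both that position and the name `column_mean` are assumptions.

```python
def column_mean(lines, index):
    """Mean of one whitespace-separated numeric column (0-based index), skipping blank lines."""
    values = [float(line.split()[index]) for line in lines if line.strip()]
    if not values:
        raise ValueError("no data rows")
    return sum(values) / len(values)


# Assumed position of Hispanic: treat, age, educ, black, hispanic, ...
HISPANIC = 4
```

For example, `with open("PSID3_CONTROLS.TXT") as f: print(round(column_mean(f, HISPANIC), 2))` should, per the correction above, print a value of about 0.12 if the layout assumption holds.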