Guide To CPS Files At NBER


                     CPS Files at NBER


Micro data files from the Current Population Survey are
available at http://www.nber.org/cps/.
Additional files from the Census can be added on request,
as can user-supplied CPS tapes.
(Please e-mail Dan Feenberg to request additional CPS files.)

All the CPS data files are ASCII files compressed with the Unix
compress command. They can be decompressed with either the
standard Unix uncompress command or gzip. A copy of gzip
for the PC is also kept in this directory.

The following discussion assumes that you are familiar with
the use of CPS tapes in other environments and deals only with
the NBER specific information.

All of the data files use MS-DOS line ending conventions. That is,
instead of just a line feed after each line (the Unix convention)
there is a carriage return and a line feed (the DOS convention).
This will not disturb any Unix programs that I know of, and makes
the files more accessible to DOS software. It does change the record
length for any program using fixed length records. In the table below
the record length given is the size of the data record (as given in
the CPS documentation); the actual record is 2 bytes longer. The
record length for any file can be confirmed with:
'zcat cpsMMMYY.raw.Z|head -1|wc'. The last number returned is the
record length including the carriage return and line feed that
separate records. Use this number when reading with fixed length
reads.
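
The same check can be sketched in Python. This is an illustration only:
the function name record_lengths and the sample bytes are invented for
the example, and the real input would be the first line of the
decompressed file exactly as stored.

```python
def record_lengths(first_line):
    """Return (data_length, full_length) for one raw record.

    first_line is a bytes object holding the record as stored,
    including whatever line terminator follows it.
    """
    full_length = len(first_line)
    # Strip the terminator: CR LF in the DOS convention, LF alone under Unix.
    data = first_line.rstrip(b"\r\n")
    return len(data), full_length


# Hypothetical 10-byte data record stored with DOS line endings.
sample = b"1000012345\r\n"
data_len, full_len = record_lengths(sample)
print(data_len, full_len)
```

The first number corresponds to the lrecl column in the table below,
the second to the length a fixed length read has to use.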

File Name       lrecl   Description      Supplement

cpsjan81.icpsr.Z        January 1981 ICPSR Format - use their docs
cpsjan83.raw.Z  481     January 1983 CPS Occupational Mobility & Job Tenure
cpsjan87.raw.Z  440+2   January 1987 CPS Occupational Mobility & Job Tenure
cpsjan88.raw.Z  554+2   January 1988 CPS Occupational Mobility & Job Tenure
cpsjan91.raw.Z  440+2   January 1991 CPS Job Training
cpsjan92.raw.Z  408     January 1992 CPS Job Training
cpsjan93.raw.Z  506+2   January 1993 CPS Tobacco Use
cpsjan95.raw.Z  815+2   January 1995 CPS basic
cpsfeb90.raw.Z          February 1990 Both education variables available.
cpsfeb94.raw.Z  950+2   February 1994 Displaced workers, tenure, mobility
cpsfeb95.raw.Z  ???+2   February 1995 Displaced workers, tenure, mobility
cpsfeb96.raw.Z  997+2   February 1996 Displaced workers, tenure, mobility
cpsfeb97.raw.Z          February 1997 Displaced workers, tenure, mobility
cpsfeb98.raw.Z 1053+2   February 1998 Displaced workers, tenure, mobility
cpsmar62.raw.Z     +2   March   1962 CPS Annual Demographic File
cpsmar64.raw.Z     +2   March   1964 CPS Annual Demographic File
cpsmar66.raw.Z     +2   March   1966 CPS Annual Demographic File
cpsmar67.raw.Z     +2   March   1967 CPS Annual Demographic File
cpsmar68.raw.Z  215+2   March   1968 CPS Annual Demographic File
cpsmar69.raw.Z  215+2   March   1969 CPS Annual Demographic File
cpsmar72.raw.Z  215+2   March   1972 CPS Annual Demographic File
cpsmar73.raw.Z  221+2   March   1973 CPS Annual Demographic File
aremf73d.dat.Z          March   1973 CPS ADF matched to IRS data
cpsmar74.raw.Z  222+2   March   1974 CPS Annual Demographic File
cpsmar75.raw.Z  222+2   March   1975 CPS Annual Demographic File
cpsmar76.raw.Z  338+2   March   1976 CPS Annual Demographic File
cpsmar77.raw.Z  342+2   March   1977 CPS Annual Demographic File
cpsmar78.raw.Z  342+2   March   1978 CPS Annual Demographic File
cpsser78.raw.Z          March   1978 ADF match to SS Earnings History
cpsmar79.raw.Z  342+2   March   1979 CPS Annual Demographic File
cpsmar80.raw.Z  360+2   March   1980 CPS Annual Demographic File
cpsmar81.raw.Z  390+2   March   1981 CPS Annual Demographic File
cpsmar82.raw.Z  390+2   March   1982 CPS Annual Demographic File
cpsmar83.raw.Z  390+2   March   1983 CPS Annual Demographic File
cpsmar84.raw.Z  390+2   March   1984 CPS Annual Demographic File
cpsmar85.raw.Z  390+2   March   1985 CPS Annual Demographic File
cpsmar86.raw.Z  407+2   March   1986 CPS Annual Demographic File
cpsmar87.raw.Z  407+2   March   1987 CPS Annual Demographic File
cpsmar88.raw.Z  656+2   March   1988 CPS Annual Demographic File Rewrite
cpsmar88.oldformat.Z    March   1988 CPS ADF in 407 byte format.
cpsmar89.raw.Z  656+2   March   1989 CPS Annual Demographic File
cpsmar90.raw.Z  656+2   March   1990 CPS Annual Demographic File
cpsmar91.raw.Z  656+2   March   1991 CPS Annual Demographic File
cpsmar92.raw.Z  704+2   March   1992 CPS Annual Demographic File
cpsmar93.raw.Z  704+2   March   1993 CPS Annual Demographic File
cpsmar94.raw.Z  744+2   March   1994 CPS ADF Major Revisions
cpsmar95.raw.Z  744+2   March   1995 CPS ADF (updated August 1996)
cpsmar95.raw.Z  744+2   March   1995 CPS ADF (updated May 1997)
cpsmar96.raw.Z  838+2   March   1996 CPS ADF
cpsmar97.raw.Z  838+2   March   1997 CPS ADF
cpsmar98.raw.Z  853+2   March   1998 CPS ADF (lrecl is a guess)
cpsmar-apr92.raw.Z 925+2 March-April Match 1992 Alimony & Child Support
cpsapr74.raw.Z  600+2   April   1974 CPS (possible problem tape)
cpsapr93.raw.Z     +2   April   1993 CPS Employee Benefit
cpsmay69.raw.Z  512+2   May     1969 CPS
cpsmay73.raw.Z  480+2   May     1973 CPS
cpsmay79.raw.Z  792+2   May     1979 CPS Pensions
cpsmay80.raw.Z  480+2   May     1980 CPS
cpsmay81.raw.Z  660+2   May     1981 CPS
cpsmay83.raw.Z  678+2   May     1983 CPS Employee Benefit
cpsmay85.raw.Z  600+2   May     1985 CPS
cpsmay86.raw.Z  660+2   May     1986 CPS
cpsmay87.raw.Z  660+2   May     1987 CPS
cpsmay88.raw.Z  804+2   May     1988 CPS
cpsmay89.raw.Z  411+2   May     1989 CPS
cpsmay91.raw.Z          May     1991 CPS Work schedule supp.
cpsmay97.raw.Z          May     1997 CPS Work schedule supp.
cpsjun80.raw.Z  721+2   June    1980 CPS Fertility and Immigration Supp.
cpsjun81.raw.Z  601+2   June    1981 CPS Fertility and Immigration Supp.
cpsjun82.raw.Z  481+2   June    1982 CPS Fertility and Immigration Supp.
cpsjun85.raw.Z  702+2   June    1985 CPS Fertility and Immigration Supp.
cpsjun86.raw.Z  603+2   June    1986 CPS Fertility and Immigration Supp.
cpsjun87.raw.Z  798+2   June    1987 CPS Fertility and Immigration Supp.
cpsjun88.raw.Z  601+2   June    1988 CPS Fertility and Immigration Supp.
cpsjun94.raw.Z  862+2   June    1994 CPS Updated 9/96
cpssep92.raw.Z  506+2   September 1992 CPS Tobacco Use
cpssep94.raw.Z          September 1994 CPS Health and Benefits Updated 9/96
cpssep97.raw.Z          September 1997
cpsoct84.raw.Z  690+2   October 1984 CPS Schooling & Computers Supp.
cpsoct87.raw.Z  480+2   October 1987 CPS
cpsoct89.raw.Z  441+2   October 1989 CPS Schooling & Computers Supp.
cpsoct93.raw.Z          October 1993 CPS Schooling & Computers Supp.
cpsoct94.raw.Z 1013+2   October 1994 CPS Schooling & Computers Supp. Updated 9/96
cpsoct97.raw.Z 1137+2   October 1997 CPS Schooling & Computers Supp.
cpsnov94.raw.Z     +2   November 1994 Updated 9/96

Where available, documentation is in a .doc file. If machine
readable documentation is not available, we should have hard
copies at 1050 in the red cabinet. CPS documentation can be
ordered from Census Data User Services at 301-763-4100 for $10
per file. This may be charged to a credit card and is usually
delivered promptly. Sometimes the Government Data Center at
Harvard or Dewey Library at MIT will have ICPSR codebooks.

Reading compressed files:

A very nice feature of Unix is that compressed files can often
be used without storing the decompressed file anywhere. This
usually depends on the zcat command, which decompresses a file
and writes it to the standard output, from which it may be piped
to another command. Here is an example using SAS:

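      data;
      /* Read the decompressed stream from zcat instead of a disk file. */
      filename cps pipe 'zcat /home/data/cps/cpsmar92.raw.Z';
      /* 704 data bytes plus the trailing carriage return. */
      infile cps lrecl=705;
      input rectype 1 - 1 hhid 2 - 8;
      /* Write the household id of every record to the extract file. */
      file 'cpsmar92.subset';
      put hhid;
      run;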

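The Unix command zcat decompresses the file to the standard output and
the pipe option of the filename command causes SAS to read from that
stream. Notice the lrecl=705 - one more than the length of the data
record. SAS on Unix treats the line feed as the record terminator, so
only the carriage return is counted as part of the record.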
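
For readers without SAS, the same pipe-and-extract idea can be sketched
in Python. This is a minimal sketch rather than a tested NBER recipe:
the function names extract_hhid and zcat_lines are invented here, and it
assumes zcat is on the PATH. The column positions are the ones used in
the SAS example.

```python
import subprocess
from typing import Iterable, Iterator, Tuple


def extract_hhid(lines: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (rectype, hhid) from columns 1 and 2-8 of each record."""
    for line in lines:
        record = line.rstrip(b"\r\n").decode("ascii")
        # Documentation columns are 1-based and inclusive; Python slices
        # are 0-based and exclude the end, so columns 2-8 are record[1:8].
        yield record[0:1], record[1:8]


def zcat_lines(path: str) -> Iterator[bytes]:
    """Stream the lines of a compressed file by piping it through zcat."""
    with subprocess.Popen(["zcat", path], stdout=subprocess.PIPE) as proc:
        yield from proc.stdout


# Usage, assuming the file exists at this location:
#   for rectype, hhid in extract_hhid(zcat_lines("/home/data/cps/cpsmar92.raw.Z")):
#       print(hhid)
```

As with the SAS pipe, the decompressed file is never written to disk.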

SAS isn't necessary to make extracts; the Unix cut command is very
easy to use. For example, to extract the same fields with cut you
would use:

zcat /home/data/cps/cpsmar92.raw.Z | cut -c1,2-8 >extract.dat

Then extract.dat can be read by STATA or any other package. A
weakness of the cut command is that it always writes the requested
columns in the order they appear in the input record, whatever
order they are listed in. This prevents the user from
placing a particular variable at the start of the output record,
where record selection could be done by the grep command.
Nevertheless, if a user wanted only person records (record type 3)
from the above example it could be done with:

zcat /home/data/cps/cpsmar92.raw.Z|cut -c1,2-8|grep '^3' >extract.dat

You may remember that ^ signifies the beginning of a line in grep.
Clever users may be able to use the \m construct of grep, or
the paste command, to get around this limitation; others may
wish to defer record selection to the statistical package.
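
A short script is another way around the ordering limitation. The
Python sketch below is an illustration only: the function name
select_fields and the 10-column sample records are made up, and the
real input would be the lines of a decompressed CPS file. It writes the
requested columns in whatever order they are listed and can select on
any column, not just the first.

```python
def select_fields(lines, spans, keep_col, keep_value):
    """Return the requested column spans, in the order given, for records
    whose character in column keep_col equals keep_value.

    Columns are 1-based and inclusive, as in the CPS documentation.
    """
    out = []
    for line in lines:
        record = line.rstrip("\r\n")
        if record[keep_col - 1:keep_col] != keep_value:
            continue  # not the record type we want
        out.append("".join(record[start - 1:end] for start, end in spans))
    return out


# Made-up records; column 1 is the record type.
sample = ["1AAAAAAABB\r\n", "3CCCCCCCDD\r\n"]
# Put columns 9-10 first, then columns 2-8, keeping only type '3' records.
print(select_fields(sample, [(9, 10), (2, 8)], keep_col=1, keep_value="3"))
```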

Caution: These files have been obtained from a variety of primary
and secondary sources. Before using any one of them, you should
confirm from the number of records and other internal evidence
that the file is for the month and year it purports. Don't attempt
to use a file without documentation for that month and year!
If you have the previous and next years, and there is no change,
that might be an excuse to start work without the year you want.
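
One way to gather that internal evidence is to tally the records by
type and length and compare the counts with the documentation. The
Python sketch below assumes the record type is in column 1, as in the
March files used in the examples here; the function name tally_records
and the sample lines are invented for illustration.

```python
from collections import Counter


def tally_records(lines):
    """Count records by (record type, data record length)."""
    counts = Counter()
    for line in lines:
        record = line.rstrip("\r\n")  # drop the CR LF terminator
        counts[(record[:1], len(record))] += 1
    return counts


# Made-up sample: one household record followed by two person records.
sample = ["1AAAA\r\n", "3BBBB\r\n", "3CCCC\r\n"]
for (rectype, length), n in sorted(tally_records(sample).items()):
    print(rectype, length, n)
```

More than one length for the same record type, or a length that
disagrees with the table above, suggests the file is not what it
claims to be.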

Here is another example, showing how to deal with a hierarchical
file using the '@' symbol and the retain statement:

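filename raw pipe 'zcat /home/data/cps/cpsmar92.raw.Z';
data;
/* Keep household values across the records that follow. */
retain hhid msa;
infile raw lrecl=705;
/* Read the record type and hold the record for a second input. */
input rectype $ 1 - 1 @;
/* Household record: remember its id and MSA. */
if rectype eq '1' then input hhid 2 - 8 msa 44 - 47;
/* Person record: read age and write one line per person. */
if rectype eq '3' then do;
  input age 40 - 41;
  put hhid ':' msa ':' age;
end;
run;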

The '@' symbol at the end of an 'input' statement tells SAS not
to advance to the next record for the next input statement. In
this way the record type can be determined from the first 'input'
statement, and then a second 'input' statement can be executed
which may depend on the value of 'rectype'. You can see that we
only output for 'Person' records. The 'retain' statement ensures
that the value of 'msa' output is the one read on the most recent
household record.
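
The same carry-forward logic can be sketched outside SAS. In the Python
illustration below, ordinary variables play the role of 'retain'. The
function name person_rows and the sample records are invented; the
column positions are the ones used in the SAS example.

```python
def person_rows(lines):
    """Yield (hhid, msa, age) for each person record, using the values
    read on the most recent household record."""
    hhid = msa = None  # kept from one record to the next, like 'retain'
    for line in lines:
        record = line.rstrip("\r\n")
        rectype = record[0:1]
        if rectype == "1":        # household record
            hhid = record[1:8]    # columns 2-8
            msa = record[43:47]   # columns 44-47
        elif rectype == "3":      # person record
            yield hhid, msa, record[39:41]  # age in columns 40-41


# Made-up records laid out to match the column positions above.
household = "1" + "0000123" + " " * 35 + "1120"
family = "2" + " " * 10
person = "3" + " " * 38 + "34"
print(list(person_rows([household, family, person])))
```

Records of any other type, such as the family record in the sample,
are simply skipped.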