School District Databook (SDDB)

Source NCES 95-705

This is a special tabulation of the 1990 Census long forms, done by school district boundary and children's ages. The tabulations cover a large number of combinations of household type, race, income, etc. Superficially this allows you to learn about the characteristics of school districts, at a similar level of detail to the City and County Databook. Such a file would take only a few megabytes, however. So why does the SDDB fill more than 100 CD-ROM disks? Because it provides that information categorized by children's age, grade level or enrollment status. That is, you can not only find the number of pre-kindergarten children by district, but also the number of black pre-kindergarten children in non-poverty single-parent households by district. The effective number of variables is overwhelming, but this is not micro-data.

This Version

The SDDB (now called the SDDS) was sponsored by the National Center for Education Statistics, which offers a compressed version on 44 CD-ROM disks. The SDDS web site has more information. Because that version of the dataset is difficult to use for analytical purposes, the NBER has purchased an uncompressed copy of the source dataset from the National Archives and is making it available here. The NARA format data is available on this web site; however, we also offer a slightly improved and much more compact format that drops the age/grade/enrollment status breakdown.

The Fields

Each state has eight ASCII files containing information about the demographic character of school districts. This data is Census tract data aggregated to the school district boundary level, and nearly all items are numbers of persons or households in particular categories by district. A few items are dollar amounts of expenditure or income. There are separate record types for the following universes. Select the link for a complete codebook for that record type:

HT(1) - all households (981 fields)
PS(2a) - all persons (2501 fields)
PT(2b) - all persons - summary (3187 fields)
HC(3) - households with children (808 fields, same as first 808 of type HT)
PR(4) - parents with children (3187 fields, same as type PT)
CH(5) - children's households characteristics (808 fields)
CP(6) - children's parents' characteristics (2813 fields)
CO(7) - children's own characteristics (2271 fields)

For record types 1, 2a, and 2b, there is one record for each school district. The fields show the demographic characteristics of the district. For example, variable 21 on record type HT shows the number of non-family households in poverty for each district. (Select HT above and then select P019b.)

In the original datasets, record types 3 through 7 are each iterated through 42 combinations of each enrollment status and age or grade level. At this time the NBER files omit these iterated records, but they could be made available if there were interest.

Here is the format of the record identification prefix, the first 40 bytes of each record:

 1 -  4  NARA record length (9500)
 5 - 10  Record length (original)
11 - 11  Record type (1 through 7, but see position 40 also)
12 - 12  District type
13 - 14  State code (FIPS code, 1-56)
15 - 17  County code (non-zero only on county summary records)
18 - 22  School district code (unique only within a state, 00000 for state or county aggregate)

The following grade levels are "01" through "12" plus "PK" and "KG".

23 - 24  CCD lowest grade (00 for state or county)
25 - 26  CCD highest grade (00 for state or county)
27 - 28  Augmented lowest grade (00 for state or county)
29 - 30  Augmented highest grade (00 for state or county)

31 - 37  CCD enrollment (000000 for state or county)
38 - 38  Age/grade status
           0  Total universe for record types 1 and 2
           1  0 - 2 years
           2  3 - 4 years
           3  5 - 13 years
           4  14 - 17 years
           5  18 - 19 years
           6  3 - 19 years
           7  5 - 17 years
           A  Pre-kindergarten
           B  Kindergarten
           C  Grades 1 - 4
           D  Grades 5 - 8
           E  Grades 9 - 12
           F  Total Relevant (only these in NBER files)
39 - 39  Enrollment status
           0  Total universe for record types 1 and 2
           1  Total (only these records in NBER files)
           2  Total enrolled
           3  Enrolled in public school
           4  Enrolled in private school (record not in database; derive as enrollment status record 2 minus enrollment status record 3)
           5  Not enrolled (record not in database; derive as enrollment status record 1 minus enrollment status record 2)
40       Persons or Summary
           A  Record type 2a
           B  Record type 2b
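The prefix layout above maps directly onto fixed-width slicing. The following is a minimal Python sketch, not an NBER-supplied tool: the field names in PREFIX_LAYOUT and the function name parse_prefix are invented here for illustration, and only the byte positions come from the layout above. It assumes each record has already been read into a string.

```python
from typing import Dict

# (name, first byte, last byte), 1-based and inclusive, as in the prefix layout.
# The names are hypothetical labels chosen for this sketch.
PREFIX_LAYOUT = [
    ("nara_record_length", 1, 4),
    ("original_record_length", 5, 10),
    ("record_type", 11, 11),
    ("district_type", 12, 12),
    ("state_fips", 13, 14),
    ("county_code", 15, 17),
    ("district_code", 18, 22),
    ("ccd_low_grade", 23, 24),
    ("ccd_high_grade", 25, 26),
    ("aug_low_grade", 27, 28),
    ("aug_high_grade", 29, 30),
    ("ccd_enrollment", 31, 37),
    ("age_grade_status", 38, 38),
    ("enrollment_status", 39, 39),
    ("persons_or_summary", 40, 40),
]


def parse_prefix(record: str) -> Dict[str, str]:
    """Split the 40-byte identification prefix into named string fields."""
    if len(record) < 40:
        raise ValueError("record is shorter than the 40-byte prefix")
    # Convert 1-based inclusive byte positions to Python slices.
    return {name: record[first - 1:last] for name, first, last in PREFIX_LAYOUT}
```

Values are kept as strings because several of them (grade levels such as "PK", status codes such as "F") are not numeric.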

Organization of the NBER Format File

The original file format was remarkably large and unwieldy. Data for individual states was spread across as many as 11 different CDs, and different record types were mixed in the same file. The file is composed almost entirely of 9-byte fields for population counts, but with a handful of 18-byte fields for dollar amounts and a 40-byte prefix. Since the only item location information given in the documentation was the field number, it is tedious to determine the starting and ending byte for any particular field (which depends on the number of 9-byte and 18-byte fields preceding it in the record).
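To illustrate why locating a field in the original format is tedious, here is a small Python sketch of the running-offset calculation. It assumes that fields are packed back to back immediately after the 40-byte prefix with no separators, which the documentation quoted here does not state explicitly. The function name original_field_span is hypothetical, and the set of dollar-amount field numbers must be taken from the codebook for the record type in question.

```python
from typing import Iterable, Tuple

PREFIX_BYTES = 40   # identification prefix
COUNT_WIDTH = 9     # population count fields
DOLLAR_WIDTH = 18   # dollar amount fields


def original_field_span(n: int, dollar_fields: Iterable[int]) -> Tuple[int, int]:
    """Return (first_byte, last_byte), 1-based inclusive, of field n.

    dollar_fields lists the field numbers that are 18 bytes wide.
    Assumes contiguous packing after the prefix (an assumption of this sketch).
    """
    if n < 1:
        raise ValueError("field numbers start at 1")
    wide = set(dollar_fields)
    start = PREFIX_BYTES + 1
    # Every preceding field shifts the start by its own width.
    for k in range(1, n):
        start += DOLLAR_WIDTH if k in wide else COUNT_WIDTH
    width = DOLLAR_WIDTH if n in wide else COUNT_WIDTH
    return start, start + width - 1
```

For example, under these assumptions, if field 2 were a dollar amount (a made-up case), original_field_span(3, [2]) would place field 3 after one 9-byte and one 18-byte field.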

To make a more accessible version of the SDDB, we have created an NBER format file. In this format, we have allocated 40 bytes for the prefix and 10 bytes for every field, and have divided all aggregate dollar amounts by 1,000. With those changes it is easy to translate any field number into a byte location, there are no field overflows, items are separated by spaces, and the loss of precision is not significant. The reformatting makes it easier, not harder, to use the original documentation.

Any field N starts at byte 31+10*N and ends at byte 40+10*N in the NBER version of the files. Here is an example of how to determine the byte location of a variable on one of the files. Consider "Persons in Household" (Table P016 on the HT record). Find it by selecting the "All Households" link just above, then the third table link. At the far right of that page, you can see that the number of one-person households is given in variable number 3 on that file, the number of two-person households in variable 4, and so on. So those two variables are at byte locations 61-70 and 71-80 on each record of the HT file. The individual records required for a given enrollment status by age or grade are then located using the information in the prefix (shown above).
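The byte rule for the NBER format can be written as a short Python sketch. The function names nber_field_span and nber_field_value are hypothetical; only the formula (start at 31+10*N, end at 40+10*N) comes from the text. The value is returned as a stripped string, since this sketch does not assume whether a given field holds an integer count or a scaled dollar amount.

```python
from typing import Tuple


def nber_field_span(n: int) -> Tuple[int, int]:
    """Return (first_byte, last_byte), 1-based inclusive, of field n."""
    if n < 1:
        raise ValueError("field numbers start at 1")
    return 31 + 10 * n, 40 + 10 * n


def nber_field_value(record: str, n: int) -> str:
    """Extract field n from one NBER-format record as a stripped string."""
    first, last = nber_field_span(n)
    if len(record) < last:
        raise ValueError("record is too short to contain field %d" % n)
    # Convert 1-based inclusive byte positions to a Python slice.
    return record[first - 1:last].strip()
```

With this rule, nber_field_span(3) gives (61, 70) and nber_field_span(4) gives (71, 80), matching the one-person and two-person household variables in the example.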

Online Data

Currently, we hold and have put online 156 files, which NARA says is the whole file, or at least all they have. They believe there may be records or parts of records missing from California and Minnesota. We have observed that Minnesota contains 603 undecipherable records. The only examination we have done of the files supplied is to check for proper record sequence. There are problems in about half the files, but it is possible that the only problem is the sequence.

As received from NARA
NBER Simplified format:
State Summary Records
County Summary Records
Individual District Records

In our format, there is one file for each record type. The files are compressed in gzip format and together will fit on a single CD-ROM. Obviously, executing code from a strange web site is a bad idea; you shouldn't do that unless you know NBER well enough to trust its web site.
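Since the files are gzip-compressed, they can be read without unpacking them to disk first. The sketch below uses Python's standard gzip module and assumes one record per newline-terminated line, which is an assumption of this example rather than something stated here. The function name iter_records is hypothetical, and any file name passed to it is up to the reader.

```python
import gzip
from typing import Iterator


def iter_records(path: str) -> Iterator[str]:
    """Yield records one at a time from a gzip-compressed ASCII file.

    Assumes one record per line (an assumption of this sketch).
    """
    # "rt" opens the compressed stream in text mode; undecodable bytes
    # are replaced rather than raising, so damaged records can be inspected.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\r\n")
```

Reading lazily in this way keeps memory use small, which matters if the records are as long as the NARA record length suggests.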

References and Other Information

NBER format Codebook
Decennial Census School District Planning Report, NCES Working Paper 98-07. This is very informative about the structure and function of the 1990 SDDB, though it is oriented towards the planning of the 2000 version.
Reference manual from NARA.
Help file provided with NCES version.
NCES web site - very little about the 1990 SDDB, but includes ordering information for the 44-disk NCES version.
Proximity One was the contractor for NCES.
Oregon State University Government Information Sharing Project. At this site you can retrieve data from the "Top 100" variables in the SDDB.
The ICPSR also offers the SDDB (study number 2953), but the web site mentions that they have 153 files out of the 156 total I have. I am not sure which files they are missing. [I understand ICPSR has obtained more files since 2002 - drf].