State Mortality Data, 1900-1936
Files containing the unbalanced panel of annual state-level mortality in registration states by cause and by age and sex for years 1900-1936:
Each state-by-year observation contains variables with self-explanatory names for total deaths, deaths by cause, deaths by age, deaths by age for males, and deaths by age for females in all death registration area states, 1900-1936. I have used conservative assumptions to standardize a few causes and age groups across years, and have not included causes that are inconsistent across years; these are present in the raw Excel data, which I'll provide as well.
Here is a list of anomalies of which I am aware and that appear to be present in the historical mortality volumes (rather than being due to data entry errors):
One nice feature of this data is that it is possible to compare totals provided in the historical volumes with sums by cause and by age to detect data entry errors (the data was also double-entered by Digital Divide Data). I've double-checked all inconsistencies caught by these comparisons; the ones listed here appear to be present in the printed historical mortality statistics volumes.
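As a rough illustration of the comparison described above, here is a minimal Python sketch that flags state-year observations whose reported total differs from the sum of its components. The variable names (total_deaths, deaths_age_0_14, and so on), the three age groups, and the numbers are all hypothetical and chosen only for the example; they are not the names or values in the actual files.

```python
def find_inconsistent_rows(rows, total_key, component_keys):
    """Return (state, year, reported total, component sum) for every row
    whose reported total does not equal the sum of its components."""
    flagged = []
    for row in rows:
        component_sum = sum(row[key] for key in component_keys)
        if component_sum != row[total_key]:
            flagged.append((row["state"], row["year"], row[total_key], component_sum))
    return flagged


# Made-up observations; the variable names are assumptions, not the dataset's.
rows = [
    {"state": "Example State", "year": 1900, "total_deaths": 100,
     "deaths_age_0_14": 20, "deaths_age_15_64": 30, "deaths_age_65_plus": 50},
    {"state": "Example State", "year": 1901, "total_deaths": 120,
     "deaths_age_0_14": 25, "deaths_age_15_64": 35, "deaths_age_65_plus": 58},
]

age_keys = ["deaths_age_0_14", "deaths_age_15_64", "deaths_age_65_plus"]
flagged = find_inconsistent_rows(rows, "total_deaths", age_keys)
print(flagged)  # [('Example State', 1901, 120, 118)]
```

A flagged row is only a candidate for review: the mismatch may be a data entry error, or it may already be present in the printed volume.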
I'll need to do a bit more work in documenting everything. The main thing that is not transparent is how I created a few variables. My objective was to make the Stata dataset consistent across years, so under conservative assumptions, I combined a few categories of deaths that are reported differently in different years. For example, in some years, "cancer" and "tumor" deaths are reported separately, while in other years they are reported together as "cancer and tumors." So I created a single variable throughout called "cancer and tumors." The variables that required a little manipulation and a few reasonable assumptions are:
As I mentioned, I'd like to encourage people to use this data, so please feel free to share it with whoever might be interested. In particular, I'd very much like to know if additional errors are found. Once it's up on the Berkeley demography web site, I'll let you know.
Pneumonia deaths prior to 1910 were reported separately for bronchopneumonia and for lobar and undefined pneumonia. In 1910 and subsequent years, pneumonia deaths were reported as the sum of all types of pneumonia deaths. The original digitized dataset contained lobar and undefined pneumonia for years prior to 1910, and all-cause pneumonia in 1910 and subsequent years. This updated version now includes a consistent measure of all-cause pneumonia deaths (lobar, broncho- and undefined pneumonia) for all years. Additionally, small data entry errors were corrected, the most notable of which was for 1902 (1901 total mortality and cause-specific mortality were previously recorded for both 1901 and 1902).
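The construction of the consistent pneumonia measure can be sketched roughly as follows in Python. This is an illustration of the rule as described above, not the code used to build the files: the function name, argument names, and numbers are hypothetical, and treating a missing component as a missing total is an assumption made here for the sketch.

```python
def all_cause_pneumonia(year, reported_total=None, broncho=None, lobar_undefined=None):
    """Return a consistent all-cause pneumonia count for one state-year.

    From 1910 on, the volumes report a single pneumonia total, so it is used
    as given. Before 1910, the two separately reported categories are summed.
    Returning None when a component is missing is an assumption of this sketch.
    """
    if year >= 1910:
        return reported_total
    if broncho is None or lobar_undefined is None:
        return None
    return broncho + lobar_undefined


# Made-up counts, for illustration only.
before_1910 = all_cause_pneumonia(1905, broncho=30, lobar_undefined=70)
from_1910 = all_cause_pneumonia(1910, reported_total=150)
incomplete = all_cause_pneumonia(1905, broncho=30)
print(before_1910, from_1910, incomplete)  # 100 150 None
```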
The original files are available at https://www.nber.org/data/vital-statistics-deaths-historical/archive