Combining Family History and Machine Learning to Link Historical Records

Joseph Price; Kasey Buckles; Jacob Van Leeuwen; Isaac Riley

doi:10.3386/w26227

Working Paper 26227

Issue Date September 2019

A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.
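As a rough illustration of the second step, the sketch below shows how links from a family tree could label candidate record pairs for supervised learning, and how a match rate and a false positive rate might then be computed. It is not the authors' procedure. The record fields, the feature set, and the function names (`pair_features`, `build_training_set`, `evaluate`) are hypothetical. The false positive rate is assumed here to mean the share of predicted links that disagree with a tree link for the same base record; the paper's exact definitions may differ.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for whatever name metric is actually used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(rec_a, rec_b, years_apart=10):
    """Hypothetical features for a candidate pair drawn from two censuses."""
    expected_age = rec_a["age"] + years_apart
    return {
        "first_sim": name_similarity(rec_a["first"], rec_b["first"]),
        "last_sim": name_similarity(rec_a["last"], rec_b["last"]),
        "age_gap": abs(rec_b["age"] - expected_age),
        "same_birthplace": int(rec_a["birthplace"] == rec_b["birthplace"]),
    }


def build_training_set(candidate_pairs, tree_links):
    """Label a pair 1 if the family tree links the two records, else 0."""
    rows = []
    for rec_a, rec_b in candidate_pairs:
        label = int((rec_a["id"], rec_b["id"]) in tree_links)
        rows.append((pair_features(rec_a, rec_b), label))
    return rows


def evaluate(predicted_links, tree_links, n_base_records):
    """Return (match rate, false positive rate) under the assumptions stated above."""
    match_rate = len(predicted_links) / n_base_records
    tree_by_base = dict(tree_links)  # base record id -> linked record id
    # Only predictions whose base record also has a tree link can be checked.
    checkable = [(a, b) for a, b in predicted_links if a in tree_by_base]
    wrong = sum(1 for a, b in checkable if tree_by_base[a] != b)
    false_positive_rate = wrong / len(checkable) if checkable else 0.0
    return match_rate, false_positive_rate
```

The labeled rows returned by `build_training_set` could be passed to any supervised classifier; the choice of model is left open here because the abstract does not name one.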

This work has been supported in part by grant #G-1063 from the Russell Sage Foundation. Any opinions expressed are those of the principal investigators alone and should not be construed as representing the opinions of the Foundation or the National Bureau of Economic Research. We are grateful for helpful comments and assistance from Ran Abramitzky, Martha Bailey, Katherine Eriksson, James Feigenbaum, Joe Ferrie, Ian Fillmore, Cathy Fitch, Brigham Frandsen, Katie Genadek, Jonas Helgertz, Bob Pollack, Steve Ruggles, and Anne Winkler. This project would not have been possible without the careful and thoughtful work of many research assistants at the Brigham Young University Record Linking Lab, including Ben Branchflower, Alison Doxey, Neil Duzzett, Nicholas Grasley, Amanda Marsden, and Joseph Young.

Joseph Price, Kasey Buckles, Jacob Van Leeuwen, and Isaac Riley, "Combining Family History and Machine Learning to Link Historical Records," NBER Working Paper 26227 (2019), https://doi.org/10.3386/w26227.
