Combining Family History and Machine Learning to Link Historical Records

Joseph Price, Kasey Buckles, Jacob Van Leeuwen, Isaac Riley

NBER Working Paper No. 26227
Issued in September 2019
NBER Program(s): Program on Children, Program on the Development of the American Economy, Labor Studies Program, Public Economics Program

A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.
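As a rough illustration of the two ideas in the abstract, the sketch below labels candidate record pairs using tree-based "true" links and then scores a set of predicted links. It is a minimal sketch, not the authors' procedure: the record identifiers, the function names, and the exact definitions of match rate (share of records that receive any link) and false positive rate (share of checkable links that disagree with the tree) are assumptions made for illustration, and the paper may define them differently.

```python
from typing import Dict, List, Optional, Tuple


def build_training_pairs(
    candidates: List[Tuple[str, str]],
    true_links: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Label candidate pairs with 1 (agrees with the tree link) or 0.

    Pairs whose first record has no tree link are skipped, because
    there is no known answer to label them with.
    """
    labeled = []
    for id_a, id_b in candidates:
        if id_a in true_links:
            labeled.append((id_a, id_b, int(true_links[id_a] == id_b)))
    return labeled


def match_rate(predicted: Dict[str, str], population_size: int) -> float:
    """Share of records in the base census that received a link (assumed definition)."""
    return len(predicted) / population_size


def false_positive_rate(
    predicted: Dict[str, str],
    true_links: Dict[str, str],
) -> Optional[float]:
    """Share of checkable predicted links that disagree with the tree link.

    Only predictions for records that also have a tree link can be checked.
    Returns None when no prediction can be checked.
    """
    checkable = [(a, b) for a, b in predicted.items() if a in true_links]
    if not checkable:
        return None
    wrong = sum(1 for a, b in checkable if true_links[a] != b)
    return wrong / len(checkable)


# Hypothetical toy data: keys are records in an earlier census,
# values are records in a later census.
true_links = {"a1": "b1", "a2": "b2", "a3": "b3"}
candidates = [("a1", "b1"), ("a1", "b9"), ("a2", "b2"), ("a4", "b4")]
predicted = {"a1": "b1", "a2": "b7", "a4": "b4"}

training_pairs = build_training_pairs(candidates, true_links)
overall_match_rate = match_rate(predicted, population_size=4)
fp_rate = false_positive_rate(predicted, true_links)
```

In a real pipeline the labeled pairs would be turned into feature vectors (name similarity, age difference, birthplace agreement, and so on) and passed to a supervised classifier; that step is omitted here because the abstract does not specify the features or the model.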

Digital Object Identifier (DOI): 10.3386/w26227
