Linking Individuals Across Historical Sources: a Fully Automated Approach

Ran Abramitzky, Roy Mill, Santiago Pérez

NBER Working Paper No. 24324
Issued in February 2018, Revised in March 2019
NBER Program(s): Aging, Development of the American Economy, Labor Studies

Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I (false positive) and type II (false negative) errors. The first step guides researchers in the choice of which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each pair of records corresponds to the same individual. The third step suggests how to use these estimated probabilities to choose which records to use in the analysis. In the second part of the paper, we apply the method to link historical population censuses in the US and Norway, and use these samples to estimate measures of intergenerational occupational mobility. The estimates using our method are remarkably similar to those obtained using the IPUMS method, which relies on hand linking to create a training sample. We provide R code and a Stata command that implement this method.
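To make the second step concrete, the following is a minimal sketch of EM applied to record linkage. It is not the authors' R or Stata implementation, which may differ. It assumes a simplified two-class mixture in which each candidate pair of records is summarized by binary agreement indicators (for example, whether first name, last name, and age agree) that are conditionally independent given match status. The function names, starting values, and simulated parameters are all hypothetical.

```python
import random


def em_match_probabilities(patterns, n_iter=500, tol=1e-9):
    """Fit a two-class mixture (match / non-match) to binary agreement patterns.

    patterns: list of equal-length tuples of 0/1, one per candidate record pair.
    Returns (p, m, u, post): the estimated share of matches, P(agree | match)
    and P(agree | non-match) per field, and each pair's match probability.
    """
    n, k = len(patterns), len(patterns[0])
    p, m, u = 0.1, [0.9] * k, [0.1] * k  # starting values (an assumption)
    eps = 1e-6

    def clamp(x):
        # Keep probabilities strictly inside (0, 1) to avoid division by zero.
        return min(max(x, eps), 1.0 - eps)

    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        for i, g in enumerate(patterns):
            lm, lu = p, 1.0 - p
            for j, agree in enumerate(g):
                lm *= m[j] if agree else 1.0 - m[j]
                lu *= u[j] if agree else 1.0 - u[j]
            post[i] = lm / (lm + lu)

        # M-step: re-estimate parameters as posterior-weighted frequencies.
        w = sum(post)
        new_p = clamp(w / n)
        new_m = [clamp(sum(q * g[j] for q, g in zip(post, patterns)) / w)
                 for j in range(k)]
        new_u = [clamp(sum((1.0 - q) * g[j] for q, g in zip(post, patterns)) / (n - w))
                 for j in range(k)]

        change = max(abs(a - b) for a, b in zip([new_p] + new_m + new_u, [p] + m + u))
        p, m, u = new_p, new_m, new_u
        if change < tol:
            break
    return p, m, u, post


def simulate_patterns(n, p, m, u, seed=0):
    """Draw synthetic agreement patterns from known parameters (illustration only)."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        patterns.append(tuple(int(rng.random() < q) for q in probs))
    return patterns


# Usage on synthetic data: three fields, 20% of candidate pairs are true matches.
patterns = simulate_patterns(5000, 0.2, [0.95, 0.9, 0.85], [0.05, 0.1, 0.2], seed=1)
p_hat, m_hat, u_hat, post = em_match_probabilities(patterns)
```

The returned match probabilities are the kind of input the third step would work with. One possible rule, again only an assumption here, is to keep pairs whose probability exceeds a chosen cutoff, trading off false positives against false negatives.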




Document Object Identifier (DOI): 10.3386/w24324

