NATIONAL BUREAU OF ECONOMIC RESEARCH

How Well Do Automated Methods Perform in Historical Samples? Evidence from New Ground Truth

Martha Bailey, Connor Cole, Morgan Henderson, Catherine Massey

NBER Working Paper No. 24019
Issued in November 2017
NBER Program(s): Aging, Development of the American Economy, Labor Studies

New large-scale data linking projects are revolutionizing empirical social science. Outside of selected samples and tightly restricted data enclaves, little is known about the quality of these “big data” or how the methods used to create them shape inferences. This paper evaluates the performance of commonly used automated record-linking algorithms in three high-quality historical U.S. samples. Our findings show that (1) no method (including hand linking) consistently produces samples representative of the linkable population; (2) automated linking tends to produce very high rates of false matches, averaging around one third of links across datasets and methods; and (3) false links are systematically (though differently) related to baseline sample characteristics. A final exercise demonstrates the importance of these findings for inferences using linked data. For a common set of records, we show that algorithm assumptions can attenuate estimates of intergenerational income elasticities by almost 50 percent. Although differences in these findings across samples and methods caution against the generalizability of specific error rates, common patterns across multiple datasets offer broad lessons for improving current linking practice.
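The attenuation mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' method or data: it assumes standardized log incomes, a hypothetical true elasticity `beta` of 0.4, and false links that attach a son to a randomly drawn, unrelated father at a rate of one third. The function names and all parameter values are illustrative assumptions. Under these assumptions the slope estimated on linked data shrinks toward roughly `(1 - false_link_rate) * beta`. The paper's reported attenuation of almost 50 percent comes from real samples in which false links are related to sample characteristics, which this sketch does not model.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (single regressor with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=0.4, false_link_rate=1 / 3, seed=0):
    """Return (slope with true links, slope with partly false links).

    All values are hypothetical: incomes are standard normal in logs and
    a false link swaps in an independent draw for the father's income.
    """
    rng = random.Random(seed)
    father = [rng.gauss(0, 1) for _ in range(n)]
    son = [beta * f + rng.gauss(0, 1) for f in father]
    # With probability false_link_rate the son is linked to an unrelated father.
    linked = [
        rng.gauss(0, 1) if rng.random() < false_link_rate else f
        for f in father
    ]
    return ols_slope(father, son), ols_slope(linked, son)


true_slope, linked_slope = simulate()
print(f"slope with true links:  {true_slope:.3f}")
print(f"slope with false links: {linked_slope:.3f}")
```

Because the false links here are purely random, the bias is simple proportional shrinkage. False links that are correlated with baseline characteristics, as the abstract reports, could bias estimates by more or less than this.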


Document Object Identifier (DOI): 10.3386/w24019

 
