Large Language Models: An Applied Econometric Framework

Jens Ludwig; Sendhil Mullainathan; Ashesh Rambachan

doi:10.3386/w33344

Working Paper 33344

Issue Date January 2025

Revision Date December 2025

Large language models (LLMs) enable researchers to analyze text at unprecedented scale and minimal cost. Researchers can now revisit old questions and tackle novel ones with rich data. We provide an econometric framework for realizing this potential in two empirical uses. For prediction problems – forecasting outcomes from text – valid conclusions require “no training leakage” between the LLM’s training data and the researcher’s sample, which can be enforced through careful model choice and research design. For estimation problems – automating the measurement of economic concepts for downstream analysis – valid downstream inference requires combining LLM outputs with a small validation sample to deliver consistent and precise estimates. Absent a validation sample, researchers cannot assess possible errors in LLM outputs, and consequently seemingly innocuous choices (which model, which prompt) can produce dramatically different parameter estimates. When used appropriately, LLMs are powerful tools that can expand the frontier of empirical economics.
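The estimation point can be made concrete with a toy simulation. The sketch below is not the authors' estimator. It uses a simple bias correction, the mean estimator familiar from prediction-powered inference: average the LLM labels over the full sample, then add the average LLM error measured on a validation subsample where true labels are known. The function names (`bias_corrected_mean`, `simulate`), the error rates, and the sample sizes are all hypothetical. The sketch also assumes that validation documents are drawn uniformly at random and that researcher-coded labels are error-free.

```python
import random
import statistics


def bias_corrected_mean(llm_all, llm_val, true_val):
    """Bias-corrected estimate of the mean of a text-derived measure.

    llm_all:  LLM labels for every document in the full sample
    llm_val:  LLM labels for a random validation subsample
    true_val: researcher-coded labels for the same validation documents
    """
    if not llm_val or len(llm_val) != len(true_val):
        raise ValueError("validation lists must be non-empty and of equal length")
    naive = statistics.fmean(llm_all)
    # Average LLM error, measured only where the true label is known.
    correction = statistics.fmean([t - l for t, l in zip(true_val, llm_val)])
    return naive + correction


def simulate(n=20_000, n_val=1_000, p_true=0.3, fpr=0.2, fnr=0.1, seed=0):
    """Hypothetical data: a binary concept that an LLM labels with error.

    fpr: chance the LLM labels a true 0 as 1; fnr: chance it labels a true 1 as 0.
    The same seed yields the same true labels, so two hypothetical "prompts"
    (different error rates) can be compared on identical documents.
    """
    rng = random.Random(seed)
    truth = [1 if rng.random() < p_true else 0 for _ in range(n)]
    llm = []
    for y in truth:
        flip_prob = fnr if y == 1 else fpr
        llm.append(1 - y if rng.random() < flip_prob else y)
    # Validation documents are drawn uniformly at random from the sample.
    val_idx = rng.sample(range(n), n_val)
    return truth, llm, [llm[i] for i in val_idx], [truth[i] for i in val_idx]


if __name__ == "__main__":
    for name, fpr, fnr in [("prompt A", 0.20, 0.10), ("prompt B", 0.05, 0.30)]:
        truth, llm, llm_val, true_val = simulate(fpr=fpr, fnr=fnr)
        print(
            f"{name}: true={statistics.fmean(truth):.3f} "
            f"naive={statistics.fmean(llm):.3f} "
            f"corrected={bias_corrected_mean(llm, llm_val, true_val):.3f}"
        )
```

In this setup, the two hypothetical prompts produce naive means that are biased in opposite directions, even though they label identical documents. The corrected estimates should both fall near the true share, up to sampling error from the validation subsample. The sketch omits standard errors and any downstream regression step.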

This paper was supported by the Center for Applied Artificial Intelligence at the University of Chicago and the Altman Family Fund at MIT. We thank Haya Alsharif and Janani Sekar for excellent research assistance. We thank Peter Bergman, Tim Christensen, Oeindrila Dube, Larry Katz, Lindsey Raymond, Suproteem Sarkar, Andrei Shleifer, and David Yanagizawa-Drott as well as numerous seminar audiences and conference participants for helpful comments. We also thank the editor and three reviewers for their valuable feedback. Wharton Research Data Services (WRDS) was used in preparing this paper. This service and the data available constitute valuable intellectual property and trade secrets of WRDS and/or its third-party suppliers. All interpretations and any errors are our own. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

Jens Ludwig, Sendhil Mullainathan, and Ashesh Rambachan, "Large Language Models: An Applied Econometric Framework," NBER Working Paper 33344 (2025), https://doi.org/10.3386/w33344.

- January 9, 2025
