The Economics of Generative AI

04/24/2024

Featured in the print NBER Reporter

By Erik Brynjolfsson and Danielle Li

Artificial intelligence (AI) is not a new field. The term was coined in 1956, but the field has only recently begun having significant effects on the economy.

Research in AI went through three eras. Early work focused primarily on symbolic systems with hand-coded rules and instructions. In the 1980s, expert systems, which consisted of hundreds or thousands of “if…then” rules drawn from interviews with human experts, helped diagnose diseases and make loan recommendations, but with limited commercial success.

Later, the focus shifted to machine learning systems, including “supervised learning” systems trained to make predictions based on large datasets of human-labeled examples. As computational power increased, deep learning algorithms became increasingly successful, leading to an explosion of interest in AI in the 2010s.

More recently, even larger models using unsupervised or self-supervised learning have become a major focus of the field. Large language models (LLMs), trained on massive amounts of text simply to predict the next word in a sequence, have astounded the public with their ability to produce meaningful and remarkable output. These systems have been found to outperform humans on a growing range of knowledge-intensive tasks, for instance on the bar exam. In addition, studies show that access to LLMs and other types of generative AI tools can help human workers improve their own performance.
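
To make the next-word objective concrete, here is a toy bigram counter. It is a deliberately crude stand-in: real LLMs are neural networks trained on vast corpora, and the function names `train_bigram` and `predict_next` are hypothetical, not taken from any library. The sketch only shows what "predict the next word from what came before" means at its simplest.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word follows each other word in `text`."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the word most often seen after `word`, or None if it never had a successor."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


model = train_bigram("the agent resolved the issue and the agent closed the ticket")
print(predict_next(model, "the"))
```

An LLM replaces these raw counts with a learned model that conditions on a long stretch of preceding text rather than a single word.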

In the past year, a growing body of work has explored how new AI tools might impact productivity in applications as diverse as coding, writing, and management consulting.¹

In research with Lindsey Raymond, we analyze the effects of generative AI on worker productivity in the context of technical customer support.² Our study is based on data from 5,179 agents, about 1,300 of whom were given access to an LLM-based assistant that provided real-time suggestions for communicating with customers. The system, trained on millions of examples of successful and unsuccessful conversations, provided suggestions that the agents could use, adapt, or reject. The tool was rolled out in phases, creating quasi-experimental evidence on its causal effects.
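
The logic of a phased rollout can be illustrated with the simplest difference-in-differences comparison: the change in productivity for agents who gained access, minus the change over the same period for agents who had not yet gained access. The sketch below uses hypothetical numbers and a basic two-group, two-period setup; an analysis of a staggered rollout like the one in the study would use a more elaborate specification.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences estimate.

    Each argument is a list of resolutions per hour for one group of agents
    in one period. The control group's change stands in for what would have
    happened to the treated group without the AI assistant.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical numbers for illustration only.
effect = diff_in_diff(
    treated_pre=[2.0, 2.2], treated_post=[2.6, 2.8],
    control_pre=[2.0, 2.1], control_post=[2.2, 2.3],
)
print(f"Estimated effect: {effect:.2f} resolutions per hour")
```

Subtracting the control group's change removes improvements that would have happened anyway, such as agents gaining experience on the job.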

We found significant improvements in worker productivity as measured by the number of customer issues workers were able to resolve per hour. Within four months, treated agents were outperforming nontreated agents who had been on the job for over twice as long.

On average, worker productivity increased by 14 percent. These gains were concentrated among the lowest quintile of workers, whether measured by experience or prior productivity, where there were productivity gains of up to 35 percent. In contrast, the top quintile saw negligible gains and, in some cases, even small decreases in the quality of conversations, as measured by customer satisfaction. This pattern reflects how the system is trained: by observing successful conversations, the system is able to glean the behavior of the most skilled agents and pass these behaviors on as suggestions to novice workers.
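
A quintile comparison of this kind amounts to ranking agents by prior productivity, splitting them into five equal groups, and averaging the gain within each group. The sketch below uses made-up numbers, and the function name is hypothetical; the study's actual estimates come from its econometric models, not from simple averages like these.

```python
from statistics import mean


def gains_by_quintile(agents):
    """Average percent gain within each quintile of prior productivity.

    `agents` is a list of (prior_resolutions_per_hour, percent_gain) pairs.
    Returns five averages, lowest-productivity quintile first. For simplicity
    this assumes the number of agents is a multiple of five.
    """
    ranked = sorted(agents, key=lambda agent: agent[0])
    size = len(ranked) // 5
    return [
        mean(gain for _, gain in ranked[q * size:(q + 1) * size])
        for q in range(5)
    ]


# Made-up data for ten agents: (prior resolutions per hour, percent gain).
sample = [(1.0, 30), (1.1, 34), (1.5, 20), (1.6, 18), (2.0, 12),
          (2.1, 10), (2.5, 5), (2.6, 3), (3.0, 1), (3.1, -1)]
print(gains_by_quintile(sample))
```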

Did the system deskill the workforce? Another natural experiment suggests not. As with most large systems, there were occasional outages when the system unexpectedly became unavailable. Workers who had previously been using the system now had to answer questions without access to it, and nonetheless they continued to outperform those who had never used the system. This suggests that the system helped them learn, and retain, answers.

Our results point to the possibility that — in contrast with earlier waves of information technology that largely benefited higher-skill workers — generative AI technologies could particularly benefit workers at the lower or middle levels of the skills distribution. Drawing on these and other results, David Autor sees opportunities for the recent waves of AI to help rebuild the middle class by increasing the value of output from their labor.³

Advances in AI technologies and algorithmic design can yield improvements beyond direct measures of productivity. For example, we saw evidence in our study that AI assistance improves the experience of work for treated agents, as measured by analysis of conversation transcripts: customers spoke more kindly to agents and were less likely to ask to speak to a supervisor. These effects were likely driven both by agents’ improved social skills and by their increased access to technical knowledge as a result of chat assistance.

Indeed, there is growing evidence that generative AI tools may outperform humans in an area traditionally considered a source of strength for humans relative to machines: empathy and social skills. One study of doctors’ responses to patient questions found that an LLM-based chatbot provided answers that were judged by expert human evaluators to be more detailed, higher quality, and 10 times more likely to be considered empathetic.⁴

Finally, innovations in AI systems may further improve the functioning of current AI tools. For example, Li, Raymond, and Peter Bergman explore how algorithm design can improve the quality of interview decisions in the context of professional services hiring. They find that while traditional supervised learning systems — which look for workers who match historical patterns of success in the firm’s training data — select higher-quality workers relative to human hiring, they are also far less likely to select applicants who are Black or Hispanic. In contrast, reinforcement learning and contextual bandit models — which value learning about workers who have not traditionally been represented in the firm’s training data — are able to deliver similar improvements in worker quality while also distributing job opportunities more broadly.
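
The contrast between the two hiring approaches can be sketched with an upper-confidence-bound (UCB) rule, a standard bandit technique. In this hypothetical example each applicant group is treated as one arm and individual applicant features are ignored, so it is a plain multi-armed bandit rather than a contextual bandit; the group labels, quality numbers, and `exploration` parameter are illustrative assumptions, not values from the study.

```python
import math


def ucb_scores(groups, exploration=1.0):
    """Upper-confidence-bound score for each applicant group.

    `groups` maps a hypothetical group label to (average observed quality,
    number of past hires). Groups with few past hires receive a larger
    exploration bonus, so the rule places value on learning about them.
    """
    total = sum(n for _, n in groups.values())
    scores = {}
    for label, (mean_quality, n) in groups.items():
        if n == 0:
            scores[label] = math.inf  # never observed: explore first
        else:
            bonus = exploration * math.sqrt(2 * math.log(total) / n)
            scores[label] = mean_quality + bonus
    return scores


# Hypothetical history: one group dominates the firm's training data.
history = {"well-represented": (0.60, 400), "under-represented": (0.55, 5)}

# A purely predictive rule picks the group with the best historical average.
greedy_pick = max(history, key=lambda g: history[g][0])

# The UCB rule adds a bonus that shrinks as a group is observed more often.
scores = ucb_scores(history)
ucb_pick = max(scores, key=scores.get)
print(greedy_pick, ucb_pick)
```

The greedy rule keeps selecting the familiar group because its estimate is the highest. The UCB rule selects the under-represented group because its estimate is so uncertain that it could plausibly be better.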

Figure 1: Productivity of Customer Support Agents and AI Support. Scatter plot of resolutions per hour (y-axis, 1 to 4) against agent tenure in months (x-axis, 0 to 10) for three groups: agents with access to AI support from the time they join the firm, agents who gain access in their fifth month, and agents who never have access. All three groups start at about 1.75 resolutions per hour. Agents with access from the start reach about 3.4 resolutions per hour by month 5. Agents who gain access in month 5 improve markedly only after that point, reaching about 3.2 by month 10. Agents without access reach about 2.6 by month 10, with some fluctuation along the way. Bars represent 95% confidence intervals. Source: Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond, NBER Working Paper 31161.

While the effects of AI on productivity and work practices are now evident not only in a number of laboratory settings but also in business applications, it may take longer for them to show up in aggregate statistics. Brynjolfsson, Daniel Rock, and Chad Syverson discuss a set of reasons why the effects of AI might not quickly change aggregate productivity numbers.⁵

For one thing, labor productivity is typically defined as GDP per hour worked. But GDP as it is traditionally measured may miss many of the benefits of an increasingly digital economy that creates free goods and makes them more widely available while also improving the quality, variety, or convenience of existing goods. An alternative metric, GDP-B, seeks to address these challenges by assessing the benefits of goods and services, not the amount spent.⁶

Furthermore, general purpose technologies like AI are likely to experience a lag between their initial adoption and observable improvements in productivity. In a second study, Brynjolfsson, Rock, and Syverson model this “Productivity J-Curve.”⁷ As with other types of information technology, the initial phase of AI adoption is characterized by time-consuming complementary investments, including the realignment of business processes, the integration of new technologies into existing workflows, and the upskilling of the workforce. As noted by Brynjolfsson and Lorin Hitt, these adjustments are costly and may create valuable intangible assets, but neither the costs nor the benefits are typically accounted for when measuring a firm’s output.⁸ As a result, productivity as it is conventionally measured may initially be seen as stagnating or even falling. However, as these technological and organizational complements are gradually implemented, the productivity benefits of AI begin to materialize, marked by an upward trajectory in the J-curve.

The Productivity J-Curve model implies that productivity metrics fail to capture the full extent of benefits during the initial stages of AI adoption, leading to underestimation of AI’s potential.
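
A toy simulation shows how a J-curve can arise when intangible investment is invisible to conventional statistics. In the hypothetical model below, a firm diverts output into unmeasured complementary investment (process redesign, training) for five years and then earns a return on the accumulated intangible capital. All parameter values are assumptions chosen for illustration, not the calibration used by Brynjolfsson, Rock, and Syverson.

```python
def measured_productivity(years=10, base_output=100.0, hours=100.0,
                          invest_years=5, invest_per_year=10.0,
                          return_rate=0.2):
    """Measured labor productivity per year in a toy J-curve model.

    Hypothetical assumptions: for the first `invest_years` years the firm
    diverts `invest_per_year` units of output into intangible investment
    that official statistics do not record as output. Each unit of
    accumulated intangible capital then adds `return_rate` units of output
    per year, with no depreciation.
    """
    intangible_capital = 0.0
    series = []
    for year in range(years):
        investment = invest_per_year if year < invest_years else 0.0
        # Output actually produced, including the payoff from past intangibles.
        produced = base_output + return_rate * intangible_capital
        # Conventional statistics miss the resources spent building intangibles.
        measured_output = produced - investment
        series.append(measured_output / hours)
        intangible_capital += investment
    return series


for year, value in enumerate(measured_productivity()):
    print(year, round(value, 2))
```

Measured productivity falls below its pre-adoption level of 1.0 during the investment phase and then rises above it once the intangibles pay off, tracing the J shape. Produced output, which counts the intangible investment, never dips in this model.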

The ultimate economic effects of generative AI will depend not only upon how much it boosts productivity and changes work in specific cases, but also on how much of the economy it is likely to affect. As noted by Daron Acemoglu and Autor, occupations can be broken down into specific tasks.⁹ Applying this insight, Brynjolfsson, Tom Mitchell, and Rock look at 18,156 tasks in the O*NET taxonomy and find that most occupations include at least some tasks that could be automated or augmented by machine learning, though significant redesign would typically be required to realize the full potential of the technology.¹⁰ Building on this work, Tyna Eloundou, Sam Manning, Pamela Mishkin, and Rock estimate that approximately 80 percent of the US workforce could have at least 10 percent of their work tasks either automated or augmented by the introduction of LLMs, while around 19 percent of workers could see at least half of their tasks affected.¹¹
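
Exposure figures like these come from labeling individual tasks and then aggregating to workers. The sketch below shows that aggregation step under simplifying assumptions: each task gets a yes/no exposure label, tasks are counted without weights, and the occupations and employment numbers are hypothetical rather than O*NET data. The actual study uses a more detailed rubric and its own weighting.

```python
from dataclasses import dataclass


@dataclass
class Occupation:
    name: str
    workers: int        # employment in the occupation
    exposed_tasks: int  # tasks judged exposed to LLMs
    total_tasks: int    # all tasks listed for the occupation


def share_of_workers_exposed(occupations, threshold):
    """Fraction of workers whose occupation has at least `threshold` of its tasks exposed."""
    total_workers = sum(o.workers for o in occupations)
    affected = sum(
        o.workers for o in occupations
        if o.exposed_tasks / o.total_tasks >= threshold
    )
    return affected / total_workers


# Hypothetical occupations, not O*NET data.
workforce = [
    Occupation("Occupation A", workers=400, exposed_tasks=6, total_tasks=10),
    Occupation("Occupation B", workers=350, exposed_tasks=2, total_tasks=10),
    Occupation("Occupation C", workers=250, exposed_tasks=0, total_tasks=8),
]
print(share_of_workers_exposed(workforce, 0.10))  # share with >= 10% of tasks exposed
print(share_of_workers_exposed(workforce, 0.50))  # share with >= 50% of tasks exposed
```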

Hulten’s theorem states that a first-order approximation of the productivity effects of a technology is the share of the economy affected multiplied by its average productivity impact. There is evidence that both the potential productivity impact and the potential share of the economy affected are significant in the case of generative AI, suggesting that the ultimate effects may be substantial, though, as implied by the Productivity J-Curve, they may take some time to be realized.¹²
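
In its standard form, Hulten's theorem weights each activity's productivity growth by its Domar weight, the activity's gross sales divided by GDP, and holds to first order in an efficient, competitive economy. Here p_i y_i is the gross sales of activity i and A_i its productivity. The share-times-impact version above is the case where the affected activities have a combined weight s and an average productivity gain g. The numbers in the worked example (a weight of 0.3 and a 10 percent gain) are hypothetical.

```latex
\Delta \ln \mathrm{TFP} \;\approx\; \sum_i \frac{p_i\, y_i}{\mathrm{GDP}}\, \Delta \ln A_i
\quad\Longrightarrow\quad
s\,\bar{g} = 0.3 \times 0.10 = 0.03
```

Under these assumed values, the first-order gain in aggregate productivity would be roughly 3 percent.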

The field of economics itself is not immune to the effects of generative AI. Students of economics are using the tools to help with their assignments, requiring a rethinking of teaching methods. We and our colleagues are using the tools to help with research and writing; we used LLMs to help with aspects of the preparation of this article. Anton Korinek described six ways that LLMs can assist economists: ideation and feedback, writing, background research, data analysis, coding, and mathematical derivations.¹³ Jens Ludwig and Sendhil Mullainathan go further, showing that AI models can be used to make the first stage of the scientific process — hypothesis generation — more systematic.¹⁴

Figure 2: Productivity Mismeasurement J-Curve. Line graph of productivity growth mismeasurement (y-axis, -1.75% to 0.25%) against years since AI adoption (x-axis, 0 to 40). The curve starts at 0% at adoption and falls to its lowest point of about -1.75% roughly 5 to 10 years after adoption. It crosses back above 0% around year 15 and then rises slowly to a small positive mismeasurement of about 0.125% by year 40. Source: Erik Brynjolfsson, Daniel Rock, and Chad Syverson, NBER Working Paper 25148, published as "The Productivity J-Curve: How Intangibles Complement General Purpose Technologies," American Economic Journal: Macroeconomics 13(1), January 2021, pp. 333–72.

As discussed by Brynjolfsson and Gabriel Unger, important policy choices are emerging regarding AI’s effects on productivity, industrial concentration, and inequality.¹⁵ For instance, on the question of inequality, the distinction between technology used for automation and technology used for augmentation (or, more formally, between AI that substitutes for labor and AI that complements it) can have significant effects on the distribution of income and bargaining power.¹⁶ Brynjolfsson has argued that either approach can boost productivity but has noted that a focus on human-like AI can lead to a “Turing Trap” by reducing worker bargaining power. As AI continues to grow in power, so too does the need for economic research to better understand how we can harness its benefits while mitigating its risks.

Endnotes

1. “Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence,” Noy S, Zhang W. Science 381(6654), July 2023, pp. 187–192. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality,” Dell’Acqua F, McFowland III E, Mollick E, Lifshitz-Assaf H, Kellogg KC, Rajendran S, Krayer L, Candelon F, Lakhani KR. Harvard Business School Working Paper No. 24-013, September 2023.

2. “Generative AI at Work,” Brynjolfsson E, Li D, Raymond LR. NBER Working Paper 31161, November 2023.

3. “Applying AI to Rebuild Middle Class Jobs,” Autor D. NBER Working Paper 32140, February 2024.

4. “Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum,” Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, Faix DJ, Goodman AM, et al. JAMA Internal Medicine 183(6), April 2023, pp. 589–596.

5. “Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics,” Brynjolfsson E, Rock D, Syverson C. NBER Working Paper 24001, November 2017.

6. “GDP-B: Accounting for the Value of New and Free Goods in the Digital Economy,” Brynjolfsson E, Collis A, Diewert WE, Eggers F, Fox KJ. NBER Working Paper 25695, March 2019.

7. “The Productivity J-Curve: How Intangibles Complement General Purpose Technologies,” Brynjolfsson E, Rock D, Syverson C. NBER Working Paper 25148, January 2020, and American Economic Journal: Macroeconomics 13(1), January 2021, pp. 333–372.

8. “Beyond Computation: Information Technology, Organizational Transformation and Business Performance,” Brynjolfsson E, Hitt LM. Journal of Economic Perspectives 14(4), Fall 2000, pp. 23–48.

9. “Skills, Tasks and Technologies: Implications for Employment and Earnings,” Acemoglu D, Autor D. NBER Working Paper 16082, June 2010. Published as “Chapter 12: Skills, Tasks and Technologies: Implications for Employment and Earnings” in Handbook of Labor Economics 4(B), 2011, pp. 1043–1171.

10. “What Can Machines Learn, and What Does It Mean for Occupations and the Economy?” Brynjolfsson E, Mitchell T, Rock D. AEA Papers and Proceedings 108, May 2018, pp. 43–47.

11. “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models,” Eloundou T, Manning S, Mishkin P, Rock D. arXiv, August 2023.

12. “Machines of Mind: The Case for an AI-Powered Productivity Boom,” Baily MN, Brynjolfsson E, Korinek A. Brookings Institution, May 10, 2023.

13. “Generative AI for Economic Research: Use Cases and Implications for Economists,” Korinek A. Journal of Economic Literature 61(4), December 2023, pp. 1281–1317.

14. “Machine Learning as a Tool for Hypothesis Generation,” Ludwig J, Mullainathan S. NBER Working Paper 31017, March 2023.

15. “The Macroeconomics of Artificial Intelligence,” Brynjolfsson E, Unger G. International Monetary Fund, December 2023.

16. “The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence,” Brynjolfsson E. Daedalus 151(2), Spring 2022, pp. 272–287. An earlier version of this argument was published as Race Against the Machine: How the Digital Revolution is Accelerating Innovation, Driving Productivity, and Irreversibly Transforming Employment and the Economy, Brynjolfsson E, McAfee A. Digital Frontier Press, 2011.

NBER periodicals and newsletters may be reproduced freely with appropriate attribution.
