Big Data in the U.S. Consumer Price Index: Experiences and Plans
This chapter is a preliminary draft unless otherwise noted. It may not have been subjected to the formal review process of the NBER. This page will be updated as the chapter is revised.
Chapter in forthcoming NBER book Big Data for 21st Century Economic Statistics, Katharine G. Abraham, Ron S. Jarmin, Brian Moyer, and Matthew D. Shapiro, editors
The Bureau of Labor Statistics (BLS) has generally relied on its own sample surveys to collect the price and expenditure information necessary to produce the Consumer Price Index (CPI). The burgeoning availability of big data has created an opportunity for methodological improvements and cost savings in the CPI. The BLS has undertaken several pilot projects in an attempt to supplement or replace its traditional field collection of price data with alternative sources. In addition to cost reductions, these projects have demonstrated the potential to expand sample size, reduce respondent burden, obtain transaction prices more consistently, and improve price index estimation by incorporating real-time expenditure information, a foundational component of price index theory that has not been practical to use until now. The CPI uses the term "alternative data" to refer to any data not collected through traditional field collection procedures by CPI staff, including third-party datasets, corporate data, and data collected through web scraping or retailer APIs. This paper reviews how the CPI program is adapting to work with alternative data, then discusses the three main sources of alternative data under consideration by the CPI, describing the research and other steps taken to date for each source.