Using Public Data to Generate Industrial Classification Codes

John Cuffe; Sudip Bhattacharjee; Ugochukwu Etudo; Justin C. Smith; Nevada Basdeo; Nathaniel Burbank; Shawn R. Roberts

Published Date: February 2022

ISBN: 9780226801254

Conference held: March 15-16, 2019

Book: Big Data for Twenty-First-Century Economic Statistics

Book editors: Katharine G. Abraham, Ron S. Jarmin, Brian Moyer & Matthew D. Shapiro

Publisher: University of Chicago Press

Series: Studies in Income and Wealth

Statistical agencies face increasing costs, declining response rates, and growing demand for timely and accurate statistical data. These pressures on agency resources reveal the need for alternative data sources, ideally data that are cheaper than current surveys and available within a short time frame. Textual data available on public-facing websites present an ideal data source for certain US Census Bureau (henceforth Census) statistical products. In this paper, we identify such data sources and argue that they may be particularly well suited to classification tasks such as industrial or occupational coding. Using these sources gives statistical agencies the opportunity to provide more accurate and more timely data at lower cost and with lower respondent burden than traditional survey methods, while opening the door to new and innovative statistical products. We explore how public data can improve the production of federal statistics through a specific case: using website text and user reviews, gathered from the Google Places API, to generate North American Industry Classification System (NAICS) codes for approximately 120,000 single-unit employer establishments. Our approach shows that public data are a useful tool for generating NAICS codes. We also identify challenges and provide suggestions for agencies implementing such a system for production purposes.

Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed (DRB approval CBDRB-FY19-191). We thank Catherine Buffington, Lucia Foster, Javier Miranda, and the participants of the CRIW conference for their comments and suggestions. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

John Cuffe, Sudip Bhattacharjee, Ugochukwu Etudo, Justin C. Smith, Nevada Basdeo, Nathaniel Burbank, and Shawn R. Roberts, Big Data for Twenty-First-Century Economic Statistics (University of Chicago Press, 2019), chap. 8, https://www.nber.org/books-and-chapters/big-data-twenty-first-century-economic-statistics/using-public-data-generate-industrial-classification-codes.
