RIDIR: A Big Data Approach to Understanding American Growth
Project Outcomes Statement
Final Grant Report: Major Project Goals and Outcomes
This project aimed to deepen our understanding of America’s economic growth in the 19th and 20th centuries by taking a Big Data approach. It represents a major effort to build comprehensive historical databases covering individuals, firms, and manufacturing plants in the United States. These databases include detailed geographic information on manufacturing activity, prices, and wages, and are designed to clarify the forces behind the nation’s rapid transformation into an industrial power. The project achieved its main goals through five major initiatives, while also fostering innovation in data science and supporting the development of early-career researchers.
Training and Broader Impacts
As part of its broader impacts, the project has helped train the next generation of researchers. Several post-baccalaureate (post-bacc) students have gone on to careers in related fields, and some have earned NSF Graduate Research Fellowships. We also developed several record-linking techniques and novel datasets using big data and machine learning methods. This knowledge has been widely shared through working papers, seminars, and conference presentations. The researchers will continue to disseminate these insights and materials through these channels and by curating content on the project website: https://timemachinedb.com.
1. Creation of a Comprehensive Demographic Database and Extensive Collaboration
We developed a large-scale database that traces individuals over time using complete-count U.S. Census records from 1850 to 1940. These records were carefully linked across censuses to follow people over decades, enabling researchers to study migration, occupation, and skills, and their relationship to regional economic growth. We also transcribed new archival data on county- and city-level manufacturing activity that enhances our understanding of labor market dynamics and America’s structural transformation. Co-PI Lee obtained Special Sworn Status with the U.S. Census Bureau, allowing access to restricted-use records through the Research Data Center.
As part of our focus on immigration, we linked approximately 11 million immigrant arrival records from the Port of New York and 6 million departure records from Europe to U.S. Census data, in collaboration with Professor Michael Peters at Yale University and the Staatsarchiv Hamburg (Hamburg State Archives) in Germany. These were further connected to U.S. patent records, producing a unique dataset that sheds light on immigrants’ contributions to innovation and economic transformation. By combining spatial economic theory with big data techniques, we explored how immigrants’ location choices and skills influenced the nation’s economic growth.
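The linking step can be illustrated with a simplified deterministic rule of the kind often used for historical census linkage: records are compared only within blocks that agree on cleaned first name, last name, and birthplace; a pair is accepted when reported birth years differ by at most two years; and any record with more than one candidate is dropped as ambiguous. This is a hypothetical sketch, not the project's actual linking procedure; the `Person` fields, the `link_censuses` function, and the two-year window are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical minimal census record; field names are illustrative only."""
    record_id: str
    first_name: str
    last_name: str
    birthplace: str
    birth_year: int


def standardize(name: str) -> str:
    """Lowercase and keep only letters: a crude stand-in for real name cleaning."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def block_key(p: Person) -> tuple[str, str, str]:
    """Records are only compared if they share this key (blocking)."""
    return (standardize(p.first_name), standardize(p.last_name), p.birthplace.lower())


def link_censuses(earlier: list[Person], later: list[Person],
                  max_year_gap: int = 2) -> dict[str, str]:
    """Link records that agree on cleaned names and birthplace, with birth years
    within max_year_gap, keeping only pairs that are unique in both directions."""
    later_by_key = defaultdict(list)
    for q in later:
        later_by_key[block_key(q)].append(q)

    # Candidate later records for each earlier record.
    candidates = {}
    for p in earlier:
        candidates[p.record_id] = [
            q.record_id for q in later_by_key.get(block_key(p), [])
            if abs(q.birth_year - p.birth_year) <= max_year_gap
        ]

    # How many earlier records point at each later record.
    hits = defaultdict(int)
    for matches in candidates.values():
        for q_id in matches:
            hits[q_id] += 1

    # Keep only one-to-one matches; ambiguous records stay unlinked.
    return {p_id: m[0] for p_id, m in candidates.items()
            if len(m) == 1 and hits[m[0]] == 1}


# Hypothetical usage with two tiny "censuses".
c1850 = [Person("a1", "John", "Smith", "Ohio", 1840),
         Person("a2", "Mary", "Jones", "Ireland", 1835)]
c1860 = [Person("b1", "John", "Smith", "Ohio", 1841),
         Person("b2", "Mary", "Jones", "Ireland", 1830)]
print(link_censuses(c1850, c1860))  # {'a1': 'b1'}
```

Dropping ambiguous records trades coverage for precision: fewer people are linked, but false matches, which could bias estimates of occupational or geographic mobility, become less common. A fuller pipeline could add phonetic name codes or probabilistic scoring on top of a rule like this.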
2. Immigration and Economic Growth Analysis
Our immigration-focused research has provided new insights into how immigrant innovation, labor, and location choices shaped America’s transition from a rural to an industrial economy. These findings have been shared with the research community through seminars, conferences, and working papers, and will be made available to the public following publication via the project website and Yale Daily Newspaper.
3. Data Integration and Harmonization Techniques and Derivatives
To allow for meaningful comparisons across time and regions, we integrated and harmonized diverse datasets. This included the development of spatial and sectoral crosswalks that align county and state boundaries over time and standardize industrial classifications. For example, we used historical boundary shapefiles and converted industry codes to match both the Standard Industrial Classification (SIC) system and the 1950 Census Bureau categories. These harmonized datasets are being made available to both researchers and the public through the project’s website https://timemachinedb.com and the University of Michigan’s curated data repository (ICPSR).
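A spatial crosswalk of this kind can be sketched as an area-weighted reallocation. If a share w(h, r) of historical county h's area falls inside reference county r, a count such as manufacturing employment is reassigned as value_r = Σ_h w(h, r) · value_h, which assumes the variable is spread evenly over each historical county's area. This is a minimal illustration under that assumption, not the project's method; the county identifiers, the shares, and the `harmonize` function are hypothetical, and real shares would come from overlaying the historical boundary shapefiles.

```python
from collections import defaultdict


def harmonize(values: dict[str, float],
              crosswalk: dict[tuple[str, str], float]) -> dict[str, float]:
    """Reallocate totals from historical counties to reference-year counties.

    crosswalk maps (historical_county, reference_county) to the share of the
    historical county's area lying inside the reference county. Assumes the
    variable is distributed uniformly over area.
    """
    # Each historical county's shares must sum to one so totals are preserved.
    share_sums = defaultdict(float)
    for (hist, _ref), share in crosswalk.items():
        share_sums[hist] += share
    for hist, total in share_sums.items():
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"shares for {hist} sum to {total}, not 1")

    harmonized = defaultdict(float)
    for (hist, ref), share in crosswalk.items():
        # Historical counties with no reported value contribute nothing.
        harmonized[ref] += share * values.get(hist, 0.0)
    return dict(harmonized)


# Hypothetical example: 75% of 1860 county A's area lies in 1900 county A
# and 25% in 1900 county B; 1860 county B lies entirely in 1900 county B.
CROSSWALK = {
    ("A_1860", "A_1900"): 0.75,
    ("A_1860", "B_1900"): 0.25,
    ("B_1860", "B_1900"): 1.0,
}
employment_1860 = {"A_1860": 1000.0, "B_1860": 500.0}
print(harmonize(employment_1860, CROSSWALK))  # {'A_1900': 750.0, 'B_1900': 750.0}
```

Checking that each historical county's shares sum to one guarantees that the national total is unchanged by the reallocation; population-weighted shares could replace area shares where settlement data are available.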
4. Digitization of Historical Manufacturing Data to Understand the Evolution of the U.S. Economy
We digitized a broad set of historical data on manufacturing activity at the city and county level from the 1860s through the 1950s. This includes data on wages, prices, capital investment, and workforce composition. The result is a rich, spatially detailed resource that enables new insights into how local and regional market forces influenced industrial development and economic change across more than a century.
5. Large-Scale Archival Data Transcription on America’s Manufacturing
We launched a major archival transcription effort to digitize plant-level data from the Census of Manufactures, beginning with records from the 1930s and 1950s. This effort, which uses machine learning-based data capture technologies, is part of a Joint Statistical Project with the U.S. Census Bureau and the National Archives and Records Administration, and will expand to cover earlier decades in future phases. These records contain detailed information on plant-level revenue, employment, and capital, offering an unprecedented view into the structure and evolution of American manufacturing.
Supported by the National Science Foundation grant #1831524