Citation

Publications and research reports based on the IPUMS USA database must cite it appropriately. The citation should include the following:

Steven Ruggles, Matt A. Nelson, Matthew Sobek, Catherine A. Fitch, Ronald Goeken, J. David Hacker, Evan Roberts, and J. Robert Warren. IPUMS Ancestry Full Count Restricted Data: Version 4.0R [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D014.V4.0R

Documentation
  • The IPUMS website covers all the publicly available variables. The additional variables in the restricted-use file include:
    • namefrst: 16-character first name (and possibly middle initial)
    • namelast: 16-character last name
    • histid: 36-character person ID for matching across IPUMS versions (but not census decades)
    • street: street address
File Structure

The original files are hierarchical, but we have created the output files (.dta, etc.) as rectangular person-level datasets; that is, the household record is appended to each person record. We also apply the scaling factors in the IPUMS-supplied code, which should make the data conform to the documentation. Value labels that merely repeat the numeric value as ASCII text are dropped.
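
As a rough illustration of what "rectangular" means here, the sketch below attaches each household record to the person records that follow it. The input layout (a list of dictionaries with a rectype field of "H" or "P") and all values are made up for the example; this is not the code used to build the files.

```python
def rectangularize(records):
    """Append the current household's fields to each person record.

    Assumes (hypothetically) that records arrive in file order, with each
    household ("H") record followed by its person ("P") records.
    """
    household = {}
    rows = []
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "rectype"}
        if rec["rectype"] == "H":
            household = fields          # remember the latest household
        elif rec["rectype"] == "P":
            rows.append({**household, **fields})  # one output row per person
    return rows


# Made-up example: two households with three people in total.
records = [
    {"rectype": "H", "serial": 1, "statefip": 27},
    {"rectype": "P", "pernum": 1, "age": 40},
    {"rectype": "P", "pernum": 2, "age": 38},
    {"rectype": "H", "serial": 2, "statefip": 55},
    {"rectype": "P", "pernum": 1, "age": 71},
]
rows = rectangularize(records)
```

Each resulting row describes one person and repeats that person's household fields, which is why the rectangular files contain one record per person.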

New Folder/File Structure

In early 2026, the folder structure of the restricted data was reorganized to reflect each dataset's census year and version number, following IPUMS releases: /homes/data/census-ipums/YYYY/v#.#, where YYYY is the census year (for example, 1950) and v#.# is the version number (for example, v2.2). There is also a latest folder that points to the most recent version and is updated whenever a new version is added.
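
A minimal sketch of how this layout could be addressed from Python. The helper name census_dir is hypothetical, and the sketch assumes the latest folder sits inside each year's folder, which the description above does not state explicitly.

```python
from pathlib import PurePosixPath

# Root of the restricted data, as given in the text above.
BASE = PurePosixPath("/homes/data/census-ipums")


def census_dir(year, version="latest"):
    """Return the folder for a census year and version (e.g. "v2.2").

    Assumption: "latest" is a folder under the year folder.
    """
    return BASE / str(year) / version


pinned = census_dir(1950, "v2.2")   # a specific release
newest = census_dir(1950)           # whatever "latest" points to
```

Pinning an explicit version may be preferable for replication, since the contents behind latest change whenever a new version is added.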

Additionally, we now provide two versions of the processed output files, each containing a different set of variables: the primary file and what we are calling the othervars file.

  • The primary file, named using just the census year (e.g., 1950.csv, 1950.dta, etc.), contains:
    • all variables given descriptive names by IPUMS (e.g., statefip, age, sex, race, etc.)
    • variables needed to merge this file with the othervars file
  • The othervars file (e.g., 1950_othervars.csv, 1950_othervars.dta) contains:
    • variables not given a descriptive name by IPUMS (e.g., us1950b_0010, us1950b_1022, etc.)
    • variables needed to merge this file with the primary file

Per IPUMS, it is best practice to use the histid variable to perform merges because it does not change over time.

To determine the contents of the generically named variables in the othervars file, refer to the data dictionary files in the docs directory for the year and version you are using.
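
The histid merge between a primary file and its othervars file can be sketched as follows, using only the Python standard library. This assumes histid appears in both CSV files; the function name merge_on_histid, the short histid values, and the data values are invented for illustration (real histid values are 36 characters). For files of full-count size, a statistical package or dataframe library would likely be more practical than this in-memory approach.

```python
import csv
import io


def merge_on_histid(primary_fh, othervars_fh):
    """Left-join othervars rows onto primary rows, keyed by histid."""
    # Index the othervars rows by histid for constant-time lookup.
    other = {row["histid"]: row for row in csv.DictReader(othervars_fh)}
    merged = []
    for row in csv.DictReader(primary_fh):
        combined = dict(row)
        # Add othervars columns without overwriting primary columns.
        for key, value in other.get(row["histid"], {}).items():
            combined.setdefault(key, value)
        merged.append(combined)
    return merged


# Made-up stand-ins for 1950.csv and 1950_othervars.csv.
primary_csv = "histid,statefip,age\nAAAA-0001,27,40\nAAAA-0002,55,71\n"
othervars_csv = "histid,us1950b_0010\nAAAA-0002,3\nAAAA-0001,7\n"

merged = merge_on_histid(io.StringIO(primary_csv), io.StringIO(othervars_csv))
```

Row order in the two files does not need to match, because rows are paired by histid rather than by position.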

Previous versions (pre-2025) of the census files can be found in the data_archive folder, which preserves the vYYYY format used in the past.

For detailed information on the folder and file reorganization, please refer to this document.

In /homes/data/census-ipums/research-projects you can find data and code for pre-made matches that researchers have made available and that might be useful for your project. Using a pre-made match can save considerable time and resources. Please make sure to cite the authors if you use their work.

  • Multigenerational Longitudinal Panel: Linked Census Data. The IPUMS Multigenerational Longitudinal Panel (MLP) project links individuals' records between censuses from 1850 to 1950.
    • Steven Ruggles, Matt A. Nelson, Matthew Sobek, Catherine A. Fitch, Ronald Goeken, J. David Hacker, Evan Roberts, and J. Robert Warren. IPUMS Ancestry Full Count Data: Version 4.0 [dataset]. Minneapolis, MN: IPUMS, 2024. doi:10.18128/D014.V4.0R
  • Census Linking Project: Princeton created a set of linked datasets for every pair of historical censuses using a variety of automated methods. Publications using data from the matches should cite the Census Linking Project as:
    • Ran Abramitzky, Leah Boustan, Katherine Eriksson, Santiago Pérez and Myera Rashid. Census Linking Project: Version 3.0 [dataset]. 2025. https://censuslinkingproject.org
  • Census Tree: The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. The Census Tree includes 314 million census-to-census links for women and 41 million links for Black Americans. Please refer to https://www.censustree.org/overview for how to cite this data.
  • Location Project: The Census Place Project crosswalks link historical decennial census microdata from IPUMS (indexed by histid) with standardized locations (longitude/latitude pairs). For more details about the crosswalks, see https://ezrakarger.com/census_place_project.pdf.
    If you have any questions or concerns, email nenckap@miamioh.edu or karger@uchicago.edu.
  • RepresentativeCensusLinks: Contains crosswalks to link census records for men and women in the United States from 1850 to 1950, constructed by combining historical census records with Social Security Number (SSN) application data. The dataset enables tracking individuals across multiple censuses despite name changes (e.g., due to marriage) and is particularly valuable for including women in the study of intergenerational mobility. If you use this data, please cite:
    • Althoff, Lukas, Brookes Gray, Harriet, & Reichardt, Hugo (2024). America’s Rise in Human Capital Mobility. [Working Paper](https://lukasalthoff.github.io/pdf/igm_mothers.pdf).

Please reach out to the Data Team if you wish to make your code or data available in the research_projects folder, either for other approved researchers or for replication.

Exporting Data

It is sometimes possible to export data from the NBER for external processing. This is not about releasing data for public access, and in general, exporting string variables is not approved except in rare cases.