
These files contain all of the FOIA-disclosable data for active and deactivated providers in NPPES.

The source NPPES data dissemination is available from NPPES.

Copies of older NPPES data disseminations (for example, pre-2011 releases) are gratefully accepted; please contact data@nber.org. We do have some of these older files here, but they have not yet been converted to .dta or .sas7bdat format.

Jean Roth set up the processed versions of the source NPPES data dissemination to make the files easier to use. The SAS and Stata files are about 32 GB. To keep costs low, CMS releases the source NPPES data in only a limited number of formats. The first line of the source file contains variable descriptions, which can be difficult for statistical packages to read because many of them limit variable names to 32 characters. The source file's long descriptions end in sequence numbers, so "my_long_variable_description_number_1" through "my_long_variable_description_number_50" can all be cut off to the same name, "my_long_variable_description_num".

In the SAS and Stata datasets, each original header description is preserved as the variable label, and each variable is assigned a name of fewer than 32 characters that keeps its sequence number, if it has one.
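To illustrate the naming problem and the label-plus-short-name approach, here is a minimal Python sketch that shortens a header description to a length limit while always keeping its trailing sequence number, and keeps the full description as the label. It is not the procedure NBER actually used: the function names `make_short_name` and `build_names`, the lowercase-with-underscores naming rule, and the default limit of 32 are assumptions for illustration.

```python
import re


def make_short_name(description: str, max_len: int = 32) -> str:
    """Shorten a header description to at most max_len characters,
    keeping any trailing sequence number intact (hypothetical rule)."""
    # Lowercase and collapse every run of non-alphanumeric characters into "_"
    base = re.sub(r"[^0-9a-z]+", "_", description.lower()).strip("_")
    # Split off a trailing sequence number such as "_1" or "_50"
    match = re.fullmatch(r"(.*?)_?(\d+)", base)
    stem, number = (match.group(1), match.group(2)) if match else (base, "")
    # Truncate only the stem, so the sequence number always survives
    return stem[: max_len - len(number)].rstrip("_") + number


def build_names(headers: list[str], max_len: int = 32) -> dict[str, str]:
    """Map each short variable name to its original description (the label)."""
    labels: dict[str, str] = {}
    for header in headers:
        name = make_short_name(header, max_len)
        if name in labels:
            # Two descriptions shortened to the same name; a real conversion
            # would need a tie-breaking rule here.
            raise ValueError(f"name collision: {name!r}")
        labels[name] = header
    return labels
```

With this rule, "my_long_variable_description_number_1" becomes "my_long_variable_description_nu1" and "..._number_50" becomes "my_long_variable_description_n50", so all 50 names stay distinct instead of collapsing to one truncated name.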

The source data file has nearly 5 million records. Excel 2010 supports a maximum of 1,048,576 rows per worksheet, so it cannot read in the whole source file at once. Some instructions on reading the file into Access and selecting variables and rows are available.
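The same idea of selecting variables and rows can be sketched outside Access. This minimal Python example streams a CSV one record at a time with the standard `csv` module and keeps only the chosen columns of the rows that pass a test, so the full file never has to fit in a spreadsheet or in memory. The function name `select_rows`, the column names (assumed to match the source file's header), and the sample rows are hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, Iterator


def select_rows(
    lines: Iterable[str],
    columns: list[str],
    keep: Callable[[dict[str, str]], bool],
) -> Iterator[dict[str, str]]:
    """Yield only the chosen columns of the records that pass keep()."""
    reader = csv.DictReader(lines)  # the first line supplies the column names
    for row in reader:
        if keep(row):
            yield {col: row[col] for col in columns}


# Tiny made-up sample standing in for the real source file
sample = io.StringIO(
    "NPI,Entity Type Code,Provider Business Practice Location Address State Name\n"
    "1000000001,1,MA\n"
    "1000000002,2,NY\n"
    "1000000003,1,MA\n"
)
state = "Provider Business Practice Location Address State Name"
ma_providers = list(
    select_rows(sample, ["NPI", "Entity Type Code"], lambda r: r[state] == "MA")
)
```

To run this on the real file, open it with `open(path, newline="")` and pass the file object in place of `sample`; because rows are processed one at a time, memory use does not grow with the number of records.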

The main NPI data file and core data file include ZIP Codes. A ZIP Code distance database is also available.

Updates and changes.

File downloads

  • Source zip files (monthly and weekly): raw zip files from 2017-2023
  • Full NPI data set, plus smaller files reshaped into a database style that separates repeated values from core variables. Not all files are available for all years; see below for the types of files available.

Latest data: May 2023

The NPI data above have been reshaped into the database-style files below: non-repeated fields are in the core file, and repeated fields are in their own long, skinny files. The files can be linked by the NPI field. The core file is less than 1/5 the size of the full NPI data files above, which makes it easier to work with.

  Description                                    SAS            Stata          CSV             Desc
  Core, Non-Repeated Variables                   core (5 GB)    core (5 GB)    core (<2 GB)    desc
  Healthcare Provider Taxonomy Code Variables    ptax (0.4 GB)  ptax (0.4 GB)  ptax (0.2 GB)   desc
  Provider License Variables                     plic (0.2 GB)  plic (0.2 GB)  plic (0.2 GB)   desc
  Other Provider Identifier Variables**          othp (0.6 GB)  othp (0.6 GB)  othp (0.2 GB)   desc

** One limitation of these crosswalks is that the provider had to include the other provider identifier in their NPI application in order for that provider identifier to appear in the NPPES database.
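The reshaping of the wide NPI records into a core file plus long, skinny files can be sketched in Python for a single record: fields that belong to a repeated group go into long rows keyed by NPI and sequence number, and everything else stays in the core record. This is an illustration under assumptions, not NBER's actual procedure: the `<stem>_<n>` field-name convention, the group definitions, the short column names, the made-up values, and the function name `reshape_record` are all hypothetical.

```python
import re


def reshape_record(
    record: dict[str, str],
    groups: dict[str, list[str]],
    npi_field: str = "NPI",
) -> tuple[dict[str, str], dict[str, list[dict]]]:
    """Split one wide record into a core dict and long rows per repeated group.

    Repeated fields are assumed to be named "<stem>_<n>"; `groups` maps each
    output table name to the stems it collects.
    """
    stem_to_group = {stem: g for g, stems in groups.items() for stem in stems}
    core: dict[str, str] = {}
    slots: dict[tuple[str, int], dict[str, str]] = {}
    for field, value in record.items():
        m = re.fullmatch(r"(.+)_(\d+)", field)
        if m and m.group(1) in stem_to_group:
            key = (stem_to_group[m.group(1)], int(m.group(2)))
            slots.setdefault(key, {})[m.group(1)] = value
        else:
            core[field] = value  # non-repeated field stays in the core record
    long_rows: dict[str, list[dict]] = {g: [] for g in groups}
    for (group, seq), values in sorted(slots.items()):
        if any(v.strip() for v in values.values()):  # skip unused slots
            long_rows[group].append({npi_field: record[npi_field], "seq": seq, **values})
    return core, long_rows


# Made-up wide record with two taxonomy slots, only the first one filled
record = {
    "NPI": "1000000001",
    "Entity Type Code": "1",
    "Taxonomy Code_1": "TAX-A",
    "Primary Switch_1": "Y",
    "Taxonomy Code_2": "",
    "Primary Switch_2": "",
}
core, long_rows = reshape_record(record, {"ptax": ["Taxonomy Code", "Primary Switch"]})
```

Dropping empty slots is what keeps the long files small: most providers fill only a few of the repeated positions, so the long format stores only the values that exist.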

The data are also available as two-variable files, each pairing the NPI with one other database variable, for greater ease of use.
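Because every file carries the NPI, a two-variable file can be joined back onto the core file by NPI. This minimal Python sketch does a left join; the column names `npi` and `taxcode`, the function name `attach_pairs`, and the sample rows are hypothetical, and loading the pair file into a dictionary assumes it fits in memory.

```python
import csv
import io
from typing import Iterable, Iterator


def attach_pairs(
    core_rows: Iterable[dict[str, str]],
    pair_lines: Iterable[str],
    value_field: str,
    npi_field: str = "npi",
) -> Iterator[dict]:
    """Left-join a two-column (NPI, value) file onto core rows by NPI.

    An NPI can appear several times in a pair file for a repeated variable,
    so each core row receives a list of values (empty if there is no match).
    """
    lookup: dict[str, list[str]] = {}
    for row in csv.DictReader(pair_lines):
        lookup.setdefault(row[npi_field], []).append(row[value_field])
    for core in core_rows:
        yield {**core, value_field: lookup.get(core[npi_field], [])}


# Made-up example data; real field names in the NBER files may differ.
core = [{"npi": "1000000001", "entity": "1"}, {"npi": "1000000002", "entity": "2"}]
pairs = io.StringIO("npi,taxcode\n1000000001,TAX-A\n1000000001,TAX-B\n")
joined = list(attach_pairs(core, pairs, "taxcode"))
```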

Contact data@nber.org with questions, comments, or suggestions.
