NATIONAL BUREAU OF ECONOMIC RESEARCH

CMS' National Plan and Provider Enumeration System (NPPES) Files in SAS, Stata, and CSV format

The Centers for Medicare & Medicaid Services (CMS) developed the National Plan and Provider Enumeration System (NPPES) to assign unique identifiers to health care providers. The National Provider Identifier (NPI) has been the standard identifier for health care providers since May 2007.

These files contain all of the FOIA-disclosable data for active and deactivated providers in NPPES.

The raw NPPES data dissemination is available from NPPES.

Jean Roth set up processed versions of the raw NPPES data dissemination to make the files easier to use. The SAS and Stata files are about 24 GB. CMS releases the raw NPPES data in a limited number of formats in order to keep costs low. The first line of the raw file contains variable descriptions, which can be difficult for statistical packages to use as variable names for the following reasons:

1) Many statistical packages limit variable names to 32 characters. The raw file has long descriptions with sequence numbers at the end, so "my_long_variable_description_number_1" through "my_long_variable_description_number_50" can all be truncated to the same name, "my_long_variable_description_num".

2) Statistical packages typically require variable names to use only the characters [_A-Za-z0-9]. Some of the long descriptions contain other characters, such as "(", ")", and ".".
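To make the first problem concrete, here is a small Python sketch (Python is used only for illustration) that applies a naive rule: replace every disallowed character with "_" and cut the result at 32 characters. The 32-character limit and the example headers come from the text above; the function name `naive_name` is hypothetical.

```python
import re

MAX_LEN = 32  # a common limit on variable-name length in stat packages

def naive_name(description: str) -> str:
    """Naive rule: replace disallowed characters, then truncate."""
    return re.sub(r"[^_A-Za-z0-9]", "_", description)[:MAX_LEN]

headers = [f"my_long_variable_description_number_{i}" for i in range(1, 51)]
names = [naive_name(h) for h in headers]

# All 50 distinct headers collapse to one truncated name.
print(len(set(names)))  # 1
print(names[0])         # my_long_variable_description_num
```

The character substitution fixes the second problem, but truncation destroys exactly the part of the header (the sequence number) that distinguished the variables.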

In the SAS and Stata datasets, each header description is preserved as the variable label, and each variable is assigned a name of fewer than 32 characters that keeps the sequence number, if there is one.
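One way to build names that satisfy both constraints while keeping the sequence number is sketched below. This is an assumed scheme for illustration, not necessarily the rule used to produce the NBER files: it splits off a trailing number, collapses each run of other characters to "_", lowercases the rest, and truncates the descriptive part rather than the number. The original header would be kept separately as the variable label. The function name `stat_name` and the sample headers are hypothetical, and a full conversion would still need to check that unrelated headers do not truncate to the same name.

```python
import re

MAX_LEN = 32  # assumed maximum variable-name length

# Optional separator, then a trailing sequence number such as "_15".
TRAILING_SEQ = re.compile(r"^(.*?)[_\s]*(\d+)$")

def stat_name(description: str, max_len: int = MAX_LEN) -> str:
    """Turn a long header description into a valid variable name of at most
    max_len characters, keeping any trailing sequence number intact."""
    m = TRAILING_SEQ.match(description)
    base, seq = (m.group(1), m.group(2)) if m else (description, "")
    # Allow only [_A-Za-z0-9]: collapse each run of other characters to "_".
    base = re.sub(r"[^_A-Za-z0-9]+", "_", base).strip("_").lower()
    # Most packages reject names that start with a digit.
    if base[:1].isdigit():
        base = "_" + base
    suffix = f"_{seq}" if seq else ""
    # Truncate the descriptive part, never the sequence number.
    base = base[: max_len - len(suffix)].rstrip("_")
    return base + suffix

print(stat_name("Healthcare Provider Taxonomy Code_15"))
# healthcare_provider_taxonomy_15
print(stat_name("Provider Organization Name (Legal Business Name)"))
# provider_organization_name_legal
```

Applied to the 50 example headers above, this scheme yields 50 distinct names, because the sequence number always survives truncation.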

An NPI to UPIN crosswalk and an NPI to state license crosswalk made from these files are also available.

The raw data file has over 3.8 million records. Excel 2010 supports a maximum of 1,048,576 rows per worksheet, so it cannot read the whole raw file at once. Instructions for reading the file into Access and selecting variables and rows are available.
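Outside Excel and Access, a script can stream the file one record at a time, so the row count does not matter. The Python sketch below uses only the standard `csv` module to keep selected columns for rows that pass a filter. The file paths, the column names in the sample (`NPI`, `Entity Type Code`, `State`), and the UTF-8 encoding are assumptions; check the column names against the first line of your copy of the raw file.

```python
import csv
import io
from typing import Callable, Iterator, TextIO

Row = dict[str, str]

def select_rows(f: TextIO, columns: list[str],
                keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Yield the chosen columns of each row that passes `keep`.
    Rows are read lazily, so memory use does not grow with file size."""
    for row in csv.DictReader(f):
        if keep(row):
            yield {c: row[c] for c in columns}

def extract(src_path: str, dst_path: str, columns: list[str],
            keep: Callable[[Row], bool]) -> int:
    """Write a filtered, column-subset copy of src_path to dst_path and
    return the number of rows written."""
    n = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns)
        writer.writeheader()
        for row in select_rows(src, columns, keep):
            writer.writerow(row)
            n += 1
    return n

# Tiny in-memory stand-in for the raw file (made-up values).
sample = io.StringIO('"NPI","Entity Type Code","State"\n'
                     '"1","1","MA"\n"2","2","NY"\n"3","1","NY"\n')
print(list(select_rows(sample, ["NPI"], lambda r: r["State"] == "NY")))
# [{'NPI': '2'}, {'NPI': '3'}]
```

On the real file, a call such as `extract("raw.csv", "subset.csv", ["NPI", "Entity Type Code"], lambda r: r["Entity Type Code"] == "1")` (hypothetical paths) would write a subset small enough to open in a spreadsheet.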

The main NPI data file and the core data file include ZIP Codes. A ZIP Code distance database is also available.

Updates and changes.

  Full NPI database:   SAS   Stata   CSV   Raw   Documentation   Code Values   Application Form & Instructions (cms10114.pdf)

The data above have been reshaped into a database-style layout: non-repeated fields are in the core file, and repeated fields are in their own long, skinny files. The files can be linked by the NPI field. The core file is less than one-fifth the size of the full NPI database above, so it can be easier to work with.
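The reshaping described above can be sketched in Python for a single record. The sketch assumes each repeated field appears in the wide file as a base description plus a trailing sequence number (for example `Healthcare Provider Taxonomy Code_1`), and that each long file gets one row per non-empty sequence slot holding the NPI, the sequence number, and that slot's fields. The grouping of base fields into files (`ptaxcode`, `plicnum`) and all values in the example are assumptions for illustration; the exact contents of the NBER files may differ.

```python
import re
from collections import defaultdict

# A repeated field: base description, "_", sequence number.
SEQ = re.compile(r"^(.*)_(\d+)$")

def split_record(row: dict[str, str], groups: dict[str, list[str]]):
    """Split one wide record into core fields and per-file long rows.

    groups maps a long-file name to the base field names it holds."""
    base_to_group = {b: g for g, bases in groups.items() for b in bases}
    core: dict[str, str] = {}
    slots: dict[tuple[str, int], dict[str, str]] = defaultdict(dict)
    for col, value in row.items():
        m = SEQ.match(col)
        if m and m.group(1) in base_to_group:
            base, seq = m.group(1), int(m.group(2))
            slots[(base_to_group[base], seq)][base] = value
        else:
            core[col] = value  # non-repeated field stays in the core file
    long_rows: dict[str, list[dict]] = defaultdict(list)
    for (group, seq), fields in sorted(slots.items()):
        if any(fields.values()):  # drop empty slots
            long_rows[group].append({"NPI": row["NPI"], "seq": seq, **fields})
    return core, dict(long_rows)

# Made-up record in the wide layout.
row = {"NPI": "0000000001", "Entity Type Code": "1",
       "Healthcare Provider Taxonomy Code_1": "TAX-A",
       "Healthcare Provider Taxonomy Code_2": "",
       "Provider License Number_1": "LIC-1",
       "Provider License Number_2": ""}
groups = {"ptaxcode": ["Healthcare Provider Taxonomy Code"],
          "plicnum": ["Provider License Number"]}
core, long_rows = split_record(row, groups)
print(core)  # {'NPI': '0000000001', 'Entity Type Code': '1'}
print(long_rows["ptaxcode"])
# [{'NPI': '0000000001', 'seq': 1, 'Healthcare Provider Taxonomy Code': 'TAX-A'}]
```

Dropping empty slots is what makes the long files "skinny": a provider with one taxonomy code contributes one row instead of a full set of mostly blank columns.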

  Description                                   SAS        Stata      CSV
  Core, Non-Repeated Variables                  core       core       core
  Healthcare Provider Taxonomy Code Variables   ptaxcode   ptaxcode   ptaxcode
  Provider License Variables                    plicnum    plicnum    plicnum
  Other Provider Identifier Variables           othpid     othpid     othpid
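Because every file carries the NPI, a long file can be attached to the core file with an ordinary key join. Below is a minimal Python sketch of a left join that keeps providers with no matching rows (for example, providers with no other identifiers). It assumes rows are dictionaries, as produced by `csv.DictReader`, and that the key column is named `NPI`; the other field names in the example are hypothetical. In SAS or Stata the same step would be a merge on the NPI variable.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Row = dict[str, object]

def left_join(core_rows: Iterable[Row], long_rows: Iterable[Row],
              key: str = "NPI") -> Iterator[Row]:
    """One output row per (core, long) match; core rows without a match
    are kept once, with no long-file fields."""
    # Build an in-memory hash index on the long file; the core is streamed.
    index: dict[object, list[Row]] = defaultdict(list)
    for r in long_rows:
        index[r[key]].append(r)
    for c in core_rows:
        matches = index.get(c[key], [])
        if not matches:
            yield dict(c)
        for m in matches:
            yield {**c, **m}

core = [{"NPI": "1", "state": "MA"}, {"NPI": "2", "state": "NY"}]
othpid = [{"NPI": "1", "seq": 1, "other_id": "X9"},
          {"NPI": "1", "seq": 2, "other_id": "Y7"}]
for row in left_join(core, othpid):
    print(row)
# {'NPI': '1', 'state': 'MA', 'seq': 1, 'other_id': 'X9'}
# {'NPI': '1', 'state': 'MA', 'seq': 2, 'other_id': 'Y7'}
# {'NPI': '2', 'state': 'NY'}
```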

To report problems, or if you have comments or suggestions, e-mail Jean Roth at jroth@nber.org.

Created by Jean Roth. Last updated February 2, 2012.
 

National Bureau of Economic Research, 1050 Massachusetts Ave., Cambridge, MA 02138; 617-868-3900; email: info@nber.org
