These data are described in detail in:

Hall, B. H., A. B. Jaffe, and M. Trajtenberg (2001). "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools." NBER Working Paper 8498.

 

ALL USERS OF THESE DATA SHOULD READ THIS PAPER, AND SHOULD CITE IT AS THE SOURCE OF THE DATA

Further documentation on uses of the patent citation data, including the methodology paper and a CD containing the complete dataset itself, is available in the book Patents, Citations and Innovations: A Window on the Knowledge Economy by Adam Jaffe and Manuel Trajtenberg, MIT Press, Cambridge (2002). The book may be ordered from MIT Press. ISBN 0-262-10095-9.

The CUSIP match is based on the 1989 universe of companies.

UPDATES

The NBER is working on a major NSF-funded update and extension of these data. A new release of these files, bringing the existing data up to date through December 2004, is anticipated for 2010 or 2011. A variety of additional fields and indexes will also be provided. These are anticipated to include "link-out" tables connecting patent numbers to geographic entities (e.g., SMSAs) and a codification of inventor names. The PI for this project is Iain Cockburn. Please contact him if you have questions or comments, or would like to contribute to this project.

Updates are available at https://sites.google.com/site/patentdataproject/Home.

The data are freely available below in two compressed (".zip") formats: SAS transport (.tpt) files and ASCII comma-separated values (.csv) files. The compression ratio for the compressed files is about 75%. The ".zip" files can be uncompressed with WinZip or PKUNZIP. To check your ability to uncompress these files, download the small file compress.zip. To download files in Internet Explorer, right click on them and select "Save Target As...". Internal users can access the data at /home/data/patents.

The program read_tpt.sas can be used to convert the .tpt files to native SAS data sets. The SAS ".tpt" files are also transferable to other formats using software such as Stat/Transfer or DBMS/Copy, and can be used directly by Stata using the fdause command.

In the ASCII CSV files, lines are terminated by the newline character "\n", all values are separated by commas, and character values are additionally enclosed in double quotes.
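The CSV layout described above (comma-separated values, double-quoted character values, "\n" line endings) can be read with Python's standard csv module. The following is only a minimal sketch: the function name parse_patent_csv and the sample rows are hypothetical and are not taken from the actual files, and it assumes that every unquoted value is numeric.

```python
import csv
import io


def parse_patent_csv(text):
    """Parse CSV text in which character values are double-quoted.

    Unquoted fields are treated as numbers and converted to float;
    quoted fields are returned as strings.
    """
    # newline="" leaves the "\n" line terminators for the csv module to handle.
    stream = io.StringIO(text, newline="")
    # QUOTE_NONNUMERIC tells the reader to convert every unquoted field to float.
    reader = csv.reader(stream, quoting=csv.QUOTE_NONNUMERIC)
    return [row for row in reader]


# Hypothetical sample rows (NOT real records): a number, a year, a quoted code.
sample = '1000001,1975,"US"\n1000002,1976,"DE"\n'
rows = parse_patent_csv(sample)
print(rows)
```

If a file begins with an unquoted header row, this reader would raise ValueError on it, so the header would have to be read separately with a plain csv.reader first.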

You will need a major database, statistical program, or programming language to use these files. Most of the datasets are too large to load completely into MS Excel 2000, which has a maximum of 65,536 rows (observations), though Access can be used to read the ASCII data files. Variable descriptions and the number of observations per file are listed in the "Documentation" column of the table below. U.S. patent information can also be downloaded or purchased from the United States Patent and Trademark Office, which also has a U.S. to IPC concordance.
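Because the files are large, one option in a programming language is to stream a CSV file straight out of its ".zip" archive, one record at a time, rather than loading it whole. The sketch below counts records that way and compares the count with the Excel 2000 limit quoted above. The names count_csv_rows and fits_in_excel_2000 are hypothetical, and the sketch assumes that the CSV file is the first member of the archive unless a member name is given.

```python
import csv
import io
import zipfile

EXCEL_2000_MAX_ROWS = 65_536  # limit quoted in the text above


def count_csv_rows(zip_source, member=None):
    """Count CSV records in one member of a zip archive, streaming row by row.

    zip_source may be a path or a binary file-like object.
    ASSUMPTION: if member is None, the first archive member is the CSV file.
    """
    with zipfile.ZipFile(zip_source) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Decode as ASCII; replace any unexpected bytes instead of failing.
            text = io.TextIOWrapper(raw, encoding="ascii",
                                    errors="replace", newline="")
            # The generator keeps only one record in memory at a time.
            return sum(1 for _ in csv.reader(text))


def fits_in_excel_2000(n_rows):
    """Return True if n_rows does not exceed the Excel 2000 row limit."""
    return n_rows <= EXCEL_2000_MAX_ROWS


# Self-contained demonstration with a tiny in-memory archive (made-up rows).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("demo.csv", '1,"A"\n2,"B"\n3,"C"\n')
demo.seek(0)

n = count_csv_rows(demo)
print(n, fits_in_excel_2000(n))
```

With a downloaded archive, the same function can be called with the path to the ".zip" file in place of the in-memory buffer.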

To search patents, try Google -> more -> patents or http://www.freepatentsonline.com

For international patent databases, check FIZ Karlsruhe, the British Library (Derwent is one patent copy service that delivers patents from the British Library), the German Patent and Trade Mark Office, Espacenet, Micropat, the French Intellectual Property Institute, the IciMarques database, or the EP-CESPRI database, which is along the lines of the NBER dataset but covers European Patent Office data. Many of the sources above were obtained from InfoToday. Derwent has a searchable patent glossary and a link to a text patent glossary made by The Minerals, Metals & Materials Society. For principles and sources for patent searching, see the Free Pint articles by Ron Kamenicki and Stephen Adams.

More recent data can be obtained from the U.S. Patent Office's ftp site.

Updates and changes.

Description                                                             Documentation    SAS .tpt (pkzipped)     ASCII CSV (pkzipped)
Overview                                                                overview.txt     --
Pairwise citations data                                                 Cite75_99.txt    Cite75_99.zip (68 MB)   acite75_99.zip (82 MB)
Patent data, including constructed variables                            pat63_99.txt     pat63_99.zip (90 MB)    apat63_99.zip (56 MB)
Assignee names                                                          coname.txt       coname.zip (2 MB)       aconame.zip (2 MB)
Contains the match to CUSIP numbers                                     match.txt        match.zip (130 KB)      amatch.zip (98 KB)
Individual inventor records                                             inventor.txt     inventor.zip (98 MB)    ainventor.zip (82 MB)
Class codes with corresponding class names                              classes.txt      --
Country codes with corresponding country names                          countries.txt
Class, technological category, and technological subcategory crosswalk  class_match.txt
Technological category and subcategory labels                           subcategory.txt  --                      subcategory.csv
SAS program to convert .tpt files to native SAS format                  --               read_tpt.sas            --

U.S. Patent Classification (USPC) System and the Standard Industrial Code (SIC) System

Send questions to data@nber.org

 

Last Update: 2012-05-16
