Historical patent data files (Stata (.dta) and MS Excel (.csv))

Contains four research datasets of time series and micro-level data, by National Bureau of Economic Research (NBER) technology sub-category, on applications, grants, and in-force patents spanning two centuries of innovation. For more information: https://www.uspto.gov/learning-and-resources/ip-policy/economic-research/research-datasets
Updated: 2015-06-25

  Download (105.14 KB)
Dates Available Jun 25, 2015 – Jun 25, 2015

Patent and patent application Claims data (Stata (.dta) and MS Excel (.csv))

Contains detailed information on claims from U.S. patents granted between January 1976 and December 2014 and U.S. patent applications published between March 15, 2001 and December 2014. The dataset is derived from the Patent Grant Full Text and Patent Application Full Text bulk data files. The Office of the Chief Economist (OCE) applied a Python algorithm to identify individual claims as well as the dependency relationships between claims. From the parsed claims text, OCE created six data files containing individually parsed claims, claim-level statistics, and document-level statistics, including newly developed measures of patent scope.
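
OCE's Python algorithm itself is not reproduced here. As a rough illustration of what identifying individual claims and their dependency relationships can involve, the sketch below splits a claims section on leading claim numbers and records which earlier claims each one cites. The function names, regular expressions, and sample text are assumptions made for illustration only; they are not OCE's method and will miss many real-world claim formats.

```python
import re

# Assumed pattern: a claim starts on a new line with "N. " (hypothetical convention).
CLAIM_START = re.compile(r"^\s*(\d+)\s*\.\s+", re.MULTILINE)
# Assumed pattern: a dependency reads "claim N" or "claims N"; only the first
# number after the word is captured, so "claims 1 or 2" yields just 1.
CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def split_claims(text):
    """Return {claim_number: claim_text} for a plain-text claims section."""
    matches = list(CLAIM_START.finditer(text))
    claims = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse internal whitespace so each claim is one clean string.
        claims[int(match.group(1))] = " ".join(text[match.end():end].split())
    return claims


def find_dependencies(claims):
    """Return {claim_number: sorted list of earlier claims it refers to}."""
    deps = {}
    for number, body in claims.items():
        refs = {int(n) for n in CLAIM_REF.findall(body)}
        # A claim can only depend on a lower-numbered claim.
        deps[number] = sorted(n for n in refs if n < number)
    return deps


# Invented sample text, not taken from any patent.
SAMPLE = """1. A widget comprising a frame and a lever.
2. The widget of claim 1, wherein the lever is steel.
3. The widget according to claim 2, further comprising a spring.
"""

if __name__ == "__main__":
    print(find_dependencies(split_claims(SAMPLE)))
```

In this sketch a claim with an empty dependency list is treated as independent; counts of independent and dependent claims per document could then be derived from the returned mapping.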
Updated: 2016-10-11

  Download (9.32 GB)
Dates Available Oct 07, 2016 – Oct 11, 2016

Patent examination research dataset (Stata (.dta) and MS Excel (.csv))

Contains detailed information on more than 13 million publicly viewable patent applications filed with the USPTO along with more than 1 million PCT applications through June 2023. The data files include information on each application's characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.
Updated: 2023-09-26

  Download (79.39 MB)
Dates Available Dec 02, 2015 – Sep 26, 2023

Cancer Moonshot data (MS Excel (.csv))

This curated dataset consists of 269,353 patent documents (published patent applications and granted patents) spanning the 1976 to 2016 period and is intended to help identify promising R&D on the horizon in diagnostics, therapeutics, data analytics, and model biological systems.
Updated: 2016-08-19

  Download (18.8 MB)
Dates Available Aug 17, 2016 – Aug 19, 2016

Patent Litigation data (Stata (.dta) and MS Excel (.csv))

Contains detailed U.S. District Courts patent litigation data on 81,350 unique court cases filed during the period 1963 - 2020. The data were collected from Public Access to Court Electronic Records (PACER) and RECAP, which are the sources for all of the content. The final output datasets, provided in five different files, include information on the litigating parties and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, covering more than 5 million document-level records from the docket reports.
Updated: 2024-03-27

  Download (467.26 MB)
Dates Available Dec 29, 2016 – Mar 27, 2024

Patent application Office actions data (Stata (.dta) and MS Excel (.csv))

Contains detailed information on 4.4 million Office actions mailed from 2008 through June 2017 for 2.2 million publicly viewable patent applications. The data are sourced from the text of Non-Final Rejection and Final Rejection Office actions issued by patent examiners to applicants during the patent examination process. The data files include information on grounds for rejection raised, the claims in question, and pertinent prior art.
Updated: 2017-11-29

  Download (635.44 MB)
Dates Available Nov 29, 2017 – Nov 29, 2017

Patent and patent application Oath Signature data (JSON and PNG)

The USPTO receives millions of patent applications and supporting documents each year. During the application process, inventors sometimes submit documents using alternative versions of their names. As raised in the USPTO Director's Blog (dated September 08, 2021), this can limit the Office’s ability to accurately certify the number of applications from a specific inventor and to determine whether inventors are following application fee rules and regulations. Certifications submitted with errors often mean longer wait times for all applicants.

Prior to this sample USPTO dataset, identifying patent application discrepancies required manually reviewing millions of documents to match names and signatures. Patent documents come in different formats and languages and can contain multiple inventors for each application. Signatures appear in various locations within a document, which makes matching signatures to applicant names challenging.

This sample USPTO research dataset provides images of signatures extracted from inventor oath documents. The dataset could be used for validation of micro entity certifications or for other research purposes. It includes 883,811 applications and their oath document signature images. It is 40.5 GB in total and is split into 8 zip files, one for each of the following patent application series:

Application # Series    Applications    Signature Counts
12                      160,116         292,354
13                      156,284         282,303
14                      159,067         304,182
15                      154,964         305,029
16                      134,728         260,884
17                      58,718          112,406
29                      58,125          84,123
35                      1,809           1,984
Total                   883,811         1,643,265
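
As a quick consistency check, the short Python sketch below re-adds the per-series counts from the table and derives the number of signatures per application in each series. The figures are copied from the table; the variable names are illustrative only.

```python
# (series, applications, signature count) copied from the table above.
ROWS = [
    ("12", 160116, 292354),
    ("13", 156284, 282303),
    ("14", 159067, 304182),
    ("15", 154964, 305029),
    ("16", 134728, 260884),
    ("17", 58718, 112406),
    ("29", 58125, 84123),
    ("35", 1809, 1984),
]

total_apps = sum(apps for _, apps, _ in ROWS)
total_sigs = sum(sigs for _, _, sigs in ROWS)

if __name__ == "__main__":
    for series, apps, sigs in ROWS:
        print(f"series {series}: {sigs / apps:.2f} signatures per application")
    print(f"total applications: {total_apps:,}")  # matches the Total row: 883,811
    print(f"total signatures:   {total_sigs:,}")  # matches the Total row: 1,643,265
```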


Each of these zip files contains a folder for each application number in the given series. Each application folder contains the oath document identifier, which holds the image(s) of the signature(s) as PNG files and a JSON file containing the application number, the inventor name(s), and the confidence level of the signature extraction algorithm.
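
A minimal Python sketch of how one of these archives might be indexed is shown below. It assumes that the application folders sit at the top level of the zip file and that signature images and metadata can be told apart by their .png and .json extensions; the exact nesting and the JSON key names are not documented here, so the JSON is loaded as-is without assuming any field names. The function name `index_signature_zip` is hypothetical.

```python
import json
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def index_signature_zip(zip_path):
    """Group archive members by top-level (application) folder.

    Returns {folder_name: {"png": [member names], "json": [parsed JSON objects]}}.
    Assumption: the first path component of each member is the application folder.
    """
    index = defaultdict(lambda: {"png": [], "json": []})
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # skip files that are not inside any folder
            folder = path.parts[0]
            suffix = path.suffix.lower()
            if suffix == ".png":
                # Record the member name only; image bytes are read on demand.
                index[folder]["png"].append(info.filename)
            elif suffix == ".json":
                with archive.open(info) as handle:
                    index[folder]["json"].append(json.load(handle))
    return dict(index)
```

Storing member names rather than image bytes keeps memory use small, which matters for archives of this size; an individual image can be read later with `ZipFile.read` using the recorded name.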

Fun Fact: This research dataset includes a few celebrity signature images, such as those of Elon Musk and Lori Greiner. See if you can identify the others!


Updated: 2022-09-30

  Download (6.3 GB)
Dates Available Sep 30, 2022 – Sep 30, 2022