The USPTO receives millions of patent applications and supporting documents each year. During the application process, inventors sometimes submit documents under alternative versions of their names. As raised in the USPTO Director's blog (dated September 8, 2021), this can limit the office’s ability to accurately certify the number of applications from a specific inventor and to determine whether inventors are following application fee rules and regulations. Certifications submitted with errors often mean longer wait times for all applicants.
Before this sample USPTO dataset was available, identifying patent application discrepancies required manually reviewing millions of documents to match names and signatures. Patent documents come in different formats and languages, and a single application can list multiple inventors. Signatures appear in various locations within a document, which makes matching signatures to applicant names challenging.
This sample USPTO research dataset provides images of signatures extracted from inventor oath documents. The dataset could be used to validate micro entity certifications or for other research purposes. It includes 883,811 applications and oath document signature images. It totals 40.5 GB and is split into 8 zip files for the following patent application series:
| Application # Series | Applications | Signature Counts |
Each of these zip files contains a folder for each application number in the given series. Each application folder contains the oath document identifier, which includes the image(s) of the signature(s) in PNG format and a JSON file that contains the application number, the inventor name(s), and the confidence level of the signature extraction algorithm.
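The folder layout described above can be walked with a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `iter_applications` is hypothetical, the immediate subfolders of an extracted series zip are assumed to be named by application number, and the files are assumed to use lowercase `.png` and `.json` extensions. Because the key names inside the JSON file are not documented here, the sketch returns the parsed JSON as-is rather than reading specific fields.

```python
import json
from pathlib import Path


def iter_applications(series_root):
    """Yield (application_number, metadata, png_paths) per application folder.

    Assumption: `series_root` is an extracted series zip whose immediate
    subfolders are named by application number.
    """
    for app_dir in sorted(Path(series_root).iterdir()):
        if not app_dir.is_dir():
            continue  # ignore stray files next to the application folders
        # Search recursively, because the nesting depth below the
        # application folder is an assumption, not a documented fact.
        png_paths = sorted(app_dir.rglob("*.png"))
        metadata = []
        for json_path in sorted(app_dir.rglob("*.json")):
            with json_path.open(encoding="utf-8") as fh:
                metadata.append(json.load(fh))  # parsed JSON, keys untouched
        yield app_dir.name, metadata, png_paths
```

For example, `for app, meta, pngs in iter_applications("extracted_series"): print(app, len(pngs))` would print the number of signature images found per application, where `extracted_series` is a placeholder for wherever one of the zip files was unpacked.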
Fun Fact: This research dataset includes a few celebrity signature images, such as those of Elon Musk and Lori Greiner. See if you can identify the others!