Report:Google Patent Search/Data Coverage/Patent Coverage/Full Text Coverage
This search system report was created by the Intellogist Team.
Full Text Coverage
Google Patents’ coverage is restricted to US granted patents and published applications, excluding all foreign patents. The company produced the electronic data itself from USPTO records, scanning in paper copies of the documents using Optical Character Recognition (OCR) technology. By creating this database, Google was actually duplicating an earlier effort of Corporate Intelligence, Inc. to produce OCR US full text data back to 1836; the file produced by that effort is now hosted by the Thomson Reuters-owned services MicroPatent PatentWeb and Thomson Innovation. Other providers are also beginning to offer full text US collections that extend earlier than 1971; in 2008, Questel introduced a US full text backfile that covers the collection back to 1920.
Although OCR technology has presumably improved since the Thomson Reuters file was first produced, the older source data remains very difficult to work with. Consequently, Google Patents contains many scanning errors, and the service has come under scrutiny for the quality and integrity of its OCR data.
Early complaints included many scanning errors in bibliographic data and full text. Examples of common OCR scanning errors included:
- titles that consist of all XXX’s,
- misspelled titles and partial titles,
- nonsensical disagreement between issue and filing dates (e.g., the issue date says 1836, while the filing date says 1898).
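Two of the error patterns listed above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `BibRecord` structure whose field names are invented for illustration and do not correspond to any Google Patents data format. It flags titles made up entirely of X's and issue dates that precede filing dates; misspelled or partial titles cannot be detected this simply and are not covered.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class BibRecord:
    """Hypothetical bibliographic record; field names are illustrative only."""
    number: str
    title: str
    filing_date: Optional[date]
    issue_date: Optional[date]


def ocr_warnings(rec: BibRecord) -> List[str]:
    """Return a list of plausibility warnings for one record."""
    warnings = []

    # Drop spaces and punctuation so "XXX XXXX" and "XXXXXXX" look the same.
    letters = re.sub(r"[\W_]+", "", rec.title)
    if not letters:
        warnings.append("title is empty")
    elif set(letters.upper()) == {"X"}:
        warnings.append("title is all X's")

    # A patent cannot issue before it was filed.
    if rec.filing_date and rec.issue_date and rec.issue_date < rec.filing_date:
        warnings.append("issue date precedes filing date")

    return warnings


# Made-up record combining two of the error patterns described above.
suspect = BibRecord(
    number="0000000",  # placeholder, not a real patent number
    title="XXX XXXX",
    filing_date=date(1898, 1, 1),
    issue_date=date(1836, 1, 1),
)
print(ocr_warnings(suspect))
```

Checks of this kind only identify records that are certainly wrong; a record that passes them may still contain OCR errors.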
The earlier Corporate Intelligence, Inc. project also saw many errors introduced into the bibliographic records for its files. However, when MicroPatent acquired Corporate Intelligence, it decided to offer the full text records without reconstructed bibliographic data for these files, on the assumption that incorrect bibliographic data would be worse than no bibliographic data at all. Thus, the new Google Patent service provides a source of bibliographic data records where they were missing before, but the provided data is messy and probably has a very high error rate.
Whether Google’s imperfect OCR bibliographic records are superior to no records at all remains debatable. Even so, this second source for the data is a welcome addition for most patent searchers, especially because it is free.
Another detail that could impact searches in the Google patent collection is that its US and IPC classification data only includes data published on the patent face at time of issue. Although the system refers to US classification data as “current,” an investigation into the file shows that this data was probably obtained just by scanning the patent itself, and no subsequent efforts by the USPTO to reclassify these patents will be reflected in the database. The same can be assumed for the international classification data in the system.
This statement is based on empirical investigations, rather than anything communicated in the Google help file. As an example, take the case of US 4,962,997. This patent was issued in 1990, and classified under 350/172 and 350/330. Below is a figure of the patent face, showing US classification at time of issue.
Searching for this patent in the USPTO database reveals that the electronic classification record has changed: the patent is now classified under 349/8 and 359/618. Below is a figure showing the current electronic classification record as hosted by the USPTO.
However, the Google Patents record does not reflect this change. It still shows the original classification data.
Although Google provides a link to the USPTO record, which the searcher might use to confirm the current details of this patent, the fact remains that this patent would not be retrieved by a Google Patents classification search in the new, updated classes.
The lack of updated US and IPC classification data is a particular problem for advanced patent searches, as both of these classification systems have gone through multiple revisions. Especially when combined with the upper limits on search hit retrieval (discussed in the Viewing Full Text and Images section), this limitation makes Google Patent Search a very weak tool for any kind of classification searching.
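The effect on retrieval can be illustrated with the US 4,962,997 example. The sketch below is not how either system is implemented; the two dictionaries and the `search_by_class` helper are hypothetical stand-ins for an index built from classification data printed on the patent face and an index built from current USPTO records. The class values are the ones reported above.

```python
from typing import Dict, List, Set

# Hypothetical index built from data printed on the patent face at issue.
face_index: Dict[str, Set[str]] = {
    "US4962997": {"350/172", "350/330"},
}

# Hypothetical index built from the current USPTO classification record.
current_index: Dict[str, Set[str]] = {
    "US4962997": {"349/8", "359/618"},
}


def search_by_class(index: Dict[str, Set[str]], us_class: str) -> List[str]:
    """Return the documents whose class set contains the requested class."""
    return sorted(doc for doc, classes in index.items() if us_class in classes)


# A search in the updated class misses the patent in the face-data index.
print(search_by_class(face_index, "349/8"))     # []
print(search_by_class(current_index, "349/8"))  # ['US4962997']

# The original class still retrieves it from the face-data index.
print(search_by_class(face_index, "350/172"))   # ['US4962997']
```

A searcher working only from face data would therefore need to know every historical class a technology has been assigned to in order to approach complete retrieval.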
X-Patents in Google Patent Search
Finally, it is worth noting that the Google collection of pre-1836 patent documents (also known as X-patents) consists only of document images at this time, without associated bibliographic data. The search shown below was conducted for patent documents issued between the years 1776 (the earliest available via the Advanced Search drop-down menu) and 1800. The search returned four results, none legitimately published in the requested time period.
An X document was successfully retrieved from Google Patent Search only after entering its document number. The image below shows a US X patent as retrieved by Google Patent Search. The "About This Patent" tab and the navigation links that appear in the right panel are not enabled, because there is no electronic bibliographic or textual data available for this record.
Because it does not allow retrieval by any other method, Google Patent Search does not offer any additional functionality over the X-patent collection hosted by the USPTO.
- ↑ Search was conducted via the Advanced search form on November 4th, 2008. Search parameters were documents published from 1776 to 1800, and are shown in the figure. A secondary search was also conducted by selecting "X-patents" from the Advanced Search Form, and using the same date range.
- ↑ USPTO X Patent Search Site, http://patft.uspto.gov/netacgi/nph-Parser?TERM1=x+&Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2Fsrchnum.htm&r=0&f=S&l=50. Accessed on November 10, 2008.