Report:QPAT/Viewing Results/Analyzing Results
As of January 1, 2013, both QPAT and Patent Examiner have been discontinued, and they have been replaced by the Orbit.com portal.
There are three types of analysis associated with results sets in QPAT: the first is statistical analysis of results, performed from the hit list; the second is in-depth document content analysis, performed in Patent Examiner; and the third is statistical analysis performed via the Questel Analysis module in Patent Examiner, made available in a January 2008 release.
Statistical Analysis from a QPAT Hit List
The statistical analysis function available from the QPAT hit list is a tool intended to inform and direct the search effort as it progresses. The tool runs a random sample of up to 500 documents from the document list, and displays the top 15 most frequently occurring class or assignee terms from the data set. Users may then refine the search further with this information.
This tool can only be applied to FamPat data or single-file data. In other words, the results must be 1) a hit list from a FamPat bibliographic search, or 2) hits from single full text files (no cross-file or multi-file analysis is possible).
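The sampling-and-counting behavior described above can be sketched roughly as follows. This is only an illustration, not QPAT's actual implementation: the record layout (a dict mapping a field name to a list of terms), the function name `top_terms`, and the sample records are all assumptions made for the example. Only the sample size of 500 and the top-15 cutoff come from the report.

```python
import random
from collections import Counter

SAMPLE_SIZE = 500  # the report says up to 500 documents are sampled
TOP_N = 15         # the report says the top 15 terms are displayed


def top_terms(hit_list, field, sample_size=SAMPLE_SIZE, top_n=TOP_N, seed=None):
    """Return the top_n most frequent values of `field` in a random sample."""
    rng = random.Random(seed)
    # Sample without replacement; a short hit list is simply used in full.
    sample = rng.sample(hit_list, min(sample_size, len(hit_list)))
    counts = Counter()
    for doc in sample:
        # A document may carry several classes or assignees.
        counts.update(doc.get(field, []))
    return counts.most_common(top_n)


# Hypothetical hit list: each record maps a field name to a list of terms.
hits = [
    {"ipc": ["C08L", "C08K"], "assignee": ["CIBA GEIGY"]},
    {"ipc": ["C08L"], "assignee": ["CIBA GEIGY AG"]},
    {"ipc": ["B60C"], "assignee": ["YOKOHAMA RUBBER CO LTD"]},
]
print(top_terms(hits, "ipc", seed=0))
```

Because the counts come from a sample rather than the full hit list, two runs over a large results set may rank borderline terms differently, which is consistent with the tool's role as a quick guide rather than a precise measurement.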
Once the search is run, the Analyze icon will appear in the hit list menu.
A maximum of seven options are included in the menu: analysis can be performed on Assignee, European (ECLA) class, US class, International class (IPC), cited patents, citing patents, and both cited and citing patents. Some options will not appear for some data files. For example, the three cited and citing patent options are only available when running a Citations search. Once the analysis function is complete, the system displays the top 15 terms from the data set. Below is an example of analysis by IPC classification.
From the analysis screen, the user may select any of the top 15 terms, and extend the search to include those classifications, or limit the results in the current hit list to only those from the selected classes. Alternatively, the user may cancel and return to the hit list after browsing the analysis results.
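The difference between limiting and extending can be pictured with a small sketch. It assumes, purely for illustration, that documents are dicts with an `id` and a list of `ipc` codes, and that a wider `database` list stands in for the underlying file; the function names are hypothetical and do not correspond to any QPAT command.

```python
def limit_hits(hit_list, selected):
    """Keep only the current hits that carry at least one selected class."""
    return [doc for doc in hit_list if selected & set(doc["ipc"])]


def extend_hits(hit_list, database, selected):
    """Add documents from the wider collection that carry a selected class."""
    seen = {doc["id"] for doc in hit_list}
    extra = [
        doc for doc in database
        if doc["id"] not in seen and selected & set(doc["ipc"])
    ]
    return hit_list + extra


# Hypothetical data: four documents in the file, two of them in the hit list.
database = [
    {"id": 1, "ipc": ["C08L", "C08K"]},
    {"id": 2, "ipc": ["B60C"]},
    {"id": 3, "ipc": ["C08L"]},
    {"id": 4, "ipc": ["A61K"]},
]
hits = database[:2]
selected = {"C08L"}

print([d["id"] for d in limit_hits(hits, selected)])             # [1]
print([d["id"] for d in extend_hits(hits, database, selected)])  # [1, 2, 3]
```

Limiting can only shrink the hit list, while extending can only grow it.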
Note that the data is not cleaned before it is displayed. This means that for a classification search, like the one in the figure above, subclass designations are counted as equal to full IPC classes, and displayed accordingly. Associated data tags may also make it into the count (notice the term “linked” appears with 74 hits in the list from the figure).
This effect is particularly pronounced for assignee names, which are not standardized in the database; the result is that the same company name may appear in the hit list multiple times, with frequency counts split between the variations (see the figure below, where CIBA GEIGY is listed three times). Variations in assignee names could mean a well-represented company won’t appear in the statistical analysis, because too many name variations will split the frequency count into small fragments. The figure below shows the results of an assignee analysis.
After an assignee analysis is performed, the user may choose to limit the hit list to only documents from particular top-15 assignees. Unlike the class analysis, the search may not be automatically extended to include all documents belonging to those assignees.
This feature can be especially useful to users wishing to use a keyword search to identify classifications related to their search subject matter; instead of browsing hits one-by-one, users can see a summary of the most frequently occurring classes. The same goes for users wishing to identify corporate players established in a particular market (although due to the long lag time between filing and publication in some markets, it should not be used to try to identify up-and-comers). For more in-depth analysis, users with a subscription to Patent Examiner will want to save their data as a work file, and use the more advanced features found in that program (discussed further below).
Disadvantages of the feature are 1) the pitfalls introduced by messy data, and 2) the limitations imposed by the FamPat file.
The messy data problem mainly affects the assignee analysis feature, and should really not be considered a fatal flaw, considering that every patent database will suffer from the lack of name standardization to some extent. This is not a major problem, especially since the analysis tool is meant to be a quick-and-dirty summary tool, used to get clues to direct the search effort.
The FamPat limitations come into play when working with full-text searches. The analysis tool is only available after references are “grouped” using FamPat, and this can only be done for results sets under 5,000. However, this limit is probably high enough that major obstacles to analysis will not usually arise from it.
Viewing Documents in Patent Examiner
QPAT’s saved results sets can only be viewed using the Patent Examiner interface, an in-depth content analysis tool that is designed for attorneys and/or corporate decision makers to review, analyze, rank, annotate and share search results. It is for use on complete results sets, after the searching phase of the project is finished; however, in contrast to post-processing tools offered by other search engines, Patent Examiner does not focus on high-altitude graphing or visualization tools (those are available from the Questel Analysis Module, discussed in the next section). Instead, the features of Patent Examiner are directed to in-depth patent content review.
Although it is a post-search analysis tool, patent searchers themselves may have use for Patent Examiner’s advanced highlighting, viewing and ranking features, so the tool bears at least a brief discussion here.
One of the advanced reading features offered by Patent Examiner is a side-by-side text vs. document image reading screen. This gives the user access to two vital information sources: the searchable electronic patent full text, and the complete document images. The separation of these two salient document parts has been a necessary evil to enable full text searching, but comprehension of patent content can be impossible without both resources (especially for mechanical searches). As seen in the figure below, patent images are loaded in Adobe for page-by-page viewing, while the full text is included in the browser window and can be highlighted in up to six colors, just as in QPAT’s interface. Complete FamPat family information is also displayed in the interface, but can be turned off to create more space in the full-text reading pane.
Other major features in Patent Examiner include the filtering, ranking, sorting, and even searching capabilities that can be applied to a saved results set.
User-defined ranking in QPAT is performed from the split-screen window, where five empty star outlines are visible at the top of the page (see figure above). Selecting one of these stars will fill the column (from right to left), to indicate a 1 to 5 relevance score based on the user’s own ranking system. This score is also shown in the list view, and references in a work file can be sorted or filtered by assigned relevance score.
Filtering will cause Patent Examiner to show only references from a work file that fulfill certain desired criteria. Filtering can take place via a number of parameters, including the user-defined rank. The figure below shows the menu of options.
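As a rough model of how a 1 to 5 relevance score supports sorting and filtering a work file, consider the sketch below. The `Reference` class, its field names, and the placeholder document labels are assumptions for illustration only; they do not reflect Patent Examiner's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    label: str      # placeholder identifier, not a real publication number
    stars: int = 0  # 0 = not yet ranked, 1-5 = user-assigned relevance


def rank(ref, stars):
    """Assign a user-defined relevance score of 1 to 5 stars."""
    if not 1 <= stars <= 5:
        raise ValueError("relevance score must be between 1 and 5")
    ref.stars = stars


def filter_by_rank(work_file, minimum):
    """Show only references ranked at or above `minimum` stars."""
    return [ref for ref in work_file if ref.stars >= minimum]


def sort_by_rank(work_file):
    """Order references from most to least relevant."""
    return sorted(work_file, key=lambda ref: ref.stars, reverse=True)


work_file = [Reference("DOC-A"), Reference("DOC-B"), Reference("DOC-C")]
rank(work_file[0], 2)
rank(work_file[1], 5)
rank(work_file[2], 4)

print([r.label for r in sort_by_rank(work_file)])       # ['DOC-B', 'DOC-C', 'DOC-A']
print([r.label for r in filter_by_rank(work_file, 4)])  # ['DOC-B', 'DOC-C']
```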
Finally, searching within a results-set can be done from the main page, and can be limited by a number of parameters, including full text keyword searching. The figure below shows the search interface. As seen from the menu bar at the top of the interface, a data export function is available as well, just as in QPAT.
Although it requires a separate subscription, the features in Patent Examiner could make it well worth the price for some users. The biggest enticements for most users would be the features that make Patent Examiner unique: the side-by-side full text and image review screen, and the ability to search within results. These features are missing from many comparable search providers.
The split viewing screen and image flipping capability is also offered by the Examiner search tool EAST, but the tool is only available on site at the USPTO in Alexandria, Virginia. Fast image flipping is almost always cited as the reason many users prefer EAST to web-based tools. Patent Examiner offers the best reasonable approximation to this, although because it is web-based, the load times for images are still considerably slower than what EAST can offer.
The Questel Analysis Module in Patent Examiner
The Questel Analysis Module is opened by selecting the “Stats” option that appears in Patent Examiner’s upper menu, or by selecting the "Statistics" link from the Analyze dropdown. It opens in a separate window, and allows the user to choose a variety of pre-defined graphs for display.
General categories of the graphs available from the system include: Documents, Assignees, Inventors, Technologies, and Data Crossing. The graphic size may be changed within a range of 640 px to 1200 px. The different kinds of graphs available from the analysis module are discussed below, in the context of these categories.
- Documents: This option presents three bar chart graphing options: Distribution by Date (either PD - Publication Date, or PR - Priority Date), Patent Country, and Originating Country. All three are presented by frequency of occurrence, and the latter two options are mapped by geographic location. Two line charts with a cloud feature are also available, showing a geographical distribution of patents: Patent Country Timeline and Originating Country Timeline. In June 2009, two further graphical options became available from the "Documents" menu: Patent kind codes, and Citations. The kind code graph is a pie chart displaying a breakdown of the results set by standard letter codes representing the document type. The "Citations graph" (not shown) displays patent family hits by those having the greatest number of citations to other patents.
The kind code analysis graph is a relatively unusual feature among automated analysis features. Users should be cautioned that the kind code system currently in place is somewhat standardized through the use of letter codes (e.g. A for published application, U for utility model, etc.), but is still relatively complex, and the meanings of these codes are dependent on the systems used by the national or regional authority that issued the document. QPAT developers were wise to simply produce the graph as representative of these letter codes, rather than trying to interpret the data further with more specific labels. However, those new to patent information may be a little confused by this graph.
- Assignees: The assignee menu offers the following graphical options: “Top assignees”, “Evolution”, “Appearance”, “Speeding up”, “Collaboration”, and “Grouping.”
  - Top assignees: A sideways bar chart showing the number of documents associated with each assignee. Hyperlinked assignee names allow users to open up a list of the patents in question.
  - Evolution: A layered line graph that shows patenting volume over time for each assignee.
  - Appearance: A graph that shows the year that the first document was published for each assignee (or first “appearance”), graphed against the total number of documents in the data set for each assignee. The graph only shows this information for assignees with more than 5 documents in the data set.
  - Speeding up: Shows the patent activity of assignees within a given timeframe, to determine which are “speeding up” their patenting activity.
  - Collaboration: The output of this option is a “Cluster Graph,” or visual representation of the relationships between data points. Here, the graph shows assignees and their relationships to other companies and individuals in the data set. The graph can be limited by number of publications, or by number of co-occurrences, and by date range. Data points can even be moved by the user if they are clustered too tightly together. See the figure below for an example.
  - Grouping: The output is a list of assignees, with the option to group or ungroup them according to the user’s preferences.
- Inventors: The options under this heading are “Top Inventors,” “Appearance,” and “Speeding Up,” which mirror the same options available under the Assignees’ menu.
- Technologies: This option allows users to explore the US, IPC or ECLA classification codes represented in the data set. The sub menus are: IPCs, IPC sub-classes, ECLA, and USC (US classes). Each of these sub menus offers the following graphical options: “Top” IPC/subclass/ECLA classes, a “tag cloud” graph, and “Evolution,” “Appearance,” and “Speeding up” graphs. Most of these options are similar to the ones discussed for the Assignee menu, with one exception: the “tag cloud” graph. This graph shows the various data points in a group, where the data point’s size shows its proportionate representation in the database. See the figure below for an example.
- Data Crossing: The graphs under this option allow users to analyze assignee data vs. inventor, IPC, ECLA, or US class data. The output is a “Cluster Graph,” or visual representation of the relationships between data points.
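To make the "Appearance" graph from the Assignees menu more concrete, the sketch below computes the two values it plots for each assignee: the year of the first published document and the total number of documents. The record layout, the function name, and the sample assignee names are hypothetical; only the "more than 5 documents" threshold is taken from the description above.

```python
from collections import defaultdict

MIN_DOCS = 5  # the report says only assignees with more than 5 documents are shown


def appearance_points(documents, min_docs=MIN_DOCS):
    """Map each qualifying assignee to (first publication year, document count)."""
    years = defaultdict(list)
    for doc in documents:
        for assignee in doc["assignees"]:
            years[assignee].append(doc["year"])
    return {
        name: (min(ys), len(ys))
        for name, ys in years.items()
        if len(ys) > min_docs
    }


# Hypothetical data set: six documents for one assignee, two for another.
documents = [{"assignees": ["ALPHA CORP"], "year": 2001 + i} for i in range(6)]
documents += [
    {"assignees": ["BETA LTD"], "year": 1999},
    {"assignees": ["BETA LTD"], "year": 2003},
]

print(appearance_points(documents))  # {'ALPHA CORP': (2001, 6)}
```

Here BETA LTD is dropped even though it published earlier, because it has too few documents to meet the threshold.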
The analysis module itself seems to be of high quality in that the graphs it produces are all interactive. However, the downside is the long loading time required for the program to be ready for analysis (it can take 5-10 minutes, even with a fast internet connection). Also missing is the ability to “clean” data points, for example, to tell the system to recognize that “Yokohama Rubber KK” and “Yokohama Rubber Co Ltd” are in fact the same entity (the figure of the assignee relationships in the section above contains points for each of these entities).
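The kind of clean-up step that is missing might look something like the sketch below, which users could apply to exported data themselves. It is a deliberately naive illustration: the suffix list and the `normalize` function are assumptions, and real assignee harmonization would need a far longer suffix list plus manual review.

```python
import re
from collections import Counter

# Assumed set of legal-form suffixes to ignore; a real list would be much longer.
LEGAL_SUFFIXES = {"KK", "CO", "LTD", "INC", "CORP", "AG", "GMBH", "SA"}


def normalize(name):
    """Uppercase a name, strip punctuation, and drop trailing legal-form words."""
    tokens = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper().split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def merged_counts(assignee_names):
    """Count documents per assignee after merging name variants."""
    return Counter(normalize(name) for name in assignee_names)


names = ["Yokohama Rubber KK", "Yokohama Rubber Co Ltd", "Yokohama Rubber KK"]
print(merged_counts(names))  # Counter({'YOKOHAMA RUBBER': 3})
```

Without this step the same three documents would be split into two separate data points, which is the fragmentation effect described for both the hit list analysis and the cluster graph.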
One further frustration for users is that an adequate help guide for the program, including definitions of some of its graphs, is lacking. For example, the meaning of graph options such as “evolution” or “speeding up” is not readily obvious to the user, and the actual data being graphed (and the purpose of the graph itself) may only become apparent after some scrutiny. This problem was somewhat ameliorated in June 2009 by the addition of help icons on the title bar of each graph, which link to brief descriptions of the analysis performed.