Report:Delphion/Viewing Results/Analyzing Results
As of March 31, 2012, DWPI on Delphion is no longer available. DWPI data is available on the Thomson Innovation platform.
Working with Results Sets
Delphion offers two data analysis functions for the temporary manipulation of results sets: Snapshot, which performs a statistical analysis of various bibliographic data fields within a document set, and Clustering, a more unusual tool that performs linguistic (extracted keyword) analysis to show relationships between the content of patents. Both features are described in more detail in the sections that follow.
The Snapshot feature “summarizes” statistical data about a results set (either an unsaved hit list or a saved work file). It reports the number and percentage of occurrences of each data value, and shows the same information graphically as bar charts. It is accessed from the “Snapshot” tab at the top of the screen when viewing a hit list or work file. The analysis can be performed on either the first 500 hits, or up to 20,000 results from the hit list.
Upon opening the Snapshot tab, default selections appear in the tab menu. The default settings will generate a four-way split summary window of Assignee, Inventor, Publication Year, and IPC-7 class. After the report is run, the top data values (by occurrence) are listed, along with the actual number of occurrences, the percentage of the total dataset, and a graphical bar showing the relative quantity compared to other rows in the chart.
Once the user has made selections, clicking the large “Summarize” button runs the analysis. Below is an example of the system’s output when the default settings remain selected. Only the assignee analysis is visible below. The bar to the right of each line shows the relative frequency of occurrence for that data point.
After running the Snapshot, “minimum number of occurrences” and “maximum rows shown” can be set by the user. The default values are minimum number of occurrences = 2 and maximum rows = 10. This means that data points will only be included in the charts if they occur 2 or more times in the dataset, and that only the top 10 assignees/inventors/classifications/etc. will be shown in the charts.
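The interaction between these two settings can be illustrated with a short Python sketch. This is not Delphion's implementation; the function name `summarize`, its parameters, and the sample assignee names are hypothetical, and the sketch assumes one value per record for the field being summarized.

```python
from collections import Counter


def summarize(values, min_occurrences=2, max_rows=10, bar_width=20):
    """Return (value, count, percent, bar) rows for one data field.

    Hypothetical sketch of a Snapshot-style summary: values are ranked by
    occurrence, rows below min_occurrences are dropped, and at most
    max_rows rows are kept.
    """
    counts = Counter(values)
    total = len(values)
    ranked = counts.most_common()          # sorted by count, descending
    top = ranked[0][1] if ranked else 0    # largest count scales the bars
    rows = []
    for value, count in ranked[:max_rows]:
        if count < min_occurrences:
            break                          # all remaining counts are smaller
        percent = 100.0 * count / total    # share of the whole dataset
        bar = "#" * max(1, round(bar_width * count / top))
        rows.append((value, count, percent, bar))
    return rows


# Made-up assignee values, one per record.
assignees = ["Acme", "Acme", "Acme", "Beta", "Beta", "Gamma"]
for value, count, percent, bar in summarize(assignees):
    print(f"{value:<8}{count:>3}{percent:>7.1f}%  {bar}")
```

With the default settings, a value that occurs only once in the sample is left out of the chart even though fewer than 10 rows are shown.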
Individual data fields that can be summarized using the Snapshot feature include:
- Default 4 (Assignee, Inventor, Publication Year, IPC-7)
- Assignee City
- Assignee State
- Assignee Country
- Designated Country
- Application Year
- Application Year/Month
- Inventor City
- Inventor State
- Inventor Country
- IPC-R Code – 4 Digit
- IPC-R Code – full
- IPC 1-7 Code – 4 Digit
- IPC 1-7 Code – full
- Publication Year
- Publication Year/Month
- Priority Year
- Priority Year/Month
- US Assignee Code
- US Class – 3 Digit
- US Class – full
- US Examiner
- US Maint. Status
- US References – all
- US Forward Refs – all
- Unified Company
- Parent Company
- Ultimate Company
- Derwent Assignee Code
- Derwent Inventor
- Derwent Class – main
- Derwent Class – all
- Derwent Manual Code
- Derwent Update
Once the snapshot analysis is run, the program gives the user an opportunity to “drill-down” into the data by choosing only the data points of most interest (for example, choosing the top three company names by ticking their checkboxes); the program will then select only that particular subset of data, and display only those records for review under the Current Results tab. This new data subset may be saved to a work file for later review, or a second round of manipulation can begin on it (e.g. more snapshots, clustering, data extract, PDF/file history order, etc.).
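Conceptually, the drill-down step is a filter on the results set. The Python sketch below illustrates the idea only; the record layout, field names, and `drill_down` function are assumptions made for the example and do not reflect how Delphion stores or processes records.

```python
def drill_down(records, field, selected_values):
    """Keep only the records whose `field` matches a ticked value.

    Hypothetical sketch: `records` is a list of dicts, and
    `selected_values` stands in for the checkboxes ticked by the user.
    """
    selected = set(selected_values)
    return [record for record in records if record.get(field) in selected]


# Made-up records; the numbers and names are placeholders.
results = [
    {"number": "X1", "assignee": "Acme", "year": 2009},
    {"number": "X2", "assignee": "Beta", "year": 2010},
    {"number": "X3", "assignee": "Gamma", "year": 2010},
    {"number": "X4", "assignee": "Acme", "year": 2011},
]

subset = drill_down(results, "assignee", ["Acme", "Beta"])
# A second round of analysis can then start from the smaller subset.
recent = drill_down(subset, "year", [2010, 2011])
```

Because the function returns an ordinary list of records, its output can be passed straight into another round of filtering or summarizing, which mirrors the repeated manipulation described above.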
In Snapshot, users can choose from some unusual data fields for statistical analysis, such as “assignee city.” The tool is notable for the wide range of data fields it makes available for analysis. In contrast, some competitors restrict statistical analysis tools to major data fields like assignee, inventor, and classifications.
Clustering is a keyword analysis feature intended to organize documents into related “clusters” based on extracted keywords from document titles and abstracts. It can be performed on either the first 500 hits, or up to 20,000 references in a dataset. Clustering analysis is performed from the “Clustering” tab at the hit list (or work file) view.
Once clusters have been calculated, a hyperlinked list will appear showing number of occurrences for a particular group of keywords.
Clustering works by assigning each document in the results set to “one and only one cluster.” Each cluster is described by the shared keywords that “characterize the cluster” and by the number of patent documents it contains. As seen in the figure above, a cluster is based on keywords that do not necessarily have any relation to the keyword terms in the search string. Clicking any hyperlinked group number from the cluster list (shown in the figures above) will display a list of the newly grouped patent numbers. In principle, “drilling down” into this group and exploring the documents individually should reveal how they relate to one another.
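The source does not document Delphion's clustering algorithm, so the Python sketch below only illustrates the “one and only one cluster” property with a deliberately simple stand-in: each document is assigned to the keyword in its text that occurs in the most documents. The stopword list, function names, and sample titles are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with", "in", "to", "using"}


def extract_keywords(text):
    """Lowercase the text and keep words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def cluster(docs):
    """Assign every document to exactly one keyword-labelled cluster.

    `docs` maps a document number to its title/abstract text. This is a
    toy stand-in, not Delphion's method.
    """
    keywords = {num: extract_keywords(text) for num, text in docs.items()}
    # Number of documents in which each keyword appears.
    doc_freq = Counter(k for kws in keywords.values() for k in kws)
    clusters = defaultdict(list)
    for num, kws in keywords.items():
        if kws:
            # Most widely shared keyword wins; ties break alphabetically.
            label = min(kws, key=lambda k: (-doc_freq[k], k))
        else:
            label = "(no keywords)"
        clusters[label].append(num)
    return dict(clusters)


# Made-up document numbers and titles.
docs = {
    "D1": "Battery charging circuit",
    "D2": "Battery cooling system",
    "D3": "Optical lens coating",
    "D4": "Lens coating method",
}
clusters = cluster(docs)
```

Since each document receives a single label, the cluster sizes always add up to the size of the results set, which is the property the cluster list relies on.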
Visual analysis of the clusters is accessed through the Visual Map link. Once generated, the clusters can be organized (and re-organized) on this map based on the number of documents shared between them. The size of the clusters can be increased to show more of the underlying keywords that make up the clusters, and the font size can be adjusted accordingly. On the map, the clusters are arranged so that more closely related groups (based on keyword content) appear closer to one another. Selecting the “start” button on the menu bar allows the clusters to spread out in relation to one another, so that unrelated clusters move farther apart. The “link values” shown in the figure below represent the “similarity percentage” between two clusters.
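How Delphion computes a link value is not stated in the source. As one plausible illustration, the Python sketch below expresses the similarity between two clusters as the Jaccard overlap of their keyword sets, scaled to a percentage. The choice of measure, the function names, and the sample keyword sets are assumptions.

```python
from itertools import combinations


def similarity_percent(keywords_a, keywords_b):
    """Jaccard overlap of two keyword sets, as a percentage (assumed measure)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def link_values(cluster_keywords):
    """Return a similarity percentage for every pair of clusters."""
    return {
        (x, y): similarity_percent(cluster_keywords[x], cluster_keywords[y])
        for x, y in combinations(sorted(cluster_keywords), 2)
    }


# Made-up clusters and their characterizing keywords.
cluster_keywords = {
    "A": {"battery", "charging", "cell"},
    "B": {"battery", "cooling", "cell"},
    "C": {"lens", "coating"},
}
links = link_values(cluster_keywords)
```

Under this assumption, clusters that share most of their keywords receive a high link value and would be drawn close together, while clusters with no keywords in common receive a value of zero.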
The clustering feature was once unique to Delphion, although such keyword extraction analysis features are becoming more common in search tools. Word association is also the basis for other advanced search techniques such as Latent Semantic Analysis (LSA), which can be found in an emerging set of patent search systems.
In everyday legal patent searching, Delphion's clustering feature may be of little use. However, it might be useful when applied to certain datasets, such as a patent portfolio owned by one particular inventor or assignee; in that case, the feature might allow users to investigate technological diversity within an IP portfolio. Including an analysis feature that is not relevant to legal-type patent searching, but has this kind of competitive intelligence application, is yet another instance of Delphion’s catering to the business management side of patents, as opposed to the prior art search side.
Delphion’s help guide also notes that Clustering is especially meaningful on Derwent records, because Derwent titles and abstracts are rewritten using industry-specific, standard terminology. Original documents, by contrast, can describe similar inventions in widely varying language and terminology. However, given the price of a single Derwent record, using this data in clustering can also be costly.