Report:VantagePoint/Text Mining and Clustering/Mapping Document Clusters
Mapping Document Clusters
VantagePoint has three main types of maps: cross-correlation, auto-correlation, and factor. They are accessed via Sheet --> Add Maps. It is often useful to create smaller groups within a field before mapping. This not only reduces processing time but also lets users focus on only the important records within a field. For example, before mapping, users could create a top-20 group within the inventor field to capture only those who have made significant contributions to the art. The mapping wizard, which helps users create all three types of maps, is shown below.
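Outside the wizard, the idea of a top-20 group can be sketched in a few lines of Python. This is only an illustration of the grouping step, not VantagePoint's own implementation; the record layout, the `top_n_group` helper, and the inventor names are all hypothetical.

```python
from collections import Counter

def top_n_group(records, field, n=20):
    """Return the n most frequent values of `field` across all records."""
    counts = Counter()
    for record in records:
        # Count each value once per record, even if it is listed twice.
        counts.update(set(record.get(field, [])))
    return [value for value, _ in counts.most_common(n)]

# Hypothetical records; a real dataset would come from an imported patent search.
records = [
    {"inventor": ["Inventor A", "Inventor B"]},
    {"inventor": ["Inventor A"]},
    {"inventor": ["Inventor A", "Inventor C"]},
    {"inventor": ["Inventor B"]},
]

top = top_n_group(records, "inventor", n=2)
print(top)
```

Restricting a map to such a group keeps the number of nodes small, which is the same reason the report recommends grouping before mapping.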
Cross-correlation maps show the relationship between two separate fields, for example between patent assignees and inventors. To create the map, users select the two fields they wish to use. The screen for Step 1 is shown below; Step 2 uses exactly the same screen, which lets users choose the second field in the correlation.
The screenshot below shows the relationship between patent assignees and inventors. A solid blue line denotes a strong relationship, while a dotted blue line denotes a weak one. In the example, one can see that Zexel Corp has a strong relationship with Mazda Motors. A likely conclusion is that there is a strategic partnership between the two companies, or that one company is a subsidiary of the other. In reality, Zexel is a supplier to Mazda, so VantagePoint correctly detected a relationship between the two companies. However, although VantagePoint can show that a relationship exists, it cannot determine what kind of relationship it is; only outside research can confirm its true nature.
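One way to picture what such a map measures is to compare assignees by the inventors they share. The sketch below does this with cosine similarity. It is an assumption made for illustration only: the report does not state which similarity measure VantagePoint uses, and the company names, inventor labels, and the `strong` and `weak` thresholds are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

def profiles(records, item_field, basis_field):
    """For each item (e.g. assignee), count the basis values (e.g. inventors) seen with it."""
    prof = defaultdict(Counter)
    for rec in records:
        for item in set(rec.get(item_field, [])):
            prof[item].update(set(rec.get(basis_field, [])))
    return prof

def cosine(a, b):
    """Cosine similarity between two count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cross_correlation_links(records, item_field, basis_field, strong=0.75, weak=0.25):
    """Label item pairs 'solid' (strong) or 'dotted' (weak); thresholds are hypothetical."""
    prof = profiles(records, item_field, basis_field)
    links = {}
    for x, y in combinations(sorted(prof), 2):
        score = cosine(prof[x], prof[y])
        if score >= strong:
            links[(x, y)] = "solid"
        elif score >= weak:
            links[(x, y)] = "dotted"
    return links

# Hypothetical records.
records = [
    {"assignee": ["Company A"], "inventor": ["I1", "I2"]},
    {"assignee": ["Company B"], "inventor": ["I1", "I2"]},
    {"assignee": ["Company B"], "inventor": ["I2", "I3"]},
    {"assignee": ["Company C"], "inventor": ["I3", "I4"]},
]

links = cross_correlation_links(records, "assignee", "inventor")
print(links)
```

As in the map itself, a link only says that two assignees overlap in the second field; it says nothing about why.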
Auto-correlation maps show the relationships among items within a single field. For example, they are often used to map relationships among the inventors listed on patent data, to reveal inventor partnerships. This map is created by selecting the field to be mapped, as shown.
One can make several assumptions from the auto-correlation map displayed below. The first is that the group of inventors shown likely work together at the same company. The second is that this group works closely together, either as a team or within departments. The first assumption is validated by the Patent Assignee window in the top right, where Denso Corp is shown as the top assignee. The second assumption is harder to prove, but the presence of mild to strong connections between the inventors probably means they work together too frequently for the matches to be coincidental. The auto-correlation map is extremely useful for determining this type of connection.
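The connections in such a map can be thought of as scores for how often two inventors appear on the same records. A minimal sketch follows, assuming a cosine-style co-occurrence score; VantagePoint's actual formula is not given in this report, and the record layout and inventor labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def auto_correlation(records, field):
    """Score each pair of items by how often they share a record."""
    occurs = defaultdict(set)
    for idx, rec in enumerate(records):
        for item in rec.get(field, []):
            occurs[item].add(idx)
    scores = {}
    for a, b in combinations(sorted(occurs), 2):
        shared = len(occurs[a] & occurs[b])
        if shared:
            # Shared records, normalised by how prolific each item is.
            scores[(a, b)] = shared / sqrt(len(occurs[a]) * len(occurs[b]))
    return scores

# Hypothetical records.
records = [
    {"inventor": ["Inventor 1", "Inventor 2"]},
    {"inventor": ["Inventor 1", "Inventor 2"]},
    {"inventor": ["Inventor 1", "Inventor 3"]},
    {"inventor": ["Inventor 4"]},
]

scores = auto_correlation(records, "inventor")
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Pairs that never share a record get no score at all, which corresponds to two nodes with no connecting line.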
A factor map is the result of a Principal Components Analysis (PCA), which finds the items that frequently occur together in a dataset. A factor map can also be considered a visual representation of clustered data. The factor map is created by selecting the field to use and the number of factors to consider in the analysis.
The figure below shows the cluster of 4-digit International Patent Classification (IPC) classes. On the right side of the screen there are several classifications with many items, such as class F21S, which has 41 items. However, looking at the factor map, it is apparent that most classes have a low correlation with one another. One could assume that this particular dataset has several distinct categories of invention that overlap only slightly. In fact, the dataset contains patents found using the keywords "solar radiation". The search system might find related but distinct technology areas, such as both solar energy collectors and mechanical devices to hold said collectors. The factor map is useful for confirming or disproving assumptions made about the data.
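To make the PCA step more concrete, the sketch below extracts the first principal component from a record-by-class occurrence matrix using power iteration in plain Python. It is a simplified illustration under stated assumptions, not VantagePoint's procedure: the report does not describe how the tool selects or rotates factors, and the records and all class codes other than F21S are hypothetical.

```python
from math import sqrt

def occurrence_matrix(records, field):
    """One row per record, one 0/1 column per distinct item."""
    items = sorted({v for r in records for v in r.get(field, [])})
    rows = [[1.0 if it in r.get(field, []) else 0.0 for it in items] for r in records]
    return items, rows

def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector by power iteration."""
    m = len(cov)
    vec = [1.0 / (k + 1) for k in range(m)]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical records: two groups of classes that never appear together.
records = (
    [{"ipc": ["F21S", "F21V"]}] * 3
    + [{"ipc": ["F24J", "H01L"]}] * 3
)

items, rows = occurrence_matrix(records, "ipc")
loadings = dict(zip(items, first_component(covariance(rows))))
for item, value in loadings.items():
    print(item, round(value, 3))
```

Classes whose loadings share a sign tend to occur on the same records, while classes with opposite signs tend not to, which is the kind of separation the factor map displays.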
Difference between an Auto-Correlation Map and a Factor Map
Auto-correlation maps and factor maps are related in that both try to reflect the similarity between nodes in a map (for example, both types of map can show groups of inventors working together). However, the two differ in what they compute. The factor map function identifies groups of similar items in the dataset, while the auto-correlation map function calculates a percentage of correlation between individual items within the dataset.
The VantagePoint help manual clarifies the definition of a factor map further:
- "A Factor Map is a graphical representation of the results of a Principal Components Analysis (PCA). The PCA finds the list items that frequently occur together in the dataset."
In essence, factor maps show the groupings that are important, while auto-correlation maps show the similarity between items, from which the user can determine the important groupings.