Report:TotalPatent/Search Interface/The Search Forms/Semantic Search
From Intellogist
This search system report was created by the Intellogist Team.
Semantic Search
LexisNexis has worked with PureDiscovery, a Dallas, Texas-based semantic technology company, to build a latent semantic analysis tool for use with TotalPatent and other LexisNexis products. To build the semantic relationships, the LexisNexis/PureDiscovery partnership has used "more than 10 million full-text patent documents from the U.S. Patent and Trademark Office's patent index, as well as Elsevier journal articles and other documents."[1]
The key feature of TotalPatent's semantic search engine is that it performs all semantic analysis functions prior to actually executing the search, and allows the user to view and control the search terms selected by the program. In other words, it takes the user's initial search terms, runs a semantic analysis to suggest up to 20 related terms (plus up to 30 related terms not included in the suggested query), and then allows the user to execute a weighted Boolean search based on those suggestions.
To initiate the semantic search process, keyword terms, phrases, sentences, or paragraphs must be input into the Semantic Search form. The input should describe the search subject matter. The limit is 32,000 characters. Other features of the semantic search interface include options that are also available from the Advanced Search form, including the ability to:
- Also search for terms in English machine translations.
- Remove family member duplicates.
- Input a publication date range.
- Apply other restrictions such as IPC class codes, assignee or inventor names, etc.
- Select which country collections to search.
- Select which document kinds to search (application, granted, or all).
- Define which fields will be displayed in the results list.
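The form options above can be pictured as a simple request object. The following is only a minimal sketch: the class name, field names, and date format are hypothetical and do not correspond to any published TotalPatent API. Only the 32,000-character limit and the three document kinds come from this report.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_INPUT_CHARS = 32_000  # input limit stated in this report
DOCUMENT_KINDS = {"application", "granted", "all"}


@dataclass
class SemanticSearchRequest:
    """Hypothetical container mirroring the form options listed above."""
    text: str
    include_machine_translations: bool = False
    remove_family_duplicates: bool = False
    date_from: Optional[str] = None  # e.g. "2005-01-01" (format is an assumption)
    date_to: Optional[str] = None
    restrictions: dict = field(default_factory=dict)  # e.g. {"ipc": "A61F"}
    collections: list = field(default_factory=list)   # e.g. ["US", "EP"]
    document_kind: str = "all"

    def validate(self) -> None:
        """Raise ValueError if the request breaks a rule described in the report."""
        if not self.text.strip():
            raise ValueError("search input must not be empty")
        if len(self.text) > MAX_INPUT_CHARS:
            raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
        if self.document_kind not in DOCUMENT_KINDS:
            raise ValueError(f"unknown document kind: {self.document_kind}")
```

Calling `SemanticSearchRequest(text="mechanical heart valve").validate()` passes silently, while an input longer than 32,000 characters raises `ValueError`.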
Two options are available to initiate the search. To review the keywords chosen by the semantic search engine, users select the "Preview Results" option shown in the screenshot below. Alternatively, users can conduct the semantic search without reviewing or controlling the keywords chosen by the search program by selecting the "Search Now" option on the search form.
If the "Preview Results" option is selected, the system will next produce a "query cloud" of search terms suggested by the semantic engine and a preview of the 20 most relevant results for the query produced by these semantic search terms. In order to suggest these terms, the LexisNexis/PureDiscovery semantic tool first analyzes the input to select the appropriate semantic "brain" to interpret the results. A "brain" is analogous to a particular knowledge domain built from the underlying body of documents processed by the semantic engine. The entire body of knowledge used to build the semantic engine includes about 10 million documents, including full text patent documents and Elsevier journal articles. Out of this dataset, nine brains have been formed, and new patent and non-patent literature documents are being continuously fed to these semantic brains.[2] As of March 2012, the semantic search brain "is now able to identify multiple concepts in a search query and return concept terms relevant to more than just one dominant concept."[3] According to a LexisNexis representative, the new brains are better at identifying the core concept from multiple concepts.[2]
The purpose of giving the semantic search engine multiple "brains" is to allow the engine to first determine what general knowledge area the query is related to, before proposing linguistically related terms. When these domains are too specific, relevant terms can be missed; however, when they are too broadly defined, irrelevant terms are more likely to be pulled in. According to LexisNexis representatives, the original test version of the semantic engine had as many as 300 brains, but this was narrowed down to 19 for the final product.[4] As of August 2012, the number of semantic brains has been reduced to nine.[2]
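The idea of routing a query to one knowledge domain before suggesting terms can be illustrated with a toy example. This report does not describe how the engine actually selects a brain, so the sketch below swaps in a plain token-overlap score purely for illustration. The function name, the domain names, and the vocabularies are all invented.

```python
def pick_brain(query, brains):
    """Return the name of the domain whose vocabulary overlaps the query most.

    `brains` maps a domain name to a set of lowercase vocabulary words.
    Token overlap is an illustrative stand-in, not the engine's real method.
    """
    tokens = set(query.lower().split())
    return max(brains, key=lambda name: len(tokens & brains[name]))


# Invented toy domains; the real brains are built from millions of documents.
toy_brains = {
    "medical": {"heart", "valve", "prosthesis", "implant"},
    "automotive": {"engine", "valve", "piston", "camshaft"},
}
chosen = pick_brain("mechanical heart valve", toy_brains)
```

The toy also shows the trade-off described above: with very narrow vocabularies a query may overlap with no domain at all, while very broad ones make every domain look equally plausible.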
When running an analysis, the engine selects the best brain to interpret the user's input, which in turn is used to extract other semantically relevant terms and concepts for inclusion in the query cloud it proposes to the user. Terms are divided into four main sections: required terms (combined using the Boolean operator AND), optional terms (combined using the Boolean operator OR), excluded terms (combined using the Boolean operator NOT), and other suggested terms (up to 30) placed in the "Holding Area," which are not included in the search. Terms in the query cloud are also shown in different font sizes and colors to indicate the relevance the system has assigned to them: high relevancy (H), medium relevancy (M), or low relevancy (L). Both the "Required" and "Optional" sections offer all three relevance options. One term in the cloud will always be automatically set as a mandatory term; in other words, every document retrieved by the query must contain this term. The rest of the terms will be added to the search query using the Boolean OR operator (unless the user manually adds more required terms), which means that at least one other term from the query cloud must also appear in a document for it to be selected as a search result. The query suggested by the system will therefore be in the following format:
- "Required term 1" AND (suggested term 2 OR term 3 OR term 4 … OR term 20)
or, in the example below:
- ("mechanical heart valve"[H]) AND (valve[H] OR "mechanical heart"[H] OR prosthesis[M] OR bileaflet[M]... prosthesis[L])
This automatic designation of one term as a required term is done to reduce computational strain on the LexisNexis servers. One required term is needed in order to conduct a search, but users can add additional required terms and change the relevance weight of the required terms.
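The query format described above can be assembled mechanically from the sections of the query cloud. The sketch below is only an illustration of that display format: the function names are hypothetical, the placement of NOT clauses is an assumption, and the resulting string is not guaranteed to be syntax that TotalPatent itself accepts.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes."""
    return f'"{term}"' if " " in term else term


def format_term(term, weight):
    """Append the relevance letter to a term, e.g. valve[H]."""
    return f"{quote(term)}[{weight}]"


def build_query(required, optional, excluded=()):
    """Combine terms into one Boolean string.

    required and optional are lists of (term, weight) pairs joined with
    AND and OR respectively; excluded is a list of plain terms appended
    with NOT (the NOT placement is an assumption).
    """
    if not required:
        raise ValueError("at least one required term is needed")
    parts = [" AND ".join(f"({format_term(t, w)})" for t, w in required)]
    if optional:
        parts.append("(" + " OR ".join(format_term(t, w) for t, w in optional) + ")")
    query = " AND ".join(parts)
    for term in excluded:
        query += f" NOT {quote(term)}"
    return query


example = build_query(
    required=[("mechanical heart valve", "H")],
    optional=[("valve", "H"), ("mechanical heart", "H"),
              ("prosthesis", "M"), ("bileaflet", "M")],
)
# example == '("mechanical heart valve"[H]) AND (valve[H] OR "mechanical heart"[H] OR prosthesis[M] OR bileaflet[M])'
```

The function refuses to build a query with no required term, mirroring the rule that one required term is needed in order to conduct a search.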
LexisNexis designates relevance weights with letters: terms marked High (H) are given high prominence in the search results, terms marked Medium (M) medium prominence, and terms marked Low (L) low prominence. Only required and optional terms are given relevance rankings.
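To make the idea of a weighted Boolean query concrete, the sketch below filters documents with the Boolean rules and then orders the hits by summed term weight. The numeric values assigned to H, M, and L, the substring matching, and the function names are all assumptions; this report gives only the letters and does not describe how TotalPatent computes prominence.

```python
WEIGHTS = {"H": 3, "M": 2, "L": 1}  # assumed values; the report gives only letters


def matches(doc_text, required, optional, excluded=()):
    """Boolean filter: all required, no excluded, and at least one optional term."""
    text = doc_text.lower()
    if any(term.lower() in text for term in excluded):
        return False
    if not all(term.lower() in text for term, _ in required):
        return False
    return not optional or any(term.lower() in text for term, _ in optional)


def score(doc_text, required, optional):
    """Sum the assumed weights of every required or optional term found."""
    text = doc_text.lower()
    weighted = list(required) + list(optional)
    return sum(WEIGHTS[w] for term, w in weighted if term.lower() in text)


def rank(docs, required, optional, excluded=()):
    """Return matching documents, highest score first."""
    hits = [d for d in docs if matches(d, required, optional, excluded)]
    return sorted(hits, key=lambda d: score(d, required, optional), reverse=True)


docs = [
    "A mechanical heart valve",
    "A bileaflet mechanical heart valve prosthesis",
    "A garden hose valve",
]
ranked = rank(
    docs,
    required=[("mechanical heart valve", "H")],
    optional=[("valve", "H"), ("prosthesis", "M"), ("bileaflet", "M")],
)
```

Here the garden hose document is dropped because it lacks the required phrase, and the bileaflet prosthesis document ranks first because it matches more weighted terms.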
The query cloud display allows users to drag and drop the terms into the four different sections (required, optional, excluded, or holding area) and change the relevance ranking of terms within the required and optional sections. Besides dragging and dropping the terms into the desired locations, users may also click on a term to change its status and relevance ranking (see the screenshot below). Up to 20 terms can be included in the required, optional, and excluded sections (which will be included in the query). Any additional terms not included in the query are listed in the holding area section, which by default lists up to 30 extra suggested terms. In addition, the user may add terms to the query cloud (they will be prompted to move another term to the holding area if 20 terms are already included in the query), and may then adjust the weighting and relevancy of the manually added term (which is by default added as a required, high relevancy term).
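The bookkeeping behind the query cloud (four sections, a 20-term cap on the query, and manually added terms defaulting to required with high relevance) can be modeled with a small data structure. This is a hypothetical sketch of the rules described above, not the tool's actual implementation; all class and method names are invented.

```python
from dataclasses import dataclass, field

MAX_QUERY_TERMS = 20  # cap on required + optional + excluded terms, per this report
SECTIONS = ("required", "optional", "excluded", "holding")


@dataclass
class CloudTerm:
    text: str
    section: str = "required"  # manually added terms default to required
    weight: str = "H"          # and to high relevance


@dataclass
class QueryCloud:
    terms: list = field(default_factory=list)

    def in_query(self):
        """Terms that take part in the search (everything outside the holding area)."""
        return [t for t in self.terms if t.section != "holding"]

    def _check_room(self, section):
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        if section != "holding" and len(self.in_query()) >= MAX_QUERY_TERMS:
            raise ValueError("query is full: move a term to the holding area first")

    def add(self, text, section="required", weight="H"):
        """Add a new term, refusing if the query already holds 20 terms."""
        self._check_room(section)
        term = CloudTerm(text, section, weight)
        self.terms.append(term)
        return term

    def move(self, text, section, weight=None):
        """Move an existing term to another section, optionally re-weighting it."""
        term = next(t for t in self.terms if t.text == text)
        if term.section == "holding":
            self._check_room(section)
        elif section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        term.section = section
        if weight is not None:
            term.weight = weight
        return term
```

With 20 terms already in the query, `add("extra")` raises until some term is moved to the holding area, which mirrors the prompt users see in the interface.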
At the top of the "Preview" window, the original search terms are displayed above the query cloud, and users can edit the original search terms and regenerate the query cloud. At the bottom of the preview window, the full list of required, optional, and excluded search terms is displayed, with the relevancy ranking indicated beside each term. The additional search restrictions are also displayed below the semantic search terms (and can be edited by selecting the "Edit Restrictions" link). The option to save the search terms and semantic search terms is also displayed at the bottom of the page. Finally, to the right of the query cloud, users can view the 20 most relevant results based on the current query. Every time the query cloud is altered, users can choose to reload the relevant result list. Select the "Retrieve all results" option to view the full result list for that semantic query.
After the weightings are specified to the user's satisfaction, and any additional desired terms are manually added, the query will be executed from this point forward as a straightforward weighted Boolean query. No further terms will be pulled into the query after this point; the user has complete control over the terms searched and their status and relevancy ranking from the query cloud screen.
After executing the search, the results are displayed in a standard TotalPatent hit list view.
Editor's Note: Semantic searching has long been promised to the patent search industry, and some products were already on the market before this release, including the PatentCafe semantic search product. A major criticism of these tools up to this point is that they function as black boxes and take away a necessary element of user control over both search precision and recall, making them suitable only to supplement a comprehensive patent search, not to serve as its primary tool. LexisNexis has addressed this concern well by building in complete user control over the query that gets executed, turning the semantic function into a query-building assistant rather than a black box search. In the TotalPatent tool, users can control the search strategy exactly, including adding and deleting terms and adjusting the status and relevance of suggested terms. This is definitely desirable, as semantic engine technology is still in its infancy, and machine intelligence can make mistakes. For example, in a query cloud created during a test search, the terms "prosthesis" and "prostheses" were both introduced, but weighted differently.
A direct comparison of semantic search tools is difficult without exact knowledge of their internal workings. However, one easily understood factor is the number of documents that serve as the semantic engine's body of underlying knowledge, or in other words, the training corpus used to "teach" the program about language relationships and word meanings. Elsevier's status as a publisher provides a unique advantage in this respect: because Elsevier owns the underlying data, LexisNexis was able to use a unique corpus of over 10 million full-text journal articles and patent documents to inform its semantic engine. It seems to the editor that other commercial search providers would not enjoy the same advantage as publishers because, although they own and produce massive databases, many of their scientific literature resources are not full text.
In addition, it may be worth examining the value of the semantic search engine in the context of TotalPatent's patent data coverage. The editor is not aware of other semantic search products on the market that are commonly applied to machine translated collections of non-English patent documents, although the error rate inherent to the semantic machine-learning process, when combined with the uncertain translation quality often present in machine translated collections, could make this a moot point.
Interestingly, some have suggested a further possible use for the TotalPatent semantic search: taking the suggested terms from a first analysis and using them as new input for a second one, thereby widening the semantic "net" and allowing users to see more diverse and potentially useful terms.
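The two-pass idea can be sketched as follows. The `suggest` argument is a hypothetical callable (text in, list of terms out) standing in for whatever produces the query cloud; it is not a TotalPatent API, and in practice a user would carry out these steps by hand in the interface.

```python
def expand_twice(seed_text, suggest, limit=20):
    """Run a term-suggestion function twice, feeding pass one into pass two.

    Returns (first_pass_terms, terms_new_in_second_pass).
    `suggest` is a hypothetical stand-in for the semantic engine.
    """
    first = suggest(seed_text)[:limit]
    second = suggest(" ".join(first))
    seen = set(first)
    new_terms = []
    for term in second:
        if term not in seen:  # keep only terms the first pass did not surface
            seen.add(term)
            new_terms.append(term)
    return first, new_terms


# Toy lookup table used only to exercise the function; the terms are invented.
toy_table = {
    "heart valve": ["prosthesis", "bileaflet"],
    "prosthesis bileaflet": ["bileaflet", "leaflet", "occluder"],
}
first_pass, second_pass_only = expand_twice(
    "heart valve", lambda text: toy_table.get(text, [])
)
```

Returning the second-pass terms separately makes it easy to see what the wider "net" added, so the user can still decide which of them deserve a place in the query.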
All new subscriptions to TotalPatent include semantic search capabilities free of charge. Without a subscription to the set of semantic tools, there is a cost associated with each use of the semantic search feature. See Pricing Policy for more information.
Semantic Ranking
In addition to introducing a semantic search interface, LexisNexis has developed another application for its new semantic technology: relevancy ranking of search results. To read more, see the Semantic Ranking section of this article.
Sources
- [1] "LexisNexis Introduces Transparent Semantic Search Technology for Patent Research." Published October 12, 2009. LexisNexis website, http://www.lexisnexis.com/media/press-release.aspx?id=125674399689744. Accessed May 30, 2012.
- [2] Correspondence with LexisNexis representative via e-mail. Received July 11, 2012.
- [3] "Enhancements to TotalPatent™." LexisNexis website, http://www.lexisnexis.com/total-patent-enhancements/. Accessed March 23, 2012.
- [4] Presentation (via webcast) given by Peter Vanderheyden, Vice President for Intellectual Property at LexisNexis, to author. October 29, 2009.


