Report:Thomson Innovation/Search Syntax/Allowed Operators/Truncation or Wildcard Operators

From Intellogist

This search system report was created by the Intellogist Team.

Truncation or Wildcard Operators

There are many truncation and wildcard operators available in Thomson Innovation. Like the proximity operators, some of these function differently in different collections. A keyword weighting function is also available.[1][2][3]


Operator: ?
Basic Definition: The question mark wildcard represents exactly one character. This character can be used internally for all collections (e.g. “t?re” will pick up “tyre” and “tire”), and multiple question marks may be used to represent multiple characters.
Use In Patent Collections: Left, right, and internal use supported. Only for patents, the question mark wildcard can be used in conjunction with the asterisk or a numeric to facilitate threshold (no less than and/or no more than) truncation.
  ex 1: ???*oxide (three or more characters must precede “oxide”)
  ex 2: ???*5oxide (no fewer than 3 and no more than 8 characters must precede “oxide”)
Differences In Business Collections: Right, internal
Differences In Literature Collections: Right, internal

Operator: *
Basic Definition: The asterisk wildcard represents zero or an unlimited number of characters. The asterisk can also be used within a word.
Use In Patent Collections: Left and/or right, internal. Only for patents, the question mark wildcard can be used in conjunction with the asterisk or a numeric to facilitate threshold (no less than and/or no more than) truncation. (See examples above)
Differences In Business Collections: Right, internal
Differences In Literature Collections: Right, internal

Operator: *n
Basic Definition: If the asterisk is followed by a numeric quantifier (*n), the quantifier indicates the maximum number of additional characters that the truncation should encompass.
Use In Patent Collections: Left, right, internal
Differences In Business Collections: Right, internal
Differences In Literature Collections: Not available

Operator: Stemming (enabled from preferences page)
Basic Definition: Stemming is a truncation operator that searches for different grammatical endings for a word based on a system dictionary.
Use In Patent Collections: By default, stemming is Off. Change the default to On from your Search Preferences screen.
Differences In Business Collections: Not available
Differences In Literature Collections: Not available

Operator: British/English Spellings
Basic Definition: Both British and English spelling variations are automatically searched in literature unless the word is enclosed in quotes.
Use In Patent Collections: Not available
Differences In Business Collections: Not available
Differences In Literature Collections: Automatically searched, unless word is in quotes.

Operator: Keyword Weighting [n]
Basic Definition: Users can assign relative importance ("weight") to their keywords by using this operator. The weight assignment is shown as a number between 01 and 100, where 01 represents the lowest importance rating and 100 represents the highest.
Use In Patent Collections: Yes
Differences In Business Collections: Not available
Differences In Literature Collections: Not available

Operator: {d}
Basic Definition: Stands for a digit (0-9). This term can be used in multiples: {d}{d}
Use In Patent Collections: May only be used in Expert Search form.
Differences In Business Collections: Not available
Differences In Literature Collections: Not available

Operator: {c}
Basic Definition: Stands for a consonant. This term can be used in multiples: {c}{c}
Use In Patent Collections: May only be used in Expert Search form.
Differences In Business Collections: Not available
Differences In Literature Collections: Not available

Operator: {v}
Basic Definition: Stands for a vowel. This term can be used in multiples: {v}{v}
Use In Patent Collections: May only be used in Expert Search form.
Differences In Business Collections: Not available
Differences In Literature Collections: Not available

Operator: {a}
Basic Definition: Stands for a letter (A-Z). This term can be used in multiples: {a}{a}
Use In Patent Collections: May only be used in Expert Search form.
Differences In Business Collections: Not available
Differences In Literature Collections: Not available

Note: When a wildcard is used in a search term, stemming is disabled for that term.
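
The behavior of ?, *, and *n described above can be approximated with regular expressions. The following Python sketch is only an illustration of the definitions in the table, not Innovation's actual matching engine; the function names are hypothetical, and it assumes that any digits immediately following an asterisk are always read as the *n quantifier.

```python
import re

def wildcard_to_regex(term: str) -> "re.Pattern[str]":
    """Translate ?, * and *n wildcards into a case-insensitive regex.

    Assumption: digits directly after '*' are the *n quantifier.
    """
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "?":
            parts.append(".")                 # exactly one character
            i += 1
        elif ch == "*":
            j = i + 1
            while j < len(term) and term[j].isdigit():
                j += 1
            digits = term[i + 1:j]
            if digits:
                parts.append(".{0,%d}" % int(digits))  # at most n characters
            else:
                parts.append(".*")            # zero or more characters
            i = j
        else:
            parts.append(re.escape(ch))       # literal character
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_matches(term: str, word: str) -> bool:
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(wildcard_matches("t?re", "tyre"))                 # True
print(wildcard_matches("???*oxide", "dioxide"))         # False: only 2 characters precede "oxide"
print(wildcard_matches("???*oxide", "peroxide"))        # True: 3 characters precede "oxide"
print(wildcard_matches("???*5oxide", "hydroperoxide"))  # True: 8 characters precede "oxide"
print(wildcard_matches("carbon*2", "carbonate"))        # False: 3 extra characters
```

Under this reading, ???*5oxide becomes three mandatory characters followed by up to five optional ones, which gives the "no fewer than 3 and no more than 8" threshold from example 2.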


The truncation and wildcard operators ?, *, and *n may also be used in the Native Japanese patent collection.[2]


Editor's Note:

The $ is not used in Thomson Innovation. According to the help file, if you use it, it is ignored and a search for car$on executes as a search for caron.[1]


Stemming

Stemming is a truncation feature that uses linguistic techniques to attempt to find only keyword hits that are actually related to the root word being searched. It is becoming more common in newer search engines, and can improve accuracy by weeding out unrelated keyword hits that result from broad truncation. In Thomson Innovation, stemming is not invoked by a special operator; it is a setting that must be enabled from the Search Preferences page (see Preferences for more information). By default, stemming is Off. Change the default to On from your Search Preferences screen. Delphion users should note that Stemming is On by default in Delphion but Off by default in Innovation.[1]

The following example was taken from the Innovation help guide to illustrate the consequences of relying on stemming as opposed to using wildcard/truncation operators:

"Stemming expands a search to cover different variations of a word. This means when you search a word like prime, your result set will include words that share a root, or stem, with the word you searched.

Search Term: prime
Result Set Includes: prime, prime's, primed, primely, primely's, primeness, primeness's, primes, priming, priming's, primings

Search Term: primate
Result Set Includes: primate, primate's, primates

Search Term: carbon
Result Set Includes: carbon, carbon's, carbonate, carbonate's, carbonated, carbonates, carbonating, carbonation, carbonation's, carbonations, carbone, carbone's, carbones, carbonic, carbonization, carbonization's, carbonizations, carbonize, carbonized, carbonizer, carbonizer's, carbonizers, carbonizes, carbonizing, carbons

Stemming is a linguistic process and your results will include linguistic expansions of the stem word. Use wildcards for a result set that includes all expansions of a stem or word. Stemming is not applied to any search term that includes a wildcard. Stemming is not applied to any search term enclosed in quotes."


Editor's Note:

Like any search tool that aims to improve accurate retrieval, stemming is a double-edged sword: it may eliminate many false hits, but it may also accidentally eliminate a few good hits. For example, there could be a case where a stemming dictionary fails to associate the word “leaflet” with the word “leaf,” and therefore does not return a search hit for that keyword. It is often difficult to anticipate these kinds of cases, as the stemming programs used in search engines were most likely designed by other companies and licensed by the database producer. Although stemming can result in greater search recall, it can also increase the uncertainty about which search terms may have been left out of the strategy. However, despite the potential dangers in a limited number of cases, Innovation has increased the flexibility of its system by including this feature.
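
The trade-off between stemming and truncation can be illustrated with the "prime" and "primate" result sets from the help guide example. In the Python sketch below, the standard-library fnmatch module stands in for Innovation's wildcard matching (an assumption made for illustration only), and the vocabulary is limited to the words listed in the help guide.

```python
from fnmatch import fnmatchcase

# Stemming result sets as listed in the help guide example.
STEM_RESULTS = {
    "prime": ["prime", "prime's", "primed", "primely", "primely's",
              "primeness", "primeness's", "primes", "priming",
              "priming's", "primings"],
    "primate": ["primate", "primate's", "primates"],
}

def wildcard_hits(pattern, vocabulary):
    """Return the vocabulary words matched by a shell-style wildcard pattern."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]

vocabulary = sorted(set(STEM_RESULTS["prime"] + STEM_RESULTS["primate"]))

# Broad truncation picks up the unrelated "primate" family as well.
broad = wildcard_hits("prim*", vocabulary)
unrelated = [word for word in broad if word not in STEM_RESULTS["prime"]]
print(unrelated)  # ['primate', "primate's", 'primates']

# Narrower truncation avoids "primate" but misses forms that stemming finds.
narrow = wildcard_hits("prime*", vocabulary)
missed = [word for word in STEM_RESULTS["prime"] if word not in narrow]
print(missed)     # ['priming', "priming's", 'primings']
```

Neither truncation pattern reproduces the stemmed result set exactly, which is why the two approaches are worth weighing against each other for a given search.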


British/English Spellings (Literature Collection)

According to the Thomson Innovation help file, both British and English spelling variations are automatically searched in literature unless the word is enclosed in quotes. The help files don't specify that this feature is available in other collections, but the file does note that a single question mark easily compensates for differences between US and British spelling.[1]

Examples:

  • A search for (psychotic behavior) yields results including both behavior and behaviour
  • A search for (psychotic "behavior") only returns results for behavior


Editor's Note:

Although stemming and British spelling variations can be searched on Thomson Innovation (depending on the collection), lemmatization is not available in Thomson Innovation.[1] Lemmatization finds variations of words such as complex plurals (tooth/teeth), different verb forms or tenses (run/running/ran), and degrees of comparison (big/bigger/biggest). It is available on some large search platforms covering mainly non-patent literature content, like ProQuest Dialog.


Single Character Wildcards

In the table above, there are four wildcard terms that many patent searchers may not be familiar with. They are the series in brackets, which replace any digit, consonant, vowel, or alphabetical character, respectively. These terms perform a narrower version of the single character wildcard (?), and may be useful in situations where that term is too broad. These terms come with a caveat in the Innovation help file that they “may only be used with Expert patent search”; however, when tested, they were found to work from the fielded search form in both Innovation Professional and Express versions (as well as the Analyst version in later tests).

The terms will appear in the highlighting panel as entered in the search query, but will successfully search and highlight the desired keyword terms.
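
To show how the bracketed terms narrow the single character wildcard, the Python sketch below maps each one to a regular expression character class. This is a hypothetical illustration rather than Innovation's implementation. In particular, the help file does not say how the letter "y" is classified; treating it as a consonant here is an assumption.

```python
import re

# Character classes for the bracketed wildcards.
# Assumption: "y" counts as a consonant, not a vowel.
CLASS_WILDCARDS = {
    "{d}": "[0-9]",
    "{c}": "[b-df-hj-np-tv-z]",
    "{v}": "[aeiou]",
    "{a}": "[a-z]",
}

TOKEN = re.compile(r"\{[dcva]\}|\?")

def class_wildcard_regex(term):
    """Build a case-insensitive regex from a term using ?, {d}, {c}, {v}, {a}."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(term):
        parts.append(re.escape(term[pos:match.start()]))  # literal text
        token = match.group()
        parts.append("." if token == "?" else CLASS_WILDCARDS[token])
        pos = match.end()
    parts.append(re.escape(term[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)

def class_matches(term, word):
    """Return True if the whole word matches the term."""
    return class_wildcard_regex(term).fullmatch(word) is not None

print(class_matches("t?re", "tyre"))     # True: ? accepts any character
print(class_matches("t{v}re", "tire"))   # True: "i" is a vowel
print(class_matches("t{v}re", "tyre"))   # False under the "y is a consonant" assumption
print(class_matches("x{d}{d}", "x42"))   # True: two digits
```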


The consonant {c} operator was used in a search, and appears in the highlighting pane. This term successfully highlighted a desired variant of the searched-for keyword when used in the highlighting pane.


Limitations

Every search engine limits the use of wildcards and truncation operators, because excessive use of these operators can slow the operation of the system. The system help file does not give any hard-and-fast rules for when a query will or will not be deemed too broad. Rather than defining these limitations up-front, Innovation will return an error message when queries are deemed too broad for the system to process.

It is evident from the system help file examples, however, that Innovation can handle some very broad search terms. For example, a full-text search on the term “PE*” may be rejected in the larger US granted file, but may be feasible in the smaller French Applications file. In addition, switching the search to a narrower field within the same file can also turn an invalid query into a successful one: “Qu*” in the US Granted full text fields may fail, whereas the same term may run to completion if it is searched only in the Inventor field of the US granted file.[1]

Internal truncation may also confound the system when it is used too broadly. The limits of this feature are again determined by factors like collection size, and which fields are chosen for the search (an inventor search encompasses much less data to be searched than a full text query). As an example, the system help file states that the terms “P*T” and “P????T” will fail if attempted as a full text search in the US granted file, while the term “P???T” will actually run to completion.[1] These examples are useful in that they give the user a general idea of what will and will not work when utilizing the truncation and wildcard tools in the system, while not actually limiting the user (or telling them that any particular term is impossible up front).


Keyword Weighting

Thomson Innovation supports keyword weighting, a way for searchers to assign a relative importance to each search term. This will affect the way the search results will be ranked in the results set.

When using keyword weighting, the weight value is a number between 01 and 100. Each search term must be assigned a relative weight. As an example, the Thomson Innovation help guide gives the following search string:

[100]TI=(toothbrush ADJ holder) OR [50]TI=(holder);

This search string would be used in a situation where patents on toothbrush holders are twice as important to the search as patents on just any kind of holder.
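
The weighting syntax in the help guide example can be checked before a query is submitted. The Python sketch below is a hypothetical helper, not part of Thomson Innovation: it extracts each [n] prefix and the clause that follows it, and rejects weights outside the documented 01 to 100 range. How Innovation turns these weights into a relevancy ranking is not described in the help guide, so no scoring is attempted.

```python
import re

# A weight in square brackets followed by the clause it applies to.
WEIGHTED_CLAUSE = re.compile(r"\[(\d{1,3})\]\s*([^\[\];]+)")

def parse_weights(query):
    """Return (weight, clause) pairs from a weighted search string."""
    clauses = []
    for match in WEIGHTED_CLAUSE.finditer(query):
        weight = int(match.group(1))
        if not 1 <= weight <= 100:
            raise ValueError("weight %d is outside the range 01-100" % weight)
        clause = match.group(2).strip()
        # Drop a trailing Boolean connector that joins this clause to the next.
        clause = re.sub(r"\s+(AND|OR|NOT)$", "", clause)
        clauses.append((weight, clause))
    return clauses

pairs = parse_weights("[100]TI=(toothbrush ADJ holder) OR [50]TI=(holder);")
print(pairs)  # [(100, 'TI=(toothbrush ADJ holder)'), (50, 'TI=(holder)')]

# Relative importance of the first clause compared with the second.
print(pairs[0][0] / pairs[1][0])  # 2.0
```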

Users can choose to display the relevancy for each hit in a results set by enabling it in the Display and Sort Options menu, which is discussed further in The Hit List section of the report. Users can also set Relevancy to On in their search preferences.


Stopwords & Reserved Words

Stopwords
The Innovation help file provides the following information about words that are completely off-limits because they will cause an error (“Stopwords”), and words that may be searched, but must be specially treated because they have meaning as operators (“Reserved Words”).

Stopwords have been implemented for business searching -- but not for patent or literature searching. Stopwords are not permitted as query terms.

The stopwords for the business collection are:

(w), (s), (n), (f), (t), (l), AN, AND, BY, FOR, FROM, OF, THE, TO, WITH
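
Because stopwords are not permitted as query terms in the business collection, it can help to screen a query for them before running it. The Python sketch below is a hypothetical pre-check built from the list above; splitting the query on whitespace and ignoring case are assumptions, since the help file does not describe how Innovation tokenizes a query.

```python
# Stopwords for the business collection, as listed in the help file.
BUSINESS_STOPWORDS = {
    "(W)", "(S)", "(N)", "(F)", "(T)", "(L)",
    "AN", "AND", "BY", "FOR", "FROM", "OF", "THE", "TO", "WITH",
}

def find_business_stopwords(query):
    """Return the whitespace-separated tokens of a query that are stopwords."""
    return [token for token in query.split()
            if token.upper() in BUSINESS_STOPWORDS]

print(find_business_stopwords("cost of the merger"))   # ['of', 'the']
print(find_business_stopwords("merger acquisition"))   # []
```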

As of August 2012, stopwords, with the exception of words used as operators (AND, OR, NOT, SAME, and NEAR), no longer need to be avoided or enclosed in quotes in the Literature collection. A full list of stopwords no longer blocked from queries in the Literature Collection can be found in the release notes.[4]


Reserved Words
Certain words are reserved for use as operators, and, if your search string includes a reserved word, it will be interpreted as an operator. Reserved words can be searched in the database, but must be entered within double quotes to distinguish them from operators.

Reserved words for patent and literature searching are: AND, NOT, OR, SAME, WITH, and NEAR.

Reserved words for business searching are: NOT, OR, SAME, and NEAR. AND and WITH are not included in this list because, in business searching, AND and WITH are stopwords and cannot be used at all.

To search for a reserved word per se, type the word in double quotes: e.g., "near". To search for the phrase 'near field', you should type the following: "NEAR" ADJ FIELD.[1]
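
The quoting rule can be sketched in Python as follows. The helper name and the collection labels are hypothetical; the reserved word lists are the ones given above. The sketch joins the words of a phrase with ADJ, as in the help file's "near field" example, and does not handle the business stopwords AND and WITH, which cannot be searched at all.

```python
PATENT_LITERATURE_RESERVED = {"AND", "NOT", "OR", "SAME", "WITH", "NEAR"}

# Reserved words per collection, as listed in the help file.
RESERVED_WORDS = {
    "patent": PATENT_LITERATURE_RESERVED,
    "literature": PATENT_LITERATURE_RESERVED,
    "business": {"NOT", "OR", "SAME", "NEAR"},
}

def phrase_query(phrase, collection):
    """Join the words of a phrase with ADJ, quoting any reserved words."""
    reserved = RESERVED_WORDS[collection]
    terms = []
    for word in phrase.split():
        upper = word.upper()
        # Reserved words must be in double quotes to be searched as keywords.
        terms.append('"%s"' % upper if upper in reserved else upper)
    return " ADJ ".join(terms)

print(phrase_query("near field", "patent"))   # "NEAR" ADJ FIELD
```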


Editor's Note:

Stopwords are sometimes necessary to implement in a database because the words are so common that asking the database to search for them will require too many computational resources – with the advancement of search technology, however, this practice is becoming less common. In Innovation, stopwords are only in effect for the business collection, and as such they should not impact search efforts too frequently.

The use of “reserved words” is necessary because Innovation has chosen to use English words, instead of symbols, as operators. This is a drawback for users when their search objectives require them to use one of these words in a search, but fortunately if the searcher knows the system rules, it can be done successfully. Many search engines rely on non-word operators to get around this sometimes-encountered limitation; however, that makes the operators themselves less easy for users to remember, so there is a trade-off. In the end, the use of reserved words should not be a major disadvantage to most searchers.


Sources

  1. "Search Fundamentals." Thomson Innovation website, http://www.thomsoninnovation.com/tip-innovation/support/help/search_fundamentals.htm. Accessed September 12, 2012.
  2. "Native Japanese Patent Searching." Thomson Innovation website, http://www.thomsoninnovation.com/tip-innovation/support/help/search_njp.htm. Accessed September 12, 2012.
  3. "Patent Searching." Thomson Innovation website, http://www.thomsoninnovation.com/tip-innovation/support/help/searching_patents.htm. Accessed September 12, 2012.
  4. "Release Notes." Thomson Innovation website, http://www.thomsoninnovation.com/tip-innovation/support/help/release_notes.htm. Accessed September 12, 2012.