Stemming

From Intellogist



Stemming is a search system feature that attempts to reduce a given search term to its most basic root form, or “stem.” The system then returns results that include variants of the original search term that the searcher may not have anticipated. Stemming is commonly implemented by programming a search algorithm to use a dictionary of common English prefixes and suffixes.

As an example, if the search term “Encapsulated” is entered, a stemming algorithm would automatically reduce the word to “Encapsul” and then include suffix variants such as “Encapsulation,” “Encapsulating,” and “Encapsulator.” Some stemming algorithms will also strip prefixes such as “en,” thereby returning additional terms such as “capsule” or even “microcapsule.”
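The “Encapsulated” example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the affix lists, the minimum stem length of four characters, and the function names stem and expand are all hypothetical, and real search systems use much larger rule sets.

```python
# Hypothetical, deliberately tiny affix lists (assumptions for illustration only).
# Suffixes are ordered so that longer suffixes are tried before shorter ones.
SUFFIXES = ("ation", "ating", "ator", "ated", "ing", "ed", "es", "e", "s")
PREFIXES = ("micro", "en")
MIN_STEM = 4  # assumed lower bound so short words are not stripped to nothing


def stem(word, strip_prefixes=False):
    """Reduce a word to a crude stem by removing at most one prefix and one suffix."""
    w = word.lower()
    if strip_prefixes:
        for p in PREFIXES:
            if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
                w = w[len(p):]
                break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            w = w[:-len(s)]
            break
    return w


def expand(term, vocabulary, strip_prefixes=False):
    """Return the vocabulary words that share a stem with the search term."""
    target = stem(term, strip_prefixes)
    return [w for w in vocabulary if stem(w, strip_prefixes) == target]


vocabulary = ["Encapsulation", "Encapsulating", "Encapsulator",
              "capsule", "microcapsule", "capture"]

suffix_only = expand("Encapsulated", vocabulary)
with_prefixes = expand("Encapsulated", vocabulary, strip_prefixes=True)
```

With suffix stripping alone, suffix_only contains only the three “Encapsul” variants. With prefix stripping enabled, with_prefixes also picks up “capsule” and “microcapsule,” while the unrelated word “capture” is excluded in both cases.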

Similar to truncation, stemming can decrease the chance of human error or oversight by relying on the search engine to expand the terms considered, including some that a searcher might otherwise have overlooked. Stemming also saves users time and effort by automatically searching related terms that would otherwise have had to be queried separately.

Although stemming can be used to retrieve a more targeted set of keyword results than truncation alone, it can be less reliable. One downside of stemming operators is that they often require the user to place implicit trust in the stemming algorithm, without being able to review the variants that were considered. For example, a stemming search on the word "drop" might erroneously exclude results containing the word "droplet" if the stemming algorithm does not recognize "-let" as a common English suffix. As a consequence, a researcher could miss relevant results by assuming that the term "droplet" was included in the stemmed search query.
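The "droplet" problem can be reproduced with a small sketch. The suffix lists, the minimum stem length, and the helper names make_stemmer and matched_variants are assumptions made for illustration and do not describe any particular search system. Listing the matched variants is one way a searcher could check what a stemmer actually covered.

```python
def make_stemmer(suffixes, min_stem=3):
    """Build a toy suffix-stripping stemmer from a list of suffixes (hypothetical helper)."""
    ordered = sorted(suffixes, key=len, reverse=True)  # try longer suffixes first

    def stem(word):
        w = word.lower()
        for s in ordered:
            if w.endswith(s) and len(w) - len(s) >= min_stem:
                return w[:-len(s)]
        return w

    return stem


def matched_variants(term, vocabulary, stem):
    """List the vocabulary words the stemmer treats as variants of the term."""
    target = stem(term)
    return sorted(w for w in vocabulary if stem(w) == target)


vocabulary = ["drop", "drops", "droplet", "droplets"]

# Assumed rule sets: the first does not know the "-let" suffix, the second does.
without_let = make_stemmer(["s", "ing", "ed"])
with_let = make_stemmer(["s", "ing", "ed", "let", "lets"])

missed = matched_variants("drop", vocabulary, without_let)
found = matched_variants("drop", vocabulary, with_let)
```

Here missed contains only "drop" and "drops", so documents that mention only "droplet" would be silently skipped, whereas found contains all four words.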

Stemming can also fail when non-English search terms are used. Unless the stemmer is specifically designed to accommodate non-English queries, the term will not be properly stemmed and the search will be incomplete or incorrect. Systems with multi-lingual stemmers capable of applying stemming rules to more than one language simultaneously hold a significant advantage.
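A rough sketch of the multi-lingual case follows, using Spanish as the second language. The per-language suffix tables, the language codes, and the stem function are hypothetical and far smaller than what a real multi-lingual stemmer would use.

```python
# Hypothetical per-language suffix tables (assumptions for illustration only).
SUFFIX_TABLES = {
    "en": ("ation", "ating", "ator", "ated"),
    "es": ("ación", "ado", "ar"),
}
MIN_STEM = 4  # assumed minimum length of the remaining stem


def stem(word, languages):
    """Strip one suffix using the combined rules of all selected languages."""
    w = word.lower()
    suffixes = [s for lang in languages for s in SUFFIX_TABLES[lang]]
    for s in sorted(suffixes, key=len, reverse=True):  # longest suffix first
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            return w[:-len(s)]
    return w


spanish_terms = ["encapsulación", "encapsulado", "encapsular"]

# English-only rules leave the Spanish terms untouched, so they never match.
english_only = [stem(t, ["en"]) for t in spanish_terms]

# Applying both rule sets at once reduces every term to the same stem.
multilingual = [stem(t, ["en", "es"]) for t in spanish_terms]
```

With English rules alone, each Spanish term keeps its full form and the three words are treated as unrelated. With both tables applied together, they share the stem “encapsul” with the English “Encapsulated.”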
