Report:PatBase/Search Interface/The Search Forms/Non Latin Text Search
From Intellogist
This search system report was created by the Intellogist Team and is available for viewing only.
Non-Latin Text Search
In October 2008, PatBase introduced a search form for searching the system's non-Latin text content. To learn more about this content, see Non-Latin Text in the Data Coverage section of this article.
The non-Latin search form allows users to search PatBase's Chinese-, Japanese-, Korean-, and Russian-language records (including PCT records for which PatBase has loaded the original-language text). Latin-script terms, such as acronyms, can also be entered in the form; such a query searches only the non-Latin records for those terms.
The figure below shows the search form. At the far right of the page, a quick-reference menu lists the original-language data coverage for each language option.
The text box accepts most of the Boolean and proximity operators available in PatBase. Truncation and wildcard operators should work correctly for Russian but are not needed for Chinese, Japanese, or Korean: each individual character in these languages is indexed separately rather than as part of a larger word (Chinese and Japanese, in particular, are written without spaces between words).[1]
For each language, users can restrict text queries to specific portions of the documents, such as "claims, title & abstract" or "description," just as in the structured search form. The Japanese option also offers two additional fields: Japanese-language assignee name and inventor name.
After the search has run, its search history entry is displayed with a two-letter code in brackets at the start of the search line. This code identifies the line as a non-Latin search and indicates the language in which the search was conducted.
Note that this search line can then be combined on the history page with any other search line, including English-language text searches, classification searches, etc.
For information about how to view hits found by a non-Latin text query, see the Viewing Non-Latin Text section in this article.
Editor's Note: Immediately after the launch of the non-Latin search interface, a bug was noted in the non-Latin text search feature. Intellogist editors verified that this bug was still present in December 2009, during a full review of the system. A saved non-Latin keyword search can apparently be re-executed successfully if it was saved as an individual search, but reloading a search history that contains non-Latin queries sometimes fails. See Managing Search Histories for an example of a non-Latin query failing to reload in a search history.
Editor's Note: PatBase's collection of searchable non-Latin text is probably the most extensive in the industry. It shows that the company's underlying philosophy is not simply to create pre-translated English machine translation collections for keyword searching (as Thomson Innovation and TotalPatent offer), but to make the original-language text available to searchers. As a translation company, RWS Holdings, a partner in the creation of PatBase, likely knows that machine-translated documents are a poor substitute for the original text. The partners thus seem to see more value in making the original text keyword-searchable, while providing an on-the-fly translation option for searchers who must read the relevant documents in English.
The addition of this searchable data also shows the company's international focus. Patent searchers in many countries must be proficient in English to use value-added search tools that are available only in English, such as the Derwent World Patents Index and CAplus. These searchers would likely prefer to search and read text in their native languages where possible. By making the tool more accessible to Asian- and Russian-language speakers, Minesoft and RWS are expanding PatBase's market.
Furthermore, the ability to combine non-Latin text searches with other types of searches from the central search history page is an extremely interesting feature of PatBase. For example, users could search for records in which the Japanese-language family member contains one set of keywords and the English (or other-language) member contains a different set. This is another capability made possible by PatBase's aggregation of all family member data, including full text, into a single family record.
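The family-level combination described above can be sketched as set operations on family IDs. PatBase's internal data model is not described in this report, so this is only an illustration under assumptions: the `FamilyRecord` class, the language codes, and the `search_member` helper are hypothetical, and simple substring matching stands in for a real query engine. Each search line is modeled as a set of family IDs, and combining lines on the history page is modeled as set intersection (AND) or difference (NOT).

```python
from dataclasses import dataclass, field


@dataclass
class FamilyRecord:
    """Hypothetical family record that aggregates text from every family member."""
    family_id: str
    # Language code -> full text of the family member in that language
    member_text: dict[str, str] = field(default_factory=dict)


def search_member(families: list[FamilyRecord], lang: str, term: str) -> set[str]:
    """Return IDs of families whose `lang` member text contains `term`.

    Plain case-insensitive substring matching, standing in for a real query engine.
    """
    return {
        f.family_id
        for f in families
        if term.lower() in f.member_text.get(lang, "").lower()
    }


families = [
    FamilyRecord("F1", {"ja": "照明装置の制御", "en": "Control of a lighting device"}),
    FamilyRecord("F2", {"ja": "照明装置", "en": "Vehicle headlamp"}),
    FamilyRecord("F3", {"en": "Lighting controller"}),
]

jp_line = search_member(families, "ja", "照明装置")  # non-Latin search line
en_line = search_member(families, "en", "control")   # English-language search line

print(sorted(jp_line & en_line))  # AND -> ['F1']
print(sorted(jp_line - en_line))  # NOT -> ['F2']
```

Because both searches return family IDs rather than individual documents, a Japanese-language hit and an English-language hit can be intersected even though they come from different family members.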
Searching collections in a foreign language also carries some pitfalls. English-language searchers must be aware that these records do not already exist as English translations in the PatBase database: a searcher must first find the records and then issue a command to create the English machine translation. This differs from the approach of Thomson Innovation and TotalPatent, which contain pre-translated collections of English machine translations that are keyword-searchable in English. In contrast, PatBase machine translations are meant only to aid comprehension of non-English texts that have already been found by other means.
Sources
- [1] Telephone conversation with PatBase Help Desk staff, October 27, 2008.


