In the intellectual property world, patents are granted in many different countries and regions, each having slightly different legal conventions. To obtain legal protection for an invention in multiple countries, it is usually necessary to file multiple applications. For example, the same invention may be disclosed in patent applications filed in the US, Europe, and Japan. This phenomenon can cause duplication in worldwide patent searching: a searcher may have to review the same invention multiple times, when reviewing the patent literature for each country.
To effectively deal with this kind of duplication, data providers rely on the concept of a patent family to reduce multi-national results down to one representative member. Loosely defined, a patent family is a group of patent documents that are likely to share the same inventive concepts and technical details. To create these families, providers use the concept of shared priority data as a basis to assume that patent documents are technical equivalents.
There is no one accepted set of rules to define a patent family. Patent database producers are therefore able to apply various sets of rules to family construction, to suit their own convenience or the needs of their users. Accordingly, there are a number of different paradigms used in the industry to create family databases, but again, each is fundamentally based on the concept of shared priority data.
World Intellectual Property Organization (WIPO) Patent Family Definitions
The following six types of patent family are defined in the Glossary of Terms found in section 8.1.1 of the WIPO Handbook on Industrial Property Information and Documentation. This list should not be considered limiting, in that other types of patent families may be constructed from any other set of criteria designed by database providers. However, because they represent the major known family types, the WIPO definitions are reproduced here verbatim, in their entirety, with supporting family examples added by Intellogist editors.
WIPO Patent Family Definition
A collection of published patent documents relating to the same invention, or to several inventions sharing a common aspect, that are published at different times in the same country or published in different countries or regions. Each patent document in such a collection is normally based on the data for the application(s) on which the basis for its “priority right” has been claimed. Below follow the definitions for different types of patent families:
(1) Simple Patent Family
“Simple patent family” means a patent family relating to the same invention, each member of which has for the basis of its “priority right” exactly the same originating application or applications.
Example 1: ("D" equals any country code)
|Document Number||Priority Number|
In this case, a related document with multiple priorities is not allowed into the family, even though one of the priorities matches the family's original priority.
|D000014||P00001 and P00002 -> NOT PART OF THE ABOVE FAMILY|
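The simple-family rule can be sketched in a few lines of code. This is only an illustration, assuming each document is reduced to a document number and the set of priority numbers it claims; the function name and the sample document numbers D000011 and D000012 are hypothetical and do not come from the WIPO text.

```python
from collections import defaultdict

def simple_families(docs):
    """Group documents whose priority sets are exactly identical.

    docs maps a document number to the set of priority numbers it claims.
    The result maps each distinct priority set to its member documents.
    """
    families = defaultdict(list)
    for doc, priorities in docs.items():
        # frozenset makes the whole priority set usable as a dictionary key
        families[frozenset(priorities)].append(doc)
    return dict(families)

# Hypothetical sample data in the style of Example 1
docs = {
    "D000011": {"P00001"},
    "D000012": {"P00001"},
    "D000014": {"P00001", "P00002"},  # extra priority, so a different family
}
families = simple_families(docs)
```

Because the key is the complete priority set, D000014 lands in its own group even though it shares P00001 with the other two documents.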
(2) Complex Patent Family
- “Complex patent family” means a patent family relating to the same invention or to several inventions sharing a common aspect, each member of which has for the basis of its “priority right” at least one originating application in common with the other members of the family.
Example 2: ("D" equals any country code)
|Document Number||Priority Number|
|D000024||P00002 and P00003|
In this case, a related document with multiple priorities is not allowed into the family unless it shares a priority number common to all the other members (here, P00002).
|D000025||P00003 and P00004 -> NOT PART OF THE ABOVE FAMILY|
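As a rough sketch of the complex-family rule, membership can be tested against the one priority that every member must claim. The function name and the document number D000021 are hypothetical; only D000024 and D000025 appear in the example above.

```python
def complex_family(docs, common_priority):
    """Return the documents that claim the priority shared by the whole family.

    docs maps a document number to the set of priority numbers it claims.
    """
    return sorted(
        doc for doc, priorities in docs.items()
        if common_priority in priorities
    )

# Hypothetical sample data in the style of Example 2
docs = {
    "D000021": {"P00002"},
    "D000024": {"P00002", "P00003"},
    "D000025": {"P00003", "P00004"},  # does not claim P00002
}
family = complex_family(docs, "P00002")
```

D000025 shares P00003 with one member but not the common priority P00002, so it stays outside the family.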
(3) Extended Patent Family
“Extended patent family” means a patent family relating to one or more inventions, each member of which has for the basis of its “priority right” at least one originating application in common with at least one other member of the family.
Example 3: ("D" equals any country code)
|Document Number||Priority Number|
|D000034||P00003 and P00004|
|D000035||P00004 and P00005|
In this case, all related documents sharing at least one priority number shown in the list above may enter the family. Technically equivalent documents filed outside of Paris Convention limits (that is, filed more than 12 months after their priority date and therefore unable to claim that priority) are not placed into this type of family.
|D000036 (technically equivalent document)||(new application number as priority number)|
-> NOT PART OF THE ABOVE FAMILY
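One way to picture the extended-family rule is as a graph problem: documents are linked whenever they share a priority, and each connected group is a family. The sketch below assumes that view; the function name, the document number D000031 and the priority P00009 are hypothetical.

```python
from collections import defaultdict

def extended_families(docs):
    """Group documents that are linked, directly or indirectly, by shared priorities.

    docs maps a document number to the set of priority numbers it claims.
    """
    by_priority = defaultdict(set)
    for doc, priorities in docs.items():
        for priority in priorities:
            by_priority[priority].add(doc)

    seen = set()
    families = []
    for start in docs:
        if start in seen:
            continue
        family, queue = set(), [start]
        while queue:
            doc = queue.pop()
            if doc in family:
                continue
            family.add(doc)
            # Follow every priority of this document to its other claimants
            for priority in docs[doc]:
                queue.extend(by_priority[priority] - family)
        seen |= family
        families.append(sorted(family))
    return families

# Hypothetical sample data in the style of Example 3
docs = {
    "D000031": {"P00003"},
    "D000034": {"P00003", "P00004"},
    "D000035": {"P00004", "P00005"},
    "D000036": {"P00009"},  # technically equivalent, but claims no shared priority
}
families = extended_families(docs)
```

D000031 and D000035 have no priority in common, yet both join the same family through D000034. D000036 remains alone because nothing in its priority data links it to the others.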
(4) National Patent Family
“National patent family” means a patent family relating to one or more inventions, the members of which are published by the same office and at least two of which are distinct from each other (i.e., not merely a different procedural publication stage for the same originating application – see domestic patent family), and having for their basis of “priority right” at least one originating application in common with the other members of the family. The relationship of at least two of the published patent documents in this type of patent family is a result of additions, continuations, continuations-in-part, or divisions of the original subject of invention covered by an originating application.
Example 4: ("CC" equals the same country of publication)
|Document Number||Priority Number|
|CC000043||P00004 and P00005|
(5) Domestic Patent Family
“Domestic patent family” means a patent family consisting solely of a single office's different procedural publications for the same originating application(s).
Example 5: ("CC" equals the same country of publication)
|Document Number||Kind Code|
(6) Artificial Patent Family
Artificial Patent Family (intellectual or non-conventional patent family) – means a patent family consisting of a collection of equivalent patent documents (i.e., documents relating to the same invention) published by different offices and at least some of which do not share a common originating application or applications (or where data relating to such a common originating application is not disclosed). The members of this type of family are determined only after intellectual investigation to have essentially the same disclosed content.
Example 6: ("D" equals any country code)
|Document Number||Priority Number|
|D000064||P00006 and P00007|
(May or may not be admissible depending on specific database producer rules)
|D000065 (technically equivalent document)||P00008|
(May be added to the family as a non-convention equivalent after an intellectual comparison)
Major Patent Family Data Sources
Before beginning any discussion of patent family data producers, it is important to note that there is no completely comprehensive source of patent family information. Of the providers discussed below, the INPADOC collection covers the greatest number of patenting authorities, at over 80 countries. By contrast, there are over 170 countries which actually grant patent protection. However, the sources below are the best-known producers of patent family data, and they both cover all the “major” patenting authorities in the market today.
INPADOC
The INPADOC (INternational PAtent DOCumentation) Centre was founded in 1974 by an agreement between the Austrian Patent Office and the World Intellectual Property Organization (WIPO). The purpose of the company was to provide a number of patent information services, but most importantly to collect and document patent families. The company was incorporated into the European Patent Office (EPO) over 1989-1991, which continued its data production services and continued licensing the data to commercial providers.
In the 2000s, the EPO finished a project to combine its own master bibliographic file (DOCDB; also known as DOC-DB) with the INPADOC records, extending the length of the database's coverage. This effort added about 14 million more records to the file, and substantially lengthened the coverage of some authorities back to the 1800s. It also increased the number of patenting authorities in the database, from “over 70” to a total of 80-plus countries covered, and it increased the number of countries covered by original-language abstracts to over 40. In 2009 these numbers were extended once again when the EPO began adding data feeds from 7 new South American countries.  
INPADOC patent families were first produced at a time when patent family information was scarce and hard to obtain. Perhaps because of this, the service based its patent family definitions on the broadest basis possible, requiring only that a new family member had to share at least one priority document with at least one other patent in the family – an “extended” patent family (see number 3 in the list above). Typically, referring to a document's “INPADOC” patent family today means its extended patent family. According to the EPO, when INPADOC's legal status data (the PRS file) is loaded alongside its bibliographic data (the PFS file) on commercial providers, this legal status information is also used to construct the "extended" families, further recovering "divisional applications, continuations, continuations in part or national publications of first filings of PCT (international) applications", for which priority links may have been missing.
In addition to “extended” patent families based on priority data, the EPO search team has also worked on older documents without adequate priority data, associating them using inventor, assignee and subject matter information to create “artificial” or “technical” patent families (type 6, above). These families are created by assigning an artificial common priority number to the collections. This work was done extensively in anticipation of the introduction of IPC-R reform classifications, which were then applied to all members of a simple patent family. Thus, this effort reduced the burden of re-classifying many older records.
The iterative and all-inclusive nature of the INPADOC extended family is described in this excerpt from the EPO's user information pages:
- In the "extended" (INPADOC) patent family it does not matter where you start the search. It can be an application number, a priority application number or a publication number.
- If the search starts with a publication number, all application numbers, domestic application numbers, priority numbers and international application numbers are used to retrieve additional documents. For all documents found in this step, step one is repeated. This iteration process ends only when no more new documents can be found.
- Raw data resources (INPADOC) also use some additional sophisticated rules for certain countries, for example if publication numbers are used instead of priority numbers in the original documents. This happened rather frequently for older documents, when the priority claims were not treated as carefully as they are now.
- The inclusion of legal status information in the patent search also sometimes retrieves additional links, e.g. for divisional applications, continuations, continuations in part or national publications of first filings of PCT (international) applications, where the priority links are often missing.
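The iteration described in the excerpt above can be sketched as a loop that keeps collecting numbers until nothing new turns up. This is a simplified illustration, not the EPO's implementation: it assumes each record carries only a publication number, an application number and a list of priority numbers, and all names and sample numbers are hypothetical.

```python
def inpadoc_style_family(records, start_number):
    """Collect every publication reachable from start_number via shared numbers.

    records is a list of dicts with the keys 'publication', 'application'
    and 'priorities'. start_number may be any of those three kinds of number.
    """
    def numbers(record):
        return {record["publication"], record["application"], *record["priorities"]}

    known_numbers = {start_number}
    family = set()
    changed = True
    while changed:  # stop only when a full pass finds no new document
        changed = False
        for record in records:
            publication = record["publication"]
            if publication not in family and numbers(record) & known_numbers:
                family.add(publication)
                known_numbers |= numbers(record)
                changed = True
    return sorted(family)

# Hypothetical sample data
records = [
    {"publication": "EP1", "application": "EPA1", "priorities": ["USA1"]},
    {"publication": "US1", "application": "USA1", "priorities": []},
    {"publication": "JP1", "application": "JPA1", "priorities": ["USA1", "USA2"]},
    {"publication": "US2", "application": "USA2", "priorities": []},
    {"publication": "DE9", "application": "DEA9", "priorities": ["DEA8"]},
]
from_publication = inpadoc_style_family(records, "EP1")
from_priority = inpadoc_style_family(records, "USA2")
```

Starting from the publication number EP1 or from the priority application USA2 yields the same four publications, which mirrors the statement that the starting point does not matter. The country-specific rules and legal status links mentioned in the excerpt are not modelled here.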
Because of their inclusive family-building rules, INPADOC families can become quite large. According to expert Davide Lingua, the chemical and biological subject fields tend to generate the largest extended INPADOC families, due to the large number of priority documents commonly supporting these inventions; the field of genetic engineering is known to have INPADOC patent families reaching into the thousands. 
The timeliness of the INPADOC/DOCDB database is totally dependent upon the national patent offices, which may not provide updates on a regular schedule. However, publications from major patenting authorities should appear in the database after a maximum of 4-6 weeks, with data from many countries available within days of publication (this timeframe from Adams, 2006). Another difficulty of relying on the national patent offices for raw data is that the data quality (and error rate) is largely outside of the EPO's control. Gaps in the underlying data can mean that the INPADOC families are actually missing certain family members.
Problems with INPADOC patent families can also be introduced by errors in the underlying source data from national patent offices. For example, if a typographical error was made in the priority data for a given document, that document can be incorrectly associated with the wrong patent family, even though it may contain completely unrelated subject matter. Due to the automated nature of INPADOC family construction, such errors are unlikely to be quickly found and corrected.
Because this patent family data product is available to commercial patent information providers, it is the basis for many patent family files hosted by various database producers. Although they may claim to rely on multiple sources, and apply various patent family algorithms to the raw data, DOCDB XML is the backbone of almost every commercially available patent family database (aside from the Derwent World Patents Index and CAplus, both discussed below).
Beginning in 2008, the DOCDB file will include a “family identifier”, or a number which binds records into simple (type 1) patent families (families will include each application within the simple family, and all publication stages related to those applications). There will also be an indication of which family member is the “EPO-allocated representative” for that family. The stated purpose of this family identifier is to aid recipients of the data in their family building process. In addition, the EPO has stated that any further rules governing the creation of these families and representatives are internal to the EPO, and not available for public review.
Derwent World Patents Index
The Derwent World Patents Index has its roots as a current awareness bulletin for the pharmaceutical industry. Early on in its development, the producer realized that adding value to patent data using a human indexing team would help make searching for relevant patent data much easier. Thus, human indexers currently add a number of special features like re-written titles and abstracts, chemical and polymer indexing codes, and Derwent proprietary classification codes.
Because so much work goes into creating a Derwent record, the database reduces indexing costs by reviewing only one patent per simple family. To achieve this goal while still ensuring search accuracy, human indexers help organize patent records into intellectually-created families, so that each family record represents a single subject invention.
As with all patent families, the process of creating a Derwent patent family relies primarily on priority data. When a new patent document is published that appears to have unique priority data not already contained in the database, it is established as a “basic” record and it is given value-added content. From that point on, only documents which have exactly the same priority data as the basic are added to the family as “equivalents,” or more specifically, “convention equivalents.” Thus far, the Derwent patent family is a simple patent family (type 1, above). However, Derwent indexers also examine new publications intellectually, to identify additional family members known as “non-convention” equivalents. These family members are publications which must belong to the same applicant and disclose the same subject matter as the basic, but which were filed more than 12 months after the original priority filing and therefore cannot claim the original priority date (in accordance with Paris Convention rules). The action of systematically identifying non-convention equivalents makes the Derwent World Patents Index file an “artificial” family file (see type 6, above).
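The distinction between convention and non-convention equivalents can be illustrated with a small sketch. It is not Derwent's procedure: the subject-matter comparison is an intellectual step performed by indexers, so the code below only flags candidates for that review. The function names, record fields and role labels are hypothetical, and the 12-month deadline ignores weekend and holiday extensions.

```python
from datetime import date

def convention_deadline(priority_date):
    """Return the date twelve months after the priority date (simplified)."""
    try:
        return priority_date.replace(year=priority_date.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following, non-leap year
        return priority_date.replace(year=priority_date.year + 1, day=28)

def classify(basic, doc):
    """Return a hypothetical role label for doc relative to a basic record."""
    if doc["priorities"] == basic["priorities"]:
        return "convention equivalent"
    late = doc["filed"] > convention_deadline(basic["priority_date"])
    if late and doc["applicant"] == basic["applicant"]:
        # Same applicant, filed too late to claim priority: needs human review
        return "review as possible non-convention equivalent"
    return "unrelated or new basic"

# Hypothetical sample data
basic = {"priorities": frozenset({"P1"}), "priority_date": date(2005, 3, 1),
         "applicant": "Acme"}
doc_a = {"priorities": frozenset({"P1"}), "filed": date(2006, 2, 20),
         "applicant": "Acme"}
doc_b = {"priorities": frozenset({"P9"}), "filed": date(2006, 6, 1),
         "applicant": "Acme"}
doc_c = {"priorities": frozenset({"P7"}), "filed": date(2006, 6, 1),
         "applicant": "Other"}
roles = [classify(basic, d) for d in (doc_a, doc_b, doc_c)]
```

Only the first branch is a purely mechanical decision. The second merely narrows down which documents an indexer would need to compare against the basic.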
Data coverage in the DWPI file is different for documents in various technology areas, although it extends back as early as 1963 for pharmaceutical patents. See the Derwent World Patents Index article for a discussion of the database's coverage by technology.
The DWPI database can never be as current as non-value-added databases, such as INPADOC/DOCDB: each new basic from a major authority must be fully indexed before it is entered into the database, which takes time. However, the company that produces the file, Thomson Reuters, also offers a pre-indexed file of new records, called DWPI First View. This database reflects new publications approximately 1 week after they are first published, and searching it allows users to find records before they are added to the main DWPI file. Most of the documents added to First View are “basics” disclosing new material, although some “equivalents” to established DWPI patent families may also be present. However, the First View file is not loaded on all platforms which offer DWPI.
CAplus
CAplus is a collection of records produced by the Chemical Abstracts Service (CAS), corresponding to patent inventions in the chemical arts. Like the Derwent World Patents Index, these records actually consist of re-written abstracts and special indexing content created by a staff of experts. The collection receives patent bibliographic data from a number of different sources around the world, covering – but for nine major patent offices (EP, PCT, US, CA, DE, FR, GB, JP, RU), CAS receives new patent data within 48 hours of publication. If these records have unique application and priority data, they become the “basic” record for that invention, and are abstracted accordingly. Any publications that arrive subsequent to the “basic” are added to the patent family as “equivalents”: the document information will be added to the database, but no additional indexing will take place. CAS receives bibliographic information from many other patent offices around the globe, and publications from those sources may also become “basics” if their priority information does not match any other record in the database.
Because the collection is geared principally toward covering new inventions in the chemical arts, patent family construction for extended family situations with multiple priority numbers is decided on a case-by-case basis. In other words, if a new record is discovered to have a complex priority relationship to the basic, the decision to create a separate record is made only if the document appears to describe significantly new inventive material. As a result, some extended patent families may be represented by a single CAS abstract, while others may be represented by multiple database records. Some special rules apply: for example, patent families which are extended by US continuations-in-part will always be given a new abstract to represent the new inventive material added by the continuation. Whether or not a separate abstract is created for the extended family members, these members are always displayed as part of a complete CAplus record.
Family data for CAplus records extends back to the earliest records in the database, to 1957. As part of the process of incorporating these older records into CAplus, an attempt was made to reconstruct family data for them. However, because these records were produced long before the digital age, users should keep in mind that this data may be imperfect.
EDOC
This patent family data file is interesting from a historical perspective, and because its family structure formed the basis for Questel's currently available PlusPat file, discussed further below.
EDOC was the commercial version of the EPO's internal patent examiner database, which was known as the European Patent Organisation's Search Documentation System or EPODOC in its internal implementation. This commercial version of the file was produced by INPI, the French Patent Office, and loaded on the Questel system. EDOC consisted of application, priority, and publication numbers combined with ECLA classification codes. EPODOC took EDOC and added titles, abstracts, inventors, and applicants. Later, the fate of the internal and commercial versions diverged, as EPODOC was merged into the INPADOC data file when the EPO took over INPADOC's operations in the early 1990s. The commercial version, EDOC, was relaunched by Questel in the late 1990s as the PlusPat file. PlusPat now draws on the INPADOC file as a source, but uniquely among its INPADOC-based family competitors, it has retained the original EDOC/EPODOC family structure.
The unique features of the EPODOC file, as listed by Adams, included:
- The inclusion of some document types not covered in INPADOC
- Enhanced coverage beyond the typical INPADOC range (British patents from 1909; German from 1877)
- Availability of ECLA classes for a substantial portion of the file
Patent Family Data Sources using INPADOC/DOCDB as Source Data
Many of the patent family algorithms behind the files listed below are at least partially proprietary to the database producers. However, it is certain that all of these sources rely heavily on the DOCDB XML bibliographic file produced by the EPO as source data. Producers apply proprietary rules to the priority data for each document to create (and differentiate) their family databases.
FamPat
Produced by Questel, the FamPat database is a simple family database (type 1, above). FamPat's release provided a way for Questel users to resolve some of the shortcomings of an earlier Questel patent family database, called PlusPat (discussed below).
The international family groupings in this file are constructed by a strict methodology intended to create small groupings of closely related documents; the algorithm used to accomplish this is unique to this file. According to the Questel website, a single family record in FamPat combines all publication stages of the family, and the FamPat family definition incorporates the EPO’s strict family rule with additional rules to include:
- Applications falling outside the 12 month filing limit;
- Links between EP and PCT publications;
- Combining US Provisionals that share the same priority with US Published Applications.
- FamPat’s family definitions also incorporate different patenting authorities’ definitions of an invention, particularly useful with Japanese publication searching.
FamPat family members must all share exactly the same priority information, but in a few cases, exceptions are made to accommodate the differences between various national patent laws. An example of this extension was discussed by Nancy Lambert in her online article covering the release of the FamPat database for Information Today:
Let's say three Japanese patent publications that are closely related show up as three publications with three different priorities. They will be listed as three separate records. However, if a U.S. patent then shows up citing all three of the Japanese priorities (and, presumably, covering all three of the Japanese publications' technology), FamPat combines all four of them into one FamPat family. In other words, their families are dynamic rather than static.
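The dynamic behavior in Lambert's example can be imitated with a short sketch. FamPat's actual algorithm is proprietary, so the merge rule used here is an assumption made purely to reproduce the quoted scenario: a newly arriving document absorbs every existing family whose entire priority set it claims. All names and document numbers are hypothetical.

```python
def add_document(families, doc, priorities):
    """Add doc to families, merging any family whose priorities it fully claims.

    families is a list of dicts with the keys 'priorities' (a set) and
    'members' (a list); it is modified in place and also returned.
    """
    priorities = set(priorities)
    # Existing families whose whole priority set is claimed by the new document
    covered = [f for f in families if f["priorities"] <= priorities]
    merged = {"priorities": priorities, "members": []}
    for family in covered:
        merged["members"].extend(family["members"])
        families.remove(family)
    merged["members"].append(doc)
    families.append(merged)
    return families

# Hypothetical sample data following the quoted example
families = []
add_document(families, "JP-A", {"JP1"})
add_document(families, "JP-B", {"JP2"})
add_document(families, "JP-C", {"JP3"})
count_before = len(families)
add_document(families, "US-X", {"JP1", "JP2", "JP3"})
count_after = len(families)
```

Before the US document arrives there are three separate records; afterwards a single family holds all four documents. Whether FamPat merges in other situations, such as a later document claiming only two of the three priorities, is not stated in the source and is not something this sketch can answer.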
FamPat will display a representative family member for each family in the database – this is usually done in a way that ensures the English-language document is chosen. However, via the system settings on Orbit.com, users can set their own preferences for what type of document will be shown as the representative.
The FamPat record is also enhanced with Questel Key Content, which includes "high value-added fields extracted from the core of the patents' full text," including:
- Object of the invention
- Advantages of the new invention and Drawback of prior art
- Independent claims
Key Content is available for the following publications:
- EP published applications since 1980 (Euro-PCTs excluded)
- PCT published applications since Mid 2001
- US granted patents from 1971 to 2000
- US published applications since March 15, 2001
Key Concepts are another value-added feature included on FamPat records, and these main concepts "are extracted with language technologies from the Full Text of the patent publications for US, EP and WO documents published in English. The coverage is the same as from Key content, shown above. The larger the characters and the brighter the blue coloring, the more important the concept."
In addition to the Key Content and Key Concepts, FamPat Records include the following content:
- First Page Information: Patent and published application numbers and publication dates, application numbers and filing dates, priority numbers and priority dates, the assignee(s), inventor(s), titles, abstracts and drawings.
- Classification Codes: EPO (ECLA, ICO, IDT and Berlin), International Patent Classification (IPC), the US Patent Office (PCL) and Japanese Patent Office Classifications (FI, F-Terms).
- Patent and Non Patent Literature Citations for WO, EP, US, EA, AP, AU, BE, CH, CZ, DE, DK, ES, FR, GB, GR, JP, LU, NL, SG and TR publications.
- Full Text Claims and Descriptions: Full Text claims and descriptions are searchable for WO, US, EP, AT, BE, BR, CA, CH, CN, DE, ES, FR, GB, JP, RU, DK, FI and SE, IN, TW, KR publications.
- Legal Status information (INPADOC PRS data) for approximately 50 countries is included in FamPat. For JP documents the NRI legal status data may be displayed.
To learn more about the exact contents of FamPat by country, see the Bibliographic Coverage section of the Orbit.com article.
PlusPat
The precursor to FamPat was the original PlusPat database, which came from the logic used in the European patent examiner’s database, EPODOC. In organizing their database, the EPO grouped together all publication stages for a single patent, thereby preventing the unnecessary duplication caused by retrieving the published application and the granted patent as two separate hits. Thus, the PlusPat family definition is really just a convenient way to prevent coming across multiple publication stages of the same document during a search. This is also known as a "domestic" patent family, or a "type 5" family as defined by WIPO. Although the PlusPat family is not specifically labeled as such on the Orbit platform, the records within the PlusPat and Full Text collections are both organized according to this family theory, with all publication stages for a single document (from a single issuing authority) organized under one record.
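The grouping of publication stages under one record can be sketched as follows. This is an illustration of the domestic (type 5) idea only, assuming each publication is identified by issuing authority, application number, publication number and kind code; the field names and sample values are hypothetical.

```python
from collections import defaultdict

def group_publication_stages(publications):
    """Collect all publication stages of one application under a single record.

    Each publication is a dict with the keys 'authority', 'application',
    'number' and 'kind'. Records are keyed by (authority, application).
    """
    records = defaultdict(list)
    for pub in publications:
        key = (pub["authority"], pub["application"])
        records[key].append(f'{pub["number"]} {pub["kind"]}')
    return dict(records)

# Hypothetical sample data: one application published twice, another once
publications = [
    {"authority": "CC", "application": "CC-APP-1", "number": "CC000051", "kind": "A1"},
    {"authority": "CC", "application": "CC-APP-1", "number": "CC000051", "kind": "B1"},
    {"authority": "CC", "application": "CC-APP-2", "number": "CC000052", "kind": "A1"},
]
records = group_publication_stages(publications)
```

A search that matches both the published application and the granted patent for CC-APP-1 would then return one record rather than two hits.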
To learn more about the exact contents of PlusPat by country, see the Bibliographic Coverage section of the Orbit.com article.
PatBase
Family structure is particularly important in this database, as the PatBase system treats aggregated family data as a kind of meta-record. Unlike the Derwent World Patents Index, which uses that technique to streamline its database, the PatBase system loads as much full text data for each individual publication record as possible, to increase the likelihood that it will be found by a search.
The database relies on extended INPADOC (type 3) patent families, and creates them from DOCDB bibliographic data. According to a seminal work on patent information by Adams (2006), the system does apply some corrections to family data. Information provided by system representatives indicates that any corrections are performed to rectify erroneous priority information data for individual documents, and do not impact the overall family structure used by the system.
A representative document number, abstract, and image for the family is chosen by the PatBase algorithm, in a manner that attempts to ensure an English-language document is displayed.
esp@cenet
The goals of esp@cenet are 1) to provide representative patent family members in English wherever possible and 2) to then provide equivalent documents from other countries that are the most likely to have very similar technical content. The esp@cenet system accomplishes this by organizing records into simple patent families (type 1), where every record must share exactly the same priority (or priorities) with the rest of the family members. A representative family member is chosen to represent the entire family, and the system attempts to choose an English-language member whenever possible. It is likely that these families and representatives will exactly line up with the “family identifier” tag in the source EPO/DOCDB data, on which esp@cenet is built.
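Choosing an English-language representative where one exists can be sketched in a few lines. The selection rules actually used by esp@cenet, PatBase and FamPat are not published in the sources cited here, so the tie-breaker below (earliest publication date) is an assumption, and the field names and sample members are hypothetical.

```python
def pick_representative(members, preferred_language="en"):
    """Pick one family member, preferring the given language when available.

    Each member is a dict with the keys 'number', 'language' and 'published'
    (an ISO date string, so plain string comparison orders dates correctly).
    """
    preferred = [m for m in members if m["language"] == preferred_language]
    pool = preferred or members  # fall back to the whole family if none match
    return min(pool, key=lambda m: m["published"])

# Hypothetical sample data
members = [
    {"number": "JP-1", "language": "ja", "published": "2001-02-01"},
    {"number": "US-1", "language": "en", "published": "2002-06-15"},
    {"number": "EP-1", "language": "en", "published": "2001-11-20"},
]
representative = pick_representative(members)
fallback = pick_representative(members[:1])
```

Here EP-1 is chosen because it is the earliest English-language member, even though JP-1 was published first. A family with no English-language member falls back to its earliest publication.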
Factors Affecting Patent Family Content
In 2009, prominent patent information researcher Edlyn S. Simmons published a review of the factors which can affect patent families and, from a practical standpoint, make working with patent family information more confusing. The main factors outlined in her article are:
- Continuing applications
- Postponing patent expiration
- Typographical errors and other mistakes
- Differences in standardization schemes
- "Phantom" family members
- Variation among database family-building algorithms and recall processes
To understand how these factors may affect patent family data sources, see Simmons, Edlyn S. "'Black sheep' in the patent family." World Patent Information, Vol 31, Number 1. March 2009. Pages 11-18.
- WIPO Handbook on Industrial Property Information and Documentation, Glossary of Terms Concerning Industrial Property Information and Documentation, section 8.1.1. http://www.wipo.int/standards/en/pdf/08-01-01.pdf. Accessed on December 2, 2007.
- PIUG Knowledge Base page on Patent Families, http://www.piug.org/patfam.php. Accessed on December 13, 2007.
- Poynder, Richard. “Interview with the EPO's Wolfgang Pilch.” http://poynder.blogspot.com/2007/10/interview-with-epos-wolfgang-pilch.html#links. Accessed on December 13, 2007.
- Lingua, Davide G. "INPADOC: 30 years of endeavours yet unmapped territories remain!" World Patent Information. Vol 27, No. 2. June 2005. Pages 105-111.
- Resources used for this article also include training materials provided by STN. http://www.stn-international.de/stndatabases/details/inpadocdb.html and http://www.stn-international.de/training_center/e_sem/INPADOCDB_e-seminar.pdf. Accessed on December 13, 2007.
- "The EPO extends its coverage of Latin American countries in the bibliographic database - update." Posted January 23, 2009. EPO Website, http://www.epo.org/patents/updates/2009/20090114.html. Accessed on February 2, 2008.
- "The 'extended' (INPADOC) patent family." EPO website, http://www.epo.org/patents/patent-information/about/families/inpadoc.html. Accessed on February 9, 2009.
- Adams, Stephen R. Information Sources in Patents. Munich: KG Saur. 2006.
- This obviously valid point was made on the PIUG Discussion Forum by a contributor. Gieling, Gerben. Comment to discussion forum post, “confusion with patent family.” February 9th, 2009. PIUG Discussion Forum. http://wiki.piug.org/display/PIUG/confusion+with+patent+family?focusedCommentId=6946897#comment-6946897. Accessed on February 10, 2009.
- “Exchange Format EPO-Patent Information Resource.” Page 68. http://documents.epo.org/projects/babylon/eponet.nsf/0/C9CC8D81DA6CF279C125736300486C68/$File/ST36_User_Documentation_vs2.0.pdf. Accessed on December 13, 2007.
- From information originally provided by O'Hara, Michael P. "Patent Families in EDOC." September 28, 1998. http://www.piug.org/patfam.php. Compiled by Elyse Turner, 2000. Obtained from a re-posting of this data onto the PIUG wiki. "Patent Families." PIUG Wiki Post. http://wiki.piug.org/display/PIUG/Patent+Families. Accessed on February 12, 2009.
- "FamPat - The Invention at the Heart of Family." Questel website, http://www.questel.com/prodsandservices/FamPat.htm. Accessed October 1, 2012.
- Simmons, Edlyn S. "'Black sheep' in the patent family." World Patent Information, Vol 31, Number 1. March 2009. Pages 11-18.
- "Questel Key Content." Questel Website, http://www.questel.com/prodsandservices/Fampat_key_content.htm. Accessed October 1, 2012.
- "Search Module - Document view." Questel website, http://www.questel.com/imagination/orbit_help/prd/en/PatentPatentDocument.htm. Accessed October 1, 2012.
- "General Search - Databases to search." Questel website, http://www.questel.com/imagination/orbit_help/prd/en/PatentRegularAdvancedSearch_patentSearchInCollection.htm. Accessed October 1, 2012.
- WIPO. "Handbook on Industrial Property Information and Documentation." en / 08-01-01. Published August 2012. WIPO website, http://www.wipo.int/standards/en/pdf/08-01-01.pdf. Accessed October 2, 2012.
- Adams, Stephen R. Information Sources in Patents. Munich: KG Saur. 2006. Page 157.