Clustering Unstructured Data via Keyword Extraction
Patent analytics expert Anthony Trippe describes the clustering of unstructured data via keyword extraction as follows: "Unstructured text is defined as text that has not been indexed or segmented into individual data fields. The only structure contained within the document is the structure that was implied by the author when they put words into sentences, sentences into paragraphs, and so on… As with the clustering of structured data, text concepts instead of codes can then be used to group documents that share a high degree of overlap… Where tools for clustering tagged or structured data start by parsing the fielded data into a database, the systems for clustering unstructured text begin by identifying relevant terms within a document." [1]
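The two steps in the quotation, identifying relevant terms in each document and then grouping documents whose terms overlap strongly, can be illustrated with a minimal Python sketch. This is not the implementation of any particular tool. The stopword list, the frequency-based keyword choice, the Jaccard overlap measure, the 0.3 threshold, and the function names `extract_keywords`, `jaccard`, and `cluster_documents` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
             "with", "is", "by", "using"}


def extract_keywords(text, top_k=5):
    """Return the top_k most frequent non-stopword terms of text as a set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return {term for term, _ in counts.most_common(top_k)}


def jaccard(a, b):
    """Overlap between two keyword sets: |a & b| / |a | b| (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.3, top_k=5):
    """Greedy single-pass clustering on keyword overlap.

    Each document joins the first cluster whose seed keywords overlap with
    its own keywords by at least `threshold`; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # list of (seed_keywords, member_indices)
    for index, doc in enumerate(docs):
        keywords = extract_keywords(doc, top_k)
        for seed, members in clusters:
            if jaccard(keywords, seed) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((keywords, [index]))
    return [members for _, members in clusters]


# Hypothetical document titles, invented for this example.
docs = [
    "Lithium battery electrode with improved lithium anode coating",
    "Coating for a lithium battery anode electrode",
    "Wireless antenna array for mobile signal transmission",
    "Mobile antenna for wireless signal reception",
]

print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

The first two titles share four of their five keywords, as do the last two, so each pair forms one cluster. Production systems typically weight terms (for example with TF-IDF) and use more robust clustering algorithms than this greedy pass.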
Clustering of unstructured data via keyword extraction is available using the Extract Nearby Phrase command. See the Natural Language Processing section for more details.
Sources
- Trippe, Anthony J. "Patinformatics: Tasks to tools." World Patent Information 25 (2003): 211-221.