Specification Parsing Module |
XML Tag Handling |
Parse the specification XML files, strip the XML tags, and store the data for each field |
Field data extraction |
- Collect the text of the target fields so that keywords can be extracted from the specification data
- Target fields: Summary of the Invention, Claims
|
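The parsing step above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the tag names `summary` and `claims` and the sample document are assumptions, since the source does not give the XML schema of the specification files.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names for the two target fields; the real schema is not
# stated in the source.
TARGET_TAGS = ("summary", "claims")


def extract_fields(xml_text):
    """Return {tag: plain text} for each target field, with XML tags removed."""
    root = ET.fromstring(xml_text)
    fields = {}
    for tag in TARGET_TAGS:
        parts = []
        for elem in root.iter(tag):
            # itertext() yields only text nodes, so nested markup is dropped;
            # split/join collapses runs of whitespace.
            parts.append(" ".join("".join(elem.itertext()).split()))
        fields[tag] = " ".join(parts)
    return fields


SAMPLE = (
    "<patent><summary>A <b>battery</b> pack.</summary>"
    "<claims><claim>1. A battery pack.</claim></claims></patent>"
)
```

Calling `extract_fields(SAMPLE)` yields one plain-text string per target field, which is the input the natural language processing module would then receive.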
Natural Language Processing Module |
Morphological analysis |
Perform morphological analysis and tagging on the text collected from the specification |
TM analysis |
- Generate compound nouns through noun phrase analysis
- Generate compound phrases, such as subject-predicate, predicate-object, and substantive-explicative modifier phrases, through syntactic parsing
- Generate related words using the co-occurrence of words (single words, compound words) within a sentence
|
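The co-occurrence step can be illustrated with a short sketch. It assumes that each sentence has already been reduced to a list of words by the morphological analysis, and that a pair counts as related once it co-occurs in at least `min_count` sentences; the threshold is an assumption, as the source does not say how related words are chosen.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, the sentences containing both words."""
    counts = Counter()
    for tokens in sentences:
        # sorted(set(...)) counts each pair once per sentence, in a stable order.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def related_words(sentences, min_count=2):
    """Return pairs that co-occur in at least min_count sentences (assumed rule)."""
    counts = cooccurrence_counts(sentences)
    return sorted(pair for pair, n in counts.items() if n >= min_count)


SENTENCES = [
    ["battery", "cell", "electrode"],
    ["battery", "cell"],
    ["electrode", "separator"],
]
```

With the sample sentences, only the pair `("battery", "cell")` reaches the threshold of two.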
Keyword extraction |
Keyword candidate selection |
- Select candidates to be used as keywords from the natural language processing results
- Keyword candidates: single nouns, compound nouns, compound phrases, related words
|
Stopword elimination |
- Eliminate keyword candidates that are not suitable as keywords
- Use a stopword dictionary
|
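A dictionary-based filter of this kind can be as small as the sketch below. The dictionary entries are placeholders, and matching whole candidates case-insensitively is an assumption; the source says only that a dictionary is used.

```python
# Placeholder dictionary; the actual entries are not listed in the source.
STOPWORDS = {"method", "device", "said"}


def remove_stopwords(candidates, stopwords=STOPWORDS):
    """Drop candidates that appear in the dictionary (whole-candidate match)."""
    return [c for c in candidates if c.lower() not in stopwords]
```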
Keyword selection |
- Keyword weighting: compute a TF-IDF score for each keyword candidate of each patent document
- Keyword selection: the 50 highest-weighted keywords for each patent document
|
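The weighting and selection steps can be sketched as below. The source names TF-IDF but not the variant, so this sketch assumes term frequency normalized by document length and an inverse document frequency of log(N / df); the function name and input layout are illustrative.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=50):
    """docs: {doc_id: list of candidate terms}.

    Returns {doc_id: [(term, score), ...]} holding the top_k terms per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    result = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        scores = {
            t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[doc_id] = ranked[:top_k]
    return result


DOCS = {"P1": ["battery", "battery", "cell"], "P2": ["cell", "motor"]}
```

A term that appears in every document receives a score of zero under this variant, so it ranks below any term that is specific to a document.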
Similar patent extraction |
Document clustering |
- Keyword-document vector generation: use the keyword list of each patent document
- Document cluster creation: use the keyword-document vectors to create, for each patent document, a set of documents that contain the same keywords.
|
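One way to build such per-document clusters is an inverted index from keyword to documents, sketched below. The `min_shared` threshold is an assumption; the source does not say how many keywords two documents must share to fall into the same cluster.

```python
from collections import defaultdict


def candidate_clusters(doc_keywords, min_shared=1):
    """doc_keywords: {doc_id: list of keywords}.

    Returns {doc_id: set of other doc_ids sharing at least min_shared keywords}.
    """
    # Inverted index: keyword -> documents that contain it.
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw].add(doc_id)

    clusters = {}
    for doc_id, keywords in doc_keywords.items():
        shared = defaultdict(int)
        for kw in set(keywords):
            for other in index[kw]:
                if other != doc_id:
                    shared[other] += 1
        clusters[doc_id] = {d for d, n in shared.items() if n >= min_shared}
    return clusters


DOC_KEYWORDS = {
    "P1": ["battery", "cell"],
    "P2": ["cell", "motor"],
    "P3": ["antenna"],
}
```

Restricting the similarity calculation to these clusters avoids comparing every patent with every other patent.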
Document Similarity Calculation |
For each patent document, calculate its similarity to the patents in the same cluster
- Similarity between documents: cosine similarity over the weights of the keywords that match between the two documents
- Technology field weighting
- - Weight the technology field using the IPC information and WIPO technology classification of the two patent documents
- Technology field weight order
IPC match > technology group match > technology middle category match > technology large category match > technology mismatch
- - Calculate the final similarity by applying the additional technology weight to the document-to-document similarity value.
|
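The similarity calculation can be sketched as follows. The source gives only the ordering of the technology weights, so the numeric multipliers, the classification keys (`ipc`, `group`, `middle`, `large`), and the choice to combine the weight with the cosine similarity by multiplication are all assumptions made for illustration.

```python
import math

# Hypothetical multipliers; only their ordering comes from the source.
TECH_WEIGHTS = {"ipc": 1.0, "group": 0.9, "middle": 0.8, "large": 0.7, "mismatch": 0.5}


def cosine_similarity(a, b):
    """a, b: {keyword: weight}. Only keywords present in both contribute to the dot product."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tech_match_level(c1, c2):
    """Return the most specific classification level on which both documents agree."""
    for level in ("ipc", "group", "middle", "large"):
        if c1.get(level) is not None and c1.get(level) == c2.get(level):
            return level
    return "mismatch"


def final_similarity(vec1, vec2, c1, c2):
    """Cosine similarity scaled by the technology weight (assumed to be multiplicative)."""
    return cosine_similarity(vec1, vec2) * TECH_WEIGHTS[tech_match_level(c1, c2)]


VEC_A = {"battery": 1.0, "cell": 1.0}
VEC_B = {"battery": 1.0}
CLS_A = {"ipc": "H01M 10/04", "group": "G1", "middle": "M1", "large": "L1"}
CLS_B = {"ipc": "H01M 50/20", "group": "G1", "middle": "M1", "large": "L1"}
```

In the sample data the two documents differ in IPC code but share a technology group, so the cosine similarity is scaled by the group-level multiplier.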
Generating similar patents |
For each patent document, select the top 100 patents as similar patents, ranked by the final similarity value with the technology classification weight applied |
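The final selection amounts to a top-N ranking, sketched here with the standard library. The input layout (a mapping from candidate patent to final similarity) is an assumption.

```python
import heapq


def top_similar(similarities, n=100):
    """similarities: {other_doc_id: final similarity}.

    Returns the n highest-scoring (doc_id, score) pairs, best first.
    """
    return heapq.nlargest(n, similarities.items(), key=lambda kv: kv[1])
```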