Digitala Vetenskapliga Arkivet

Comparative Study of Supervised Methods for Keyword Extraction in Patent Applications
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematics (Div.).
2024 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Jämförande analys av övervakade metoder för nyckelordsutvinning i patentansökningar (Swedish)
Abstract [en]

In its patent review process, the Swedish Intellectual Property Office (PRV) uses keywords to search databases for material related to inventions. Currently, keywords are found manually, and the extracted keywords depend on the judgment of the assigned reviewer. This thesis compares different supervised classification models on labeled patent data in order to evaluate the performance, strengths, and weaknesses of the models. The explored models are Support Vector Machines (SVMs), Naive Bayes (NB) classifiers, and Multilayer Perceptrons (MLPs). The models are trained on labeled patent data with lexical and statistical features and evaluated using Receiver Operating Characteristic (ROC) curves, Precision-Recall (PR) curves, and calibration plots.

The analysis of the results showed that the neural network (MLP) and Naive Bayes, with similar performance, outperformed the SVM. The MLP is considered a better fit for the task because it achieved a higher recall score. Naive Bayes achieves consistently promising results across the feature iterations, meaning that near-MLP-level results can be achieved with a simpler model.
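
As a rough illustration of the evaluation metrics named in the abstract, the following minimal sketch computes precision, recall, and the area under the ROC curve for a set of scored keyword candidates. It is not the thesis's code: the function names, the threshold, and the toy labels and scores are all hypothetical and chosen only to show how the quantities are defined.

```python
from typing import List, Tuple


def precision_recall(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when candidates scoring >= threshold are predicted as keywords."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive is scored higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = term chosen as a keyword, 0 = other candidate term.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

precision, recall = precision_recall(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"precision={precision:.3f} recall={recall:.3f} auc={auc:.3f}")
```

In a setting like the one described, recall measures how many of the reviewer-relevant keywords a model recovers, which is why a higher recall can favor one model even when overall performance is similar.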

Abstract [sv] (translated to English)

In its patent review process, the Swedish Intellectual Property Office (Patent- och registreringsverket, PRV) uses keywords to search databases for material related to inventions. Currently, keywords are found manually, and the extracted keywords are subjective and depend on the judgment of the assigned reviewer. This thesis aims to compare different classification models on labeled patent data in order to determine the models' relative performance, strengths, and weaknesses. The models explored are Support Vector Machines (SVMs), Naive Bayes (NB) classifiers, and Multilayer Perceptrons (MLPs). The models are trained on labeled patent data with lexical and statistical features and evaluated using measures such as Receiver Operating Characteristic (ROC) curves, Precision-Recall (PR) curves, and calibration curves.

The analysis of the results showed that the neural networks and the Naive Bayes methods, with similar performance, outperformed SVM. MLP is considered the better-suited model because it achieves higher recall. Naive Bayes achieves consistently good results across the different feature iterations, which means that the results of the MLP model can nearly be matched with a simpler model.

Place, publisher, year, edition, pages
2024. p. 67
Series
TRITA-SCI-GRU ; 2024:456
Keywords [en]
Keyword extraction, Patent, Support Vector Machine, Neural Network, Naive Bayes
Keywords [sv]
Nyckelordsextraktion, Patent, Support Vector Machine, Neurala Nätverk, Naive Bayes
National Category
Other Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-362855
OAI: oai:DiVA.org:kth-362855
DiVA, id: diva2:1954976
External cooperation
Patent- och registreringsverket
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Available from: 2025-04-28 Created: 2025-04-28 Last updated: 2025-04-28
Bibliographically approved

Open Access in DiVA

fulltext (1539 kB), 21 downloads
File information
File name: FULLTEXT01.pdf
File size: 1539 kB
Checksum (SHA-512): 39972023ce600e58f3fca7e994f6636e1aa6c3c47084af11b46287eaee8f22b621a0221259d82e67ffd6e94c8e881d46c1b66fafbd1a6a854950c66143ff5a6e
Type: fulltext
Mimetype: application/pdf
