Digitala Vetenskapliga Arkivet

Comparison of sequence classification techniques with BERT for named entity recognition
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis takes its starting point from recent advances in Natural Language Processing built upon the Transformer model. One of the most significant recent developments was the release of a deep bidirectional encoder called BERT, which set several new state-of-the-art results upon its release. BERT utilises transfer learning to improve the modelling of language dependencies in text. BERT is used for several different Natural Language Processing tasks; this thesis focuses on Named Entity Recognition, sometimes referred to as sequence classification. The thesis compares the model architecture as presented in the original paper with an alternative classifier in the form of a Conditional Random Field. BERT was evaluated on the CoNLL-03 dataset, which is based on English news articles published by Reuters.
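
The baseline architecture described above couples the pre-trained BERT encoder with a single feed-forward layer that classifies every token independently. The following is a rough sketch of that idea only, assuming the Hugging Face transformers package, the bert-base-cased checkpoint and the standard CoNLL-03 IOB label set; the thesis does not state which tooling it used, so all of these are illustrative choices rather than the thesis's actual setup.

import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Standard CoNLL-03 IOB tag set (an assumption; the thesis only names the dataset).
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=len(labels))

# One pre-tokenised sentence; BERT further splits the words into subword pieces.
words = ["Reuters", "reported", "from", "London", "."]
inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, num_subword_tokens, len(labels))

# The feed-forward head makes an independent decision per token (no transition model).
predictions = logits.argmax(dim=-1)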

Overall, the Conditional Random Field classifier outperforms the original feed-forward classifier on the F1-score metric by a small margin of approximately 0.25 percentage points. While the thesis fails to reproduce the results of the original paper, it compares the two model architectures across the hyperparameters proposed for fine-tuning. The Conditional Random Field achieves better scores for most hyperparameter combinations and is less sensitive to which parameters are chosen, creating an incentive for its use by reducing the impact of the parameter search compared to a feed-forward classification layer. Comparing the two models also shows lower variance in the results for the Conditional Random Field.
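
The alternative classifier studied above swaps the independent per-token softmax for a Conditional Random Field, which scores complete tag sequences and learns transition scores between adjacent tags. The sketch below shows one possible way to wire such a model together, assuming PyTorch, the transformers package and the pytorch-crf package; the thesis does not specify its implementation, so the class and parameter names here are purely illustrative.

import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pytorch-crf package (an assumed dependency)

class BertCrfTagger(nn.Module):
    def __init__(self, num_tags, model_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # Linear layer turns each token's hidden state into per-tag emission scores.
        self.emissions = nn.Linear(self.bert.config.hidden_size, num_tags)
        # The CRF adds learned transition scores between adjacent tags.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the most likely tag sequence per sentence.
        return self.crf.decode(scores, mask=mask)

Because the CRF decodes whole sentences with the Viterbi algorithm, it can penalise tag sequences that are unlikely under the learned transitions (for example an I-PER tag directly following B-ORG), which is one plausible reason for the lower variance across hyperparameter settings reported above.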

Abstract [sv]

This thesis takes its starting point from the latest developments in Natural Language Processing that have emerged on the basis of the new Transformer architecture. One of the more recent models to be presented is a deep bidirectional model, called BERT, which improved several results in Natural Language Processing. BERT is a model trained for general language understanding by processing large amounts of text, which is then adapted to a specific problem domain. BERT can be used for several Natural Language Processing tasks, but this thesis looked specifically at the extraction of named entities (Named Entity Recognition). The thesis compared the originally presented model with a new classifier based on Conditional Random Fields. The model was evaluated on CoNLL-03, a dataset of Reuters news articles written in English.

The results showed that the Conditional Random Field classifier performed better as measured by F1 score, by approximately 0.25 percentage points. The thesis did not manage to reproduce BERT's original results, but compares the two architectures across the hyperparameters proposed for fine-tuning to the task. Conditional Random Fields showed better results for most model configurations, and also less variance in results across different parameters, which creates a strong incentive to use Conditional Random Fields as the classifier.

Place, publisher, year, edition, pages
2019, p. 47.
Series
TRITA-EECS-EX ; 2019:499
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-261419
OAI: oai:DiVA.org:kth-261419
DiVA, id: diva2:1358371
External cooperation
Doctrin AB
Examiners
Available from: 2019-10-18. Created: 2019-10-07. Last updated: 2022-06-26. Bibliographically approved.

Open Access in DiVA

fulltext (1108 kB), 3355 downloads
File information
File name: FULLTEXT01.pdf
File size: 1108 kB
Checksum (SHA-512): df58d715bcccfd360420eb970ae8d74eacc9be3e65fe8d2eb561ae2ae7b61031a6c102925207e0690afc6f5b50fa5e088ac56d206b10e95eeba5620aaf479b06
Type: fulltext
Mimetype: application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Search outside of DiVA

Google, Google Scholar
Total: 3355 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 1059 hits