Relation Extraction on Swedish Text by the Use of Semantic Fields and Deep Multi-Channel Convolutional Neural Networks
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Relationsextrahering på svensk text via semantiska fält och djupa faltningsnätverk (Swedish)
Abstract [en]

This thesis makes two contributions to the research domain of relation extraction (RE), i.e., the automated discovery of semantic links in unstructured text. The first contribution is a method for creating RE datasets, which is used to build the first Swedish RE dataset, covering nine relationships between persons, locations and vehicles. The second contribution is a set of experiments on this new dataset that provide baselines. The relation extraction systems created in this thesis include deep multi-channel convolutional neural networks and Word2Vec embeddings. A manual labeling of a subset of the data shows an accuracy of 73%. Using a discrete representation of part-of-speech and dependency tags in the multi-channel convolutional network yields the best performance, with a micro-average F1-score of 77%. The thesis discusses a variety of problems and future avenues of research, including the underlying motivation of this work: the automatic summarization of police reports in Sweden.

Abstract [sv] (translated from Swedish)

This work contributes two insights to research in relation extraction (RE), that is, the automatic discovery of semantic links in unstructured text. The first contribution is a method for creating a dataset for RE and for using it to create a Swedish RE dataset involving nine relations between persons, places and vehicles. The second contribution is a baseline established through experiments on this new dataset. The relation extraction system created in this work includes a deep multi-channel convolutional network with word vectors from the Word2Vec algorithm. A manual categorization of a subset of the data shows a reliability of 73%. The results show that using a discrete representation of part-of-speech and dependency tags in the multi-channel neural network performs best, with a micro-F1 average of 77%. This work discusses problems and future applications, including the underlying motivation for this work: automatic summarization of Swedish police reports.

Place, publisher, year, edition, pages
2019. p. 63
Series
TRITA-EECS-EX ; 2019:494
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-262494
OAI: oai:DiVA.org:kth-262494
DiVA, id: diva2:1361475
External cooperation
The Swedish Police Authority
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2019-11-07. Created: 2019-10-16. Last updated: 2019-11-07. Bibliographically approved.

Open Access in DiVA

fulltext (1480 kB), 3 downloads
File information
File name: FULLTEXT01.pdf
File size: 1480 kB
Checksum (SHA-512): 63a51ef2dd3d5c08a40f2647a3b48cd35f7203e8705f92aaba19fc7850d18e2951b490bcd7f4bdb5c8eeeb3b0505525a8bce1fadbed2fac98216022c770ed982
Type: fulltext
Mimetype: application/pdf
