Ambiguous synonyms: Implementing an unsupervised WSD system for division of synonym clusters containing multiple senses
Linköping University, Department of Computer and Information Science.
2019 (English). Independent thesis, Basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis.
Abstract [en]

When synonyms are clustered together, complications arise if a word has multiple senses, since the synonyms of each sense are erroneously placed in the same cluster. The task of automatically distinguishing word senses in cases of ambiguity, known as word sense disambiguation (WSD), has been extensively researched over the years. This thesis studies the possibility of applying an unsupervised, machine-learning-based WSD system to analyse existing synonym clusters (N = 149) and divide them correctly when two or more senses are present. Using sense embeddings induced from a large corpus, cosine similarities are calculated between the sense embeddings of the words in each cluster. This makes it possible to suggest a division whenever different words are closer to different senses of a proposed ambiguous word. The system output is then evaluated by four participants, all experts in the area. According to the participants, the system correctly divides the clusters in at most 31% of the cases. Moreover, the participants’ ratings differ to some extent, although none of the participants predominantly agrees with the system’s divisions of the clusters. Further research and improvements are therefore needed, and directions for future work are suggested.
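The division step described in the abstract can be illustrated with a minimal sketch. It makes two assumptions, neither confirmed by this record:

- Every word's sense embeddings are already available as plain vectors, for example as induced by a tool such as SenseGram.
- Each remaining word in a cluster is assigned to the sense of the proposed ambiguous word that is most similar to any of that word's own senses.

The function names, the sense labels such as `bank#1`, and the toy 2-D vectors are hypothetical and chosen for illustration only. The thesis's actual assignment rule, thresholds, and data may differ.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def divide_cluster(
    ambiguous_senses: Dict[str, Vector],
    cluster: Dict[str, List[Vector]],
) -> Dict[str, List[str]]:
    """Group the other cluster words by the sense of the ambiguous word they are closest to.

    Assumed rule (hypothetical): a word's similarity to a sense is the highest
    cosine similarity between that sense and any of the word's own sense
    embeddings. Every word is assumed to have at least one sense vector.
    """
    groups: Dict[str, List[str]] = {label: [] for label in ambiguous_senses}
    for word, word_senses in cluster.items():
        best_label = max(
            ambiguous_senses,
            key=lambda label: max(cosine(ambiguous_senses[label], s) for s in word_senses),
        )
        groups[best_label].append(word)
    # Drop senses of the ambiguous word that attracted no cluster words.
    return {label: words for label, words in groups.items() if words}


def suggest_division(
    ambiguous_senses: Dict[str, Vector],
    cluster: Dict[str, List[Vector]],
) -> Optional[Dict[str, List[str]]]:
    """Propose a split only if at least two senses attracted words; otherwise None."""
    groups = divide_cluster(ambiguous_senses, cluster)
    return groups if len(groups) >= 2 else None


# Hypothetical toy vectors: dimension 0 ~ "finance", dimension 1 ~ "river".
bank_senses = {"bank#1": [1.0, 0.0], "bank#2": [0.0, 1.0]}
synonym_cluster = {
    "lender": [[0.9, 0.1]],
    "shore": [[0.1, 0.9], [0.5, 0.5]],
    "depository": [[0.8, 0.2]],
}
print(suggest_division(bank_senses, synonym_cluster))
# → {'bank#1': ['lender', 'depository'], 'bank#2': ['shore']}
```

Taking the maximum over a word's own senses lets a polysemous synonym (here `shore`) join whichever sense of the ambiguous word it matches best. However, the sketch forces every word into some group. A real system might add a similarity threshold so that weakly related words are left unassigned.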

Place, publisher, year, edition, pages
2019, 33 pages
Keywords [en]
SenseGram, unsupervised word sense disambiguation, word sense induction, word2vec, homonymy, ambiguity
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-157622
ISRN: LIU-IDA/KOGVET-G--19/003--SE
OAI: oai:DiVA.org:liu-157622
DiVA, id: diva2:1327071
External cooperation
Fodina Language Technology AB
Subject / course
Cognitive science
Available from: 2019-06-19
Created: 2019-06-19
Last updated: 2019-06-19
Bibliographically approved

Open Access in DiVA
Full text: FULLTEXT01.pdf (application/pdf, 307 kB)
Checksum (SHA-512): b12908a7aba6a30e4b3ab73426c2ce38fd9dcdbbaec3eb27c9c7d2c32ad1710a84404672c0342426ad43293edaf9cc97b642ef2f9f33bf6796866f1356f4e7de

Author
Wallin, Moa
