Digitala Vetenskapliga Arkivet

Fördomsfulla associationer i en svensk vektorbaserad semantisk modell
Linköping University, Department of Computer and Information Science.
2019 (Swedish)
Independent thesis, Basic level (degree of Bachelor), 12 HE credits
Student thesis
Alternative title: Bias in a Swedish Word Embedding (English)
Abstract [sv]

Semantiska vektormodeller är en kraftfull teknik där ords mening kan representeras av vektorer vilka består av siffror. Vektorerna tillåter geometriska operationer vilka fångar semantiskt viktiga förhållanden mellan orden de representerar. I denna studie implementeras och appliceras WEAT-metoden för att undersöka om statistiska förhållanden mellan ord som kan uppfattas som fördomsfulla existerar i en svensk semantisk vektormodell av en svensk nyhetstidning. Resultatet pekar på att ordförhållanden i vektormodellen har förmågan att återspegla flera av de sedan tidigare IAT-dokumenterade fördomar som undersöktes. I studien implementeras och appliceras också WEFAT-metoden för att undersöka vektormodellens förmåga att representera två faktiska statistiska samband i verkligheten, vilket görs framgångsrikt i båda undersökningarna. Resultaten av studien som helhet ger stöd till metoderna som används och belyser samtidigt problematik med att använda semantiska vektormodeller i språkteknologiska applikationer.

Abstract [en]

Word embeddings are a powerful technique in which word meaning is represented by vectors of numbers. The vectors allow geometric operations that capture semantically important relationships between the words. In this study, the WEAT method is applied to examine whether statistical properties of words pertaining to bias can be found in a Swedish word embedding trained on a corpus from a Swedish newspaper. The results show that the word embedding can represent several of the IAT-documented biases that were tested. A second method, WEFAT, is applied to the word embedding to explore the embedding's ability to represent actual statistical properties, which is also done successfully. The results of this study lend support to the validity of both methods and also highlight the issue of problematic relationships between words in word embeddings.
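
For readers unfamiliar with WEAT, the following is a minimal sketch of the effect-size computation as it is commonly described in the literature: each target word's association is the difference between its mean cosine similarity to two attribute sets, and the effect size compares those associations across two target sets. This is not the thesis's implementation, which is not shown in this record. The function names (`cosine`, `association`, `weat_effect_size`) and the two-dimensional toy vectors are assumptions made purely for illustration, and the use of the sample standard deviation is a choice of this sketch.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y,
    normalised by the standard deviation over all target words.
    Sample standard deviation is used here (an assumption of this sketch)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy vectors, not taken from any real embedding.
X = [[1.0, 0.1], [0.9, 0.2]]   # target set 1, close to attribute A
Y = [[0.1, 1.0], [0.2, 0.9]]   # target set 2, close to attribute B
A = [[1.0, 0.0]]               # attribute set A
B = [[0.0, 1.0]]               # attribute set B

effect = weat_effect_size(X, Y, A, B)
print(effect)
```

A positive value indicates that the words in `X` are more strongly associated with `A` (relative to `B`) than the words in `Y` are; swapping `X` and `Y` flips the sign. With a real embedding, the vectors would be looked up from the trained model rather than written by hand.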

Place, publisher, year, edition, pages
2019. p. 40
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:liu:diva-159027
ISRN: LIU-IDA/KOGVET-G--19/017--SE
OAI: oai:DiVA.org:liu-159027
DiVA, id: diva2:1338076
Subject / course
Cognitive science
Available from: 2019-08-13 Created: 2019-07-19 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

micjo469_FördomsfullaAssociationer2019 (698 kB), 172 downloads
File information
File name: FULLTEXT01.pdf
File size: 698 kB
Checksum SHA-512: 3e5702d4248af620d7e8b347e6871019cd5b06859986b4e7c99b321cd2b81060f7996fa3090d8f24fbfb78c37f468e2446807c0f2c92cdd97cd5073e3597703b
Type: fulltext
Mimetype: application/pdf

Author
Jonasson, Michael
Total: 172 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 276 hits