Digitala Vetenskapliga Arkivet

Exploring Swedish & English fastText Embeddings
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-5582-2031
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-6756-0147
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2022 (English). In: Artificial Intelligence and Cognition 2022: Proceedings of the 8th International Workshop on Artificial Intelligence and Cognition / [ed] Hadi Banaee, Amy Loutfi, Alessandro Saffiotti, Antonio Lieto, 2022, Vol. 3400, p. 201-208. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we show that embeddings from relatively smaller corpora sometimes outperform those from larger corpora and we introduce a new Swedish analogy test set and make it publicly available. To achieve good performance in Natural Language Processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We utilize the fastText tool for our experiments. We evaluate both the Swedish and English embeddings that we created using intrinsic evaluation (including analogy & Spearman correlation) and compare them with 2 common, publicly available embeddings. Our English continuous Bag-of-Words (CBoW)-negative sampling embedding shows better performance compared to the publicly available GoogleNews version. We also describe the relationship between NLP and cognitive science. We contribute the embeddings for research or other useful purposes by publicly releasing them.
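
The pipeline the abstract describes (training fastText CBoW embeddings with negative sampling, then running intrinsic evaluation via analogy accuracy and Spearman correlation) can be sketched as below. This is a minimal sketch using gensim's fastText implementation rather than the original fastText tool the authors used; the file names (corpus.txt, analogies_sv.txt, wordsim.tsv) and hyper-parameter values are illustrative assumptions, not the paper's settings.

```python
# Sketch: train a fastText CBoW model with negative sampling, then run
# intrinsic evaluation (analogy accuracy and Spearman correlation on
# word-similarity pairs). Uses gensim's fastText implementation as a
# stand-in for the original fastText tool; file names and
# hyper-parameters below are illustrative, not the paper's settings.
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

# One pre-tokenized sentence per line; corpus.txt is a placeholder name.
sentences = LineSentence("corpus.txt")

model = FastText(
    sentences=sentences,
    vector_size=300,   # embedding dimensionality
    window=5,          # context window size
    min_count=5,       # ignore rare words
    sg=0,              # 0 = CBoW (continuous Bag-of-Words)
    negative=10,       # negative sampling
    epochs=5,
    workers=4,
)

# Analogy evaluation on a file in the word2vec questions-words format,
# e.g. a Swedish analogy test set like the one the paper introduces.
score, sections = model.wv.evaluate_word_analogies("analogies_sv.txt")
print(f"analogy accuracy: {score:.3f}")

# Spearman correlation against human word-similarity judgements
# (tab-separated word pairs with gold scores, WordSim-353 style).
pearson, spearman, oov = model.wv.evaluate_word_pairs("wordsim.tsv")
print(f"Spearman rho: {spearman.correlation:.3f} (OOV: {oov:.1f}%)")
```

The analogy file follows the word2vec questions-words format: ": section" headers followed by four-word lines read as a:b :: c:d (an illustrative Swedish line might be "kung drottning man kvinna"). evaluate_word_pairs computes the Spearman correlation between the model's cosine similarities and the gold scores in the pairs file.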

Place, publisher, year, edition, pages
2022. Vol. 3400, p. 201-208
Series
CEUR Workshop Proceedings, ISSN 1613-0073
Keywords [en]
Embeddings, fastText, Analogy set, Swedish
National Category
Natural Language Processing
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-98277
Scopus ID: 2-s2.0-85160848182
OAI: oai:DiVA.org:ltu-98277
DiVA, id: diva2:1766566
Conference
8th International Workshop on Artificial Intelligence and Cognition, AIC 2022, June 15-17, 2022, Örebro, Sweden
Funder
Vinnova, 2019-02996
Note

License, full text: CC BY License

Available from: 2023-06-13 Created: 2023-06-13 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

fulltext (219 kB), 100 downloads
File information
File name: FULLTEXT01.pdf
File size: 219 kB
Checksum (SHA-512): 1b4842290f6d94c89ad582de3748d93729f22d90dc8fbfd39ad3ce0080dd7c3cdc167e0875d4569e7eafd6fd0e6c2e899d1f3e1602d06b1803e00f46479d1e9a
Type: fulltext
Mimetype: application/pdf

Other links

Scopus
https://ceur-ws.org/Vol-3400/

Search in DiVA

By author/editor
Adewumi, Oluwatosin; Liwicki, Foteini; Liwicki, Marcus
By organisation
Embedded Internet Systems Lab
Natural Language Processing

Total: 100 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 325 hits