Digitala Vetenskapliga Arkivet

Assessing a Swedish Automatic Speech Recognition Model for Finland Swedish
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2024 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Advances in automatic speech recognition (ASR) technology have transformed communication by converting spoken language into text. Despite these advances, ASR systems often struggle with nonstandard dialects because they are trained on standard language varieties. Speakers of minority dialects therefore risk being marginalized by mainstream ASR technologies, which could widen the digital divide.

This work investigates the challenges encountered in developing ASR models for pluricentric languages (languages spoken in more than one country as a national or official language) and nonstandard dialects, focusing on Swedish, which is an official language in both Sweden and Finland. Specifically, it addresses how well a speech-to-text model trained solely on Sweden Swedish performs when transcribing Finland Swedish dialects.

We have used a quantitative and experimental approach to evaluate transcriptions generated by VoxRex, a Wav2Vec 2.0 model developed by KBLab at the National Library of Sweden (KB), for the Aalto Finland Swedish Parliament ASR Corpus 2015-2020.

The evaluation results show that the model achieves a mean word error rate (WER) of 15.96% on the original dataset and 15.11% on a cleaned dataset (after removing non-Swedish observations). These WERs are higher than the prior evaluation results for the model, namely a WER of 2.5% on the NST and Common Voice datasets.
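
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean WER over a set of transcriptions can be computed. It is not the thesis's evaluation code: the function names `word_error_rate` and `mean_wer`, the whitespace tokenization, and the averaging over individual utterances are assumptions made for illustration. WER is taken here as the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    any text normalization (casing, punctuation) happens beforehand.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-utterance WER over (reference, hypothesis) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("at least one pair is required")
    return sum(scores) / len(scores)
```

With this definition, a five-word reference whose transcription drops one word has a WER of 0.2. Averaging per utterance weights short and long utterances equally; pooling all errors over all reference words is a common alternative and generally gives a different figure.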

The results imply that the model performs worse when transcribing Finland Swedish than when transcribing Sweden Swedish. The weaker performance may be due to dialectal differences between Sweden Swedish and Finland Swedish, the presence of Swedish spoken with a Finnish accent, or discrepancies between the parliamentary reference transcripts and the actual speech. Our findings indicate that WERs vary across dialectal regions, with the highest error rates for predominantly unilingual Finnish regions. The model also faces challenges with some common Swedish words, parliamentary terminology, abbreviations, and compound words.

Place, publisher, year, edition, pages
2024.
Keywords [en]
Natural language processing, automatic speech recognition, speech-to-text, word error rate, low-resource languages, Finland Swedish
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:su:diva-242652
OAI: oai:DiVA.org:su-242652
DiVA, id: diva2:1955543
Available from: 2025-04-30 Created: 2025-04-30

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1304 kB
Checksum (SHA-512): 026f37a21b4ecff252f26d563686347144b075f750d115a8320f9ff6bf44967c636adfd0d6f3dae7111ef0459e702403258081e2af0968cc7be46abba4d2dbf2
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Machutta Widengren, Jessica; Gisslén, Amanda