Assessing a Swedish Automatic Speech Recognition Model for Finland Swedish
2024 (English) Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Advances in automatic speech recognition (ASR) technology have revolutionized communication by converting spoken language into text. Despite these advances, ASR systems often struggle with nonstandard dialects because they are trained on standard language varieties. Speakers of minority dialects therefore risk being marginalized by mainstream ASR technologies, which may widen the digital divide.
This work investigates the challenges encountered in developing ASR models for pluricentric languages (languages spoken in more than one country as a national or official language) and nonstandard dialects, focusing on Swedish, which is an official language in both Sweden and Finland. Specifically, it addresses how well a speech-to-text model trained solely on Sweden Swedish performs when transcribing Finland Swedish dialects.
We used a quantitative, experimental approach to evaluate transcriptions of the Aalto Finland Swedish Parliament ASR Corpus 2015-2020 generated by VoxRex, a Wav2Vec 2.0 model developed by KBLab at the National Library of Sweden (KB).
The evaluation results show that the model achieves a mean word error rate (WER) of 15.96% on the original dataset and 15.11% on a cleaned dataset (after removing non-Swedish observations). These WERs are higher than prior evaluation results for the model, namely a WER of 2.5% for the datasets NST and Common Voice.
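For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate and a mean WER can be computed. It is not the thesis's evaluation code: the function names `word_error_rate` and `mean_wer`, the whitespace tokenization, the averaging of per-utterance WERs, and the example sentences are all assumptions made for illustration. WER is taken here in its usual form, the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-utterance WERs (an assumed averaging scheme)."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("at least one (reference, hypothesis) pair is required")
    return sum(scores) / len(scores)


# Made-up example: one deleted word ("en") and one inserted word ("i")
# against a five-word reference.
print(word_error_rate("det är en bra dag", "det är bra dag i"))  # 0.4
```

Averaging per-utterance scores weights short and long utterances equally; pooling all errors over all reference words is a common alternative and can give a different figure for the same transcriptions.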
The results imply that the model performs worse when transcribing Finland Swedish than when transcribing Sweden Swedish. The weaker performance may be due to dialectal differences between Sweden Swedish and Finland Swedish, the presence of Swedish spoken with a Finnish accent, or discrepancies between the parliamentary reference transcripts and the actual speech. Our findings indicate that WER varies across dialectal regions, with the highest WERs for predominantly unilingual Finnish regions. The model also faces challenges with some common Swedish words, parliamentary terminology, abbreviations, and compound words.
Place, publisher, year, edition, pages
2024.
Keywords [en]
Natural language processing, automatic speech recognition, speech-to-text, word error rate, low-resource languages, Finland Swedish
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:su:diva-242652
OAI: oai:DiVA.org:su-242652
DiVA, id: diva2:1955543
2025-04-30 2025-04-30