Phonotactic Structures in Swedish: A Data-Driven Approach
Stockholm University, Faculty of Humanities, Department of Linguistics.
2017 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Ever since Bengt Sigurd laid out the first comprehensive description of Swedish phonotactics in 1965, it has been the main point of reference within the field. This thesis attempts a new approach by presenting a computational and statistical model of Swedish phonotactics that can be built from any corpus of IPA phonetic script. The model is a weighted trie, represented as a finite-state automaton, in which states are phonemes linked by transitions in valid phoneme sequences; this makes the model probabilistic and expressible by regular languages. It was implemented using the Nordisk Språkteknologi (NST) pronunciation lexicon and was tested against a couple of rule sets defined by Sigurd for initial two-consonant clusters of phonemes and phoneme classes. The results largely agree with Sigurd's rules and illustrate the benefits of the model: it can be used effectively to pattern-match against phonotactic information using a regular-expression-like syntax.
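
The weighted-trie idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the class and function names (`TrieNode`, `build_trie`, `transition_prob`, `initial_clusters`) are hypothetical, and the phoneme sequences are toy data invented for illustration, not entries from the NST lexicon. Each node stores how many transcriptions pass through it, so relative counts give transition probabilities, and a lookup over two phoneme sets stands in for a simple class-based pattern match on initial clusters.

```python
class TrieNode:
    """One state of the trie: a phoneme reached via a valid prefix."""

    def __init__(self):
        self.children = {}  # next phoneme -> TrieNode
        self.count = 0      # number of transcriptions passing through here


def build_trie(transcriptions):
    """Build a count-weighted trie from sequences of phoneme symbols."""
    root = TrieNode()
    for phonemes in transcriptions:
        node = root
        node.count += 1
        for p in phonemes:
            node = node.children.setdefault(p, TrieNode())
            node.count += 1
    return root


def transition_prob(root, prefix, nxt):
    """Estimate P(nxt | prefix) as a ratio of counts; 0.0 if unseen."""
    node = root
    for p in prefix:
        node = node.children.get(p)
        if node is None:
            return 0.0
    child = node.children.get(nxt)
    return child.count / node.count if child else 0.0


def initial_clusters(root, first_class, second_class):
    """Count word-initial two-phoneme sequences drawn from two phoneme sets."""
    found = {}
    for c1, n1 in root.children.items():
        if c1 not in first_class:
            continue
        for c2, n2 in n1.children.items():
            if c2 in second_class:
                found[(c1, c2)] = n2.count
    return found


# Toy phoneme sequences, invented for illustration only.
TOY = [
    ("s", "t", "r", "a"),
    ("s", "t", "a"),
    ("s", "p", "a"),
    ("t", "r", "a"),
    ("b", "l", "a"),
]

if __name__ == "__main__":
    trie = build_trie(TOY)
    print(transition_prob(trie, ("s",), "t"))
    print(initial_clusters(trie, {"s"}, {"t", "p"}))
```

Because a trie is an acyclic finite-state automaton, queries like `initial_clusters` correspond to matching a pattern such as "one phoneme from class A followed by one from class B" at the start of a word.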

Abstract [sv]

Ever since Bengt Sigurd presented the first comprehensive description of Swedish phonotactics in 1965, it has been the foremost point of reference in the field. This thesis attempts a new angle by presenting a computational and statistical model of Swedish phonotactics that can be built from a corpus of phonetic script in IPA. The model is a weighted trie, represented as a finite automaton, which has the advantages of being probabilistic and describable by regular languages. It was implemented using the pronunciation lexicon from Nordisk Språkteknologi (NST) and was used to test a couple of rule groups, defined by Sigurd, for initial two-consonant clusters of phonemes and phoneme classes. The results largely agree with Sigurd's rules and demonstrate the advantages of the model, in that it can be used effectively to match patterns of phonotactic information using a syntax similar to that of regular expressions.

Place, publisher, year, edition, pages
2017. 32 p.
Keyword [en]
Phonotactics, computational phonology, trie, finite automata, pattern matching, regular languages
Keyword [sv]
Phonotactics, computational phonology, trie, finite automaton, pattern matching, regular languages
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:su:diva-144259
OAI: oai:DiVA.org:su-144259
DiVA: diva2:1109915
Available from: 2017-06-15. Created: 2017-06-14. Last updated: 2017-06-15. Bibliographically approved.

Open Access in DiVA

Phonotactic Structures in Swedish: A Data-Driven Approach (891 kB), 12 downloads
File information
File name: FULLTEXT01.pdf
File size: 891 kB
Checksum (SHA-512): 20e50c418902fe643c07ecb929c30a06a08f98f529eb260ba9214902c1dc30c3823339d58387b21769d71cdeaf5d3a5d37e21face0b18a2ab977b98e142cac13
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Hultin, Felix
By organisation
Department of Linguistics
Language Technology (Computational Linguistics)

Total: 12 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 45 hits