Bayesian Models for Multilingual Word Alignment
Östling, Robert
Stockholm University, Faculty of Humanities, Department of Linguistics.
ORCID iD: 0000-0002-6027-4156
2015 (English)
Doctoral thesis, monograph (Other academic)
Abstract [en]

In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology.

In the first part of the thesis, I show how Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for languages with few digital resources available—which is unfortunately the state of the vast majority of languages today. Furthermore, I explore how different variations to the models and learning algorithms affect alignment accuracy.

Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world.

Finally, I present a model for multilingual word alignment which learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament, to explore word order features in 1001 languages.

Place, publisher, year, edition, pages
Stockholm: Department of Linguistics, Stockholm University, 2015.
Keyword [en]
word alignment, parallel text, Bayesian models, MCMC, linguistic typology, sign language, annotation transfer, transfer learning
National Category
Language Technology (Computational Linguistics)
Research subject
Linguistics
Identifiers
URN: urn:nbn:se:su:diva-115541
ISBN: 978-91-7649-151-5 (print)
OAI: oai:DiVA.org:su-115541
DiVA: diva2:798117
Public defence
2015-05-22, hörsal 5, hus B, Universitetsvägen 10 B, Stockholm, 13:00 (English)
Opponent
Supervisors
Available from: 2015-04-29
Created: 2015-03-26
Last updated: 2015-05-05
Bibliographically approved

Open Access in DiVA

fulltext (1689 kB), 453 downloads
File information
File name: FULLTEXT01.pdf
File size: 1689 kB
Checksum (SHA-512): 210880e6c3e9732127f74c996739aa1052979cb7b065a5937882f79fa96e92a1bcc6e47111b0a665eb747cbe5713c014bb85cb017a642b5807519b703fa995d8
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Östling, Robert
By organisation
Department of Linguistics
Language Technology (Computational Linguistics)

Total: 453 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 2081 hits