Word order typology through multilingual word alignment
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0002-6027-4156
2015 (English)
In: The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Proceedings of the Conference, Volume 2: Short Papers, 2015, 205-211 p.
Conference paper, Published paper (Refereed)
Abstract [en]

With massively parallel corpora of hundreds or thousands of translations of the same text, it is possible to automatically perform typological studies of language structure using very large language samples. We investigate the domain of word order using multilingual word alignment and high-precision annotation transfer in a corpus with 1144 translations in 986 languages of the New Testament. Results are encouraging, with 86% to 96% agreement between our method and the manually created WALS database for a range of different word order features. Beyond reproducing the categorical data in WALS and extending it to hundreds of other languages, we also provide quantitative data for the relative frequencies of different word orders, and show the usefulness of this for language comparison. Our method has applications for basic research in linguistic typology, as well as for NLP tasks like transfer learning for dependency parsing, which has been shown to benefit from word order information.

Place, publisher, year, edition, pages
2015. 205-211 p.
Keyword [en]
linguistic typology, word order typology, parallel texts, parallel corpora, word alignment, annotation transfer
National Category
Language Technology (Computational Linguistics); General Language Studies and Linguistics
Research subject
Linguistics; Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-119847
ISBN: 978-1-941643-73-0 (print)
OAI: oai:DiVA.org:su-119847
DiVA: diva2:848836
Conference
The 53rd Annual Meeting of the Association for Computational Linguistics and The 7th International Joint Conference of the Asian Federation of Natural Language Processing, Beijing, China, July 26-31, 2015
Available from: 2015-08-26 Created: 2015-08-26 Last updated: 2016-11-23 Bibliographically approved

Open Access in DiVA

Word order typology through multilingual word alignment (142 kB), 102 downloads
File information
File name: FULLTEXT01.pdf
File size: 142 kB
Checksum (SHA-512): 35698cfd58bb84fa043b646f03da895f59a162de927bb6633df6b4e0996e0e72126d06bf6a632c33032bdde93015b240514e16bb2e5d2841f35d344769df2f10
Type: fulltext
Mimetype: application/pdf

By author/editor
Östling, Robert
By organisation
Computational Linguistics
Language Technology (Computational Linguistics)General Language Studies and Linguistics

Search outside of DiVA

GoogleGoogle Scholar
Total: 102 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 165 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf