Using Unsupervised Morphological Segmentation to Improve Dependency Parsing for Morphologically Rich Languages
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Language Technology)
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this thesis, we investigate how using unsupervised morphological segmentation as features affects the dependency parsing of morphologically rich languages such as Finnish, Estonian, Hungarian, Turkish, Uyghur, and Kazakh. Morphology is of great importance for parsing these languages because the dependency relations in their sentences rely mostly on morphemes rather than on word order. To address our research questions, we conducted a large number of parsing experiments with both MaltParser and UDPipe. We generated the supervised morphology and the predicted POS tags with UDPipe, obtained the unsupervised morphological segmentation from Morfessor, converted that segmentation into features, and added the features to the UD treebanks of each language. We also investigated different ways of converting the unsupervised segmentation into features and studied the results of each method. We report the Labeled Attachment Score (LAS) for all of our experimental results.
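As an illustration of what "converting segmentation into features" could look like, here is a minimal sketch in Python. It assumes a segmentation is already available as a list of morphs per word form and writes a FEATS-like string into the sixth column of a CoNLL-U token line. The feature names `Stem` and `Suffix`, the first-morph-is-stem rule, and the helper names are hypothetical choices made for this example; the abstract does not state which feature schemes the thesis used, and the example segmentation is hand-written rather than actual Morfessor output.

```python
def segmentation_to_feats(segments):
    """Turn a list of morphs into a FEATS-like string (hypothetical scheme)."""
    if not segments:
        return "_"  # CoNLL-U convention for an empty field
    if len(segments) == 1:
        # Unsegmented word: the whole form is treated as the stem.
        return f"Stem={segments[0]}"
    # Assumption: the first morph is the stem, the rest are suffixes.
    stem, suffixes = segments[0], segments[1:]
    return f"Stem={stem}|Suffix={'+'.join(suffixes)}"


def add_segmentation(conllu_line, seg_lookup):
    """Fill the FEATS column of one CoNLL-U token line from a segmentation lookup."""
    line = conllu_line.rstrip("\n")
    # Leave blank lines and comment lines untouched.
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # not a standard 10-column token line
    form = cols[1]
    # Fall back to the unsegmented form when the word is not in the lookup.
    cols[5] = segmentation_to_feats(seg_lookup.get(form, [form]))
    return "\t".join(cols)


# Hand-written example: Finnish "taloissa" ("in the houses").
lookup = {"taloissa": ["talo", "i", "ssa"]}
token = "1\ttaloissa\ttalo\tNOUN\t_\t_\t0\troot\t_\t_"
print(add_segmentation(token, lookup).split("\t")[5])  # Stem=talo|Suffix=i+ssa
```

Strings produced this way are only FEATS-like; they are not guaranteed to pass strict UD validation.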

The main finding of this study is that dependency parsing of some languages can be improved simply by providing unsupervised morphology during parsing when no manually annotated or supervised morphology is available for those languages. Compared with models that use no morphological information during parsing, adding unsupervised morphological information together with predicted POS tags improves results on the test sets of Turkish, Uyghur, Kazakh, Finnish, Estonian, and Hungarian by 4.9%, 6.0%, 8.7%, 3.3%, 3.7%, and 12.0% respectively with MaltParser, and by 2.7%, 4.1%, 8.2%, 2.4%, 1.6%, and 2.6% respectively with UDPipe.
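For readers unfamiliar with the metric, the Labeled Attachment Score is the percentage of tokens whose predicted head and dependency label both match the gold annotation. The sketch below computes it in Python for a single toy sentence; the function name and the sample data are hypothetical and are not taken from the thesis or from any evaluation script it used.

```python
def labeled_attachment_score(gold, predicted):
    """Return LAS as a percentage.

    Each argument is a list of (head, deprel) pairs, one per token,
    in the same token order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    # A token counts as correct only if both head and label match.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


# Toy four-token sentence: the third token has the right head but the wrong label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "punct")]
print(labeled_attachment_score(gold, pred))  # 75.0
```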

Place, publisher, year, edition, pages
2018. p. 41
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-354459
OAI: oai:DiVA.org:uu-354459
DiVA, id: diva2:1221345
Subject / course
Language Technology
Educational program
Master Programme in Language Technology
Supervisors
Examiners
Available from: 2018-06-20 Created: 2018-06-19 Last updated: 2018-06-20 Bibliographically approved

Open Access in DiVA

fulltext (792 kB), 24 downloads
File information
File name: FULLTEXT01.pdf
File size: 792 kB
Checksum SHA-512:
88c88e37ef7aada4e86cda3bd4ad147777bb5886fdc6614bebca253edcf7bcea2e8d6ce5b9bc15d1924e60c8890923d520928dc09bed7544fd8e99d914d107e6
Type: fulltext
Mimetype: application/pdf
