A Study on Manual and Automatic Evaluation Procedures and Production of Automatic Post-editing Rules for Persian Machine Translation
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Evaluation of machine translation is an important step towards improving MT. One way to evaluate MT output is to focus on the different types of errors occurring in the translation hypotheses and to consider possible solutions for fixing those errors. An error categorization is a beneficial tool that simplifies the analysis of translation errors and can also be used to manually generate post-editing rules to be applied automatically to the output of machine translation. In this work, we define a categorization of the errors occurring in Swedish-Persian machine translation by analyzing the errors in three data sets from two sources: 1177.se and Linköping municipality. We define three types of monolingual reference-free evaluation (MRF) and use two automatic metrics, BLEU and TER, to conduct a bilingual evaluation of Swedish-Persian translation. Based on the experience of working with the errors that occur in the corpora, we then manually create automatic post-editing (APE) rules and apply them to the machine translation output. Three sets of results are obtained: (1) The analysis of MT errors shows that the three most common error types in the translation hypotheses are mistranslated words, wrong word order, and extra prepositions; these fall into the semantic and syntactic categories respectively. (2) Comparing the automatic and manual evaluations shows a low correlation between the two. (3) Lastly, applying the APE rules to the machine translation output increases the BLEU score on the largest data set, while the scores on the other two data sets remain almost unchanged. For TER, the score improves on one data set, while the scores on the other two data sets remain unchanged.
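
To illustrate the kind of pipeline the abstract outlines, the following Python sketch scores MT output with BLEU and TER before and after applying a hand-written post-editing rule. It is not taken from the thesis: the sentences and the rule are invented, and it assumes the sacrebleu package for the metrics.

    import re
    import sacrebleu

    # Toy "MT output" and reference; the real data are Swedish-Persian corpora.
    hypotheses = ["a sample mt output with with a duplicated word"]
    references = ["a sample mt output with a duplicated word"]

    def apply_ape_rules(sentence: str) -> str:
        # Hypothetical APE rule: collapse an immediately repeated token,
        # e.g. a preposition duplicated by the MT system.
        return re.sub(r"\b(\w+) \1\b", r"\1", sentence)

    post_edited = [apply_ape_rules(h) for h in hypotheses]

    # BLEU before and after applying the rule (higher is better).
    bleu_before = sacrebleu.corpus_bleu(hypotheses, [references]).score
    bleu_after = sacrebleu.corpus_bleu(post_edited, [references]).score

    # TER before and after (lower is better); corpus_ter is available in recent sacrebleu releases.
    ter_before = sacrebleu.corpus_ter(hypotheses, [references]).score
    ter_after = sacrebleu.corpus_ter(post_edited, [references]).score

    print(f"BLEU: {bleu_before:.2f} -> {bleu_after:.2f}")
    print(f"TER:  {ter_before:.2f} -> {ter_after:.2f}")

In the thesis itself the rules were derived from the manual error analysis of the three corpora; the sketch only shows where such rules slot into the evaluation loop.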

Place, publisher, year, edition, pages
2017, p. 45
Keywords [en]
machine translation, Persian, automatic post-editing
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-325818
OAI: oai:DiVA.org:uu-325818
DiVA, id: diva2:1116861
External cooperation
Convertus AB
Subject / course
Language Technology
Educational program
Master Programme in Language Technology
Available from: 2017-06-28. Created: 2017-06-28. Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

fulltext (605 kB), 136 downloads
File information
File name: FULLTEXT01.pdf. File size: 605 kB. Checksum (SHA-512):
bb1057a9079a8777ae16b35d9ba08bf5b0ef02eaaac786c414e4333af92e0fbcba543050409fc24097107775ea07dc24a51c6011587c0daf862244881d70ae1f
Type: fulltext. Mimetype: application/pdf


