Translationese and Swedish-English Statistical Machine Translation
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
This thesis investigates how well machine-learned classifiers can identify translated text, and what effect translationese may have on Statistical Machine Translation, in both the Swedish-to-English and English-to-Swedish directions.
Translationese is a term used to describe the dialect of a target language that is produced when a source text is translated. The systems trained for this thesis are SVM-based classifiers for identifying translationese, as well as translation and language models for Statistical Machine Translation. The classifiers successfully distinguished translationese from non-translated text and, to some extent, also identified which source language the texts were translated from.
In the SMT experiments, varying the translation model affected the BLEU results the most. Systems configured with non-translated source text and translationese target text performed better than their reversed counterparts. The language model experiments showed that models trained on known translationese and on classified translationese performed better than those trained on known non-translated text, though classified translationese did not perform as well as known translationese.
Ultimately, the thesis shows that translationese can be identified by machine-learned classifiers and may affect the results of SMT systems.
Place, publisher, year, edition, pages
2016. 25 p.
Translationese, Statistical Machine Translation, Text Classification, Classification of Translationese
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-305199
OAI: oai:DiVA.org:uu-305199
DiVA: diva2:1034684
Bachelor Programme in Language Technology