Translationese and Swedish-English Statistical Machine Translation
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2016 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

This thesis investigates how well machine-learned classifiers can identify translated text, and what effect translationese may have on Statistical Machine Translation (SMT), in both the Swedish-to-English and English-to-Swedish directions.

Translationese is a term used to describe the dialect of a target language that is produced when a source text is translated. The systems trained for this thesis are SVM-based classifiers for identifying translationese, as well as translation and language models for Statistical Machine Translation. The classifiers successfully distinguished translationese from non-translated text and, to some extent, also identified which source language the texts were translated from.

In the SMT experiments, varying the translation model affected the BLEU results the most. Systems configured with non-translated source text and translationese target text performed better than their reversed counterparts. The language model experiments showed that models trained on known translationese and on classified translationese performed better than those trained on known non-translated text, though classified translationese did not perform as well as known translationese.

Ultimately, the thesis shows that translationese can be identified by machine-learned classifiers and may affect the results of SMT systems.
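
The abstract reports its SMT results as BLEU scores but does not say how they were computed. As a rough illustration of what the metric measures, the following is a minimal sketch of sentence-level BLEU against a single reference, without smoothing. The function names `ngrams` and `bleu` are hypothetical, and this simplification is an assumption for illustration only: BLEU is normally computed over a whole test corpus, and the thesis may have used a different implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU against one reference, no smoothing (illustrative)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: hypotheses no longer than the reference are penalised.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(bleu("the cat sat on the mat", "the cat sat on a mat"))
```

A higher score means more n-gram overlap with the reference translation, which is the sense in which one system configuration in the abstract "performed better" than another.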

Place, publisher, year, edition, pages
2016. 25 p.
Keyword [en]
Translationese, Statistical Machine Translation, Text Classification, Classification of Translationese
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-305199
OAI: oai:DiVA.org:uu-305199
DiVA: diva2:1034684
Educational program
Bachelor Programme in Language Technology
Supervisors
Examiners
Available from: 2016-10-25
Created: 2016-10-12
Last updated: 2016-10-25
Bibliographically approved

Open Access in DiVA

joelsson2016 (225 kB), 28 downloads
File information
File name: FULLTEXT01.pdf
File size: 225 kB
Checksum (SHA-512): 69f45482f7bc929726f4b9df851cee289d8af72e00d4d1943499cbf5ab86298abdf68f45b367044231925630a6aa2148926d582e6d7abe807c3ccd2aa0a21e43
Type: fulltext
Mimetype: application/pdf

Total: 28 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 135 hits