Digitala Vetenskapliga Arkivet

Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation
University of Zagreb.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2018 (English). In: Proceedings of the Third Conference on Machine Translation: Research Papers, 2018, p. 36-48. Conference paper, Published paper (Refereed)
Abstract [en]

We present an analysis of a number of coreference phenomena in English-Croatian human and machine translations. The aim is to shed light on the differences in the way these structurally different languages make use of discourse information and provide insights for discourse-aware machine translation system development. The phenomena are automatically identified in parallel data using annotation produced by parsers and word alignment tools, enabling us to pinpoint patterns of interest in both languages. We make the analysis more fine-grained by including three corpora pertaining to three different registers. In a second step, we create a test set with the challenging linguistic constructions and use it to evaluate the performance of three MT systems. We show that both SMT and NMT systems struggle with handling these discourse phenomena, even though NMT tends to perform somewhat better than SMT. By providing an overview of patterns frequently occurring in actual language use, as well as by pointing out the weaknesses of current MT systems that commonly mistranslate them, we hope to contribute to the effort of resolving the issue of discourse phenomena in MT applications.

Place, publisher, year, edition, pages
2018. p. 36-48
Keywords [en]
Machine translation, discourse phenomena, error analysis, coreference, Croatian, MT test suites
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-395400
OAI: oai:DiVA.org:uu-395400
DiVA, id: diva2:1362158
Conference
The Third Conference on Machine Translation (WMT 2018), Brussels, October 31 to November 1, 2018.
Funder
Swedish Research Council, 2017-930

Available from: 2019-10-18
Created: 2019-10-18
Last updated: 2025-02-07
Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 161 kB)

Search in DiVA

By author/editor
Hardmeier, Christian; Stymne, Sara
By organisation
Department of Linguistics and Philology
Natural Language Processing
