Digitala Vetenskapliga Arkivet

Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Improving alignment for SMT by reordering and augmenting the training corpus
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2009 (English)In: Proceedings of the Fourth Workshop on Statistical Machine Translation (WMT09), Athens, Greece, 2009, p. 120-124Conference paper, Published paper (Refereed)
Abstract [en]

We describe the LIU systems for English-German and German-English translation in the WMT09 shared task. We focus on two methods to improve the word alignment: (i) by applying Giza++ in a second phase to a reordered training corpus, where reordering is based on the alignments from the first phase, and (ii) by adding lexical data obtained as high-precision alignments from a different word aligner. These methods were studied in the context of a system that uses compound processing, a morphological sequence model for German, and a part-of-speech sequence model for English. Both methods gave some improvements to translation quality as measured by Bleu and Meteor scores, though not consistently. All systems used both out-of-domain and in-domain data as the mixed corpus had better scores in the baseline configuration.

Place, publisher, year, edition, pages
Athens, Greece, 2009. p. 120-124
Keywords [en]
Machine translation, reordering, word alignment
National Category
Language Technology (Computational Linguistics) Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-58978OAI: oai:DiVA.org:liu-58978DiVA, id: diva2:347917
Conference
The Fourth Workshop on Statistical Machine Translation (WMT09)
Available from: 2010-09-03 Created: 2010-09-03 Last updated: 2018-01-12

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Holmqvist, MariaStymne, SaraJody, FooAhrenberg, Lars
By organisation
NLPLAB - Natural Language Processing LaboratoryThe Institute of Technology
Language Technology (Computational Linguistics)Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 117 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf