Contribution to Terminology Internationalization by Word Alignment in Parallel Corpora
2006 (English)
In: AMIA 2006 Symposium Proceedings, Washington D.C., USA: AMIA, 2006, p. 185-189
Conference paper, Published paper (Refereed)
Abstract [en]
Background and objectives
Creating a complete translation of a large vocabulary is a time-consuming task that requires skilled and knowledgeable medical translators. Our goal is to examine to what extent this task can be alleviated by a specific natural language processing technique, word alignment in parallel corpora. We experiment with translation from English to French.
Methods
We built a large corpus of parallel English-French documents and automatically aligned it at the document, sentence, and word levels using state-of-the-art alignment methods and tools. We then projected English terms from existing controlled vocabularies onto the aligned word pairs and examined the number and quality of the putative French translations thus obtained. We considered three American vocabularies present in the UMLS, each with a different translation status: MeSH, SNOMED CT, and the MedlinePlus Health Topics.
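The projection step can be illustrated with a minimal sketch. It is not the tooling used in the study: the function name `project_terms`, the input layout (token lists plus a set of index links per sentence pair), and the rule that keeps only contiguous French spans are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def project_terms(terms, aligned_sentences):
    """Collect candidate French translations for each English term.

    terms: iterable of English terms (strings, possibly multi-word).
    aligned_sentences: iterable of (en_tokens, fr_tokens, links), where
        links is a set of (en_index, fr_index) word-alignment pairs.
        This layout is a hypothetical one chosen for the sketch.
    Returns {term: Counter({french_candidate: occurrence_count})}.
    """
    term_tokens = {t: t.lower().split() for t in terms}
    candidates = defaultdict(Counter)
    for en_tokens, fr_tokens, links in aligned_sentences:
        en_lower = [w.lower() for w in en_tokens]
        for term, toks in term_tokens.items():
            n = len(toks)
            for start in range(len(en_lower) - n + 1):
                if en_lower[start:start + n] != toks:
                    continue
                # French positions linked to any word of the matched term.
                fr_idx = sorted({j for i, j in links if start <= i < start + n})
                if not fr_idx:
                    continue
                # Assumption: discard projections with gaps on the French side.
                if fr_idx[-1] - fr_idx[0] + 1 != len(fr_idx):
                    continue
                french = " ".join(fr_tokens[j] for j in fr_idx).lower()
                candidates[term][french] += 1
    return dict(candidates)


# Toy example with one hand-aligned sentence pair.
en = ["The", "patient", "has", "heart", "failure"]
fr = ["Le", "patient", "a", "une", "insuffisance", "cardiaque"]
links = {(0, 0), (1, 1), (2, 2), (3, 5), (4, 4)}
result = project_terms(["heart failure", "patient"], [(en, fr, links)])
```

Counting how often each candidate recurs across the corpus gives a simple way to rank putative translations before human review; any real threshold or filter would have to be tuned on the data.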
Results
We obtained several thousand new translations of our input terms, this number being closely linked to the number of terms in the input vocabularies.
Conclusion
Our study shows that alignment methods can extract a number of new term translations from large bodies of text with a moderate human reviewing effort, and can thus help a human translator obtain better translation coverage of an input vocabulary. Short-term perspectives include applying these methods to a corpus 20 times larger than the one used here, together with more focused methods for term extraction.
Place, publisher, year, edition, pages
Washington D.C., USA: AMIA, 2006. p. 185-189
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-35773
PubMedID: 17238328
Local ID: 28508
OAI: oai:DiVA.org:liu-35773
DiVA, id: diva2:256621
Conference
AMIA 2006 Annual Symposium, Washington, DC, USA, November 11, 2006 - November 15, 2006
2009-10-10, 2009-10-10, 2018-01-13