Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition
Number of Authors: 1
2012 (English) Conference paper (Refereed)
In this paper, we study direct transfer methods for multilingual named entity recognition. Specifically, we extend the method recently proposed by Täckström et al. (2012), which is based on cross-lingual word cluster features. First, we show that by using multiple source languages, combined with self-training for target language adaptation, we can achieve significant improvements compared to using only single source direct transfer. Second, we investigate how the direct transfer system fares against a supervised target language system and conclude that between 8,000 and 16,000 word tokens need to be annotated in each target language to match the best direct transfer system. Finally, we show that we can significantly improve target language performance, even after annotating up to 64,000 tokens in the target language, by simply concatenating source and target language annotations.
Computer and Information Science
Identifiers
URN: urn:nbn:se:ri:diva-15216
OAI: oai:DiVA.org:ri-15216
DiVA: diva2:1036532
NAACL-HLT 2012 Workshop on Inducing Linguistic Structure