Digitala Vetenskapliga Arkivet

1 - 4 of 4
  • 1.
    Adesam, Yvonne
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    The Multilingual Forest: Investigating High-quality Parallel Corpus Development. 2012. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    This thesis explores the development of parallel treebanks: collections of texts and their translations that are syntactically annotated and aligned, with links between words, phrases, and sentences showing translation equivalence. We describe the semi-manual annotation of the SMULTRON parallel treebank, which consists of 1,000 sentences in English, German and Swedish. This description is the starting point for answering the first of the two questions in this thesis.

    • What issues need to be considered to achieve a high-quality, consistent, parallel treebank?

    The units of annotation and the choice of annotation schemes are crucial for quality, and some automatic processing is necessary to increase the size of the treebank. Automatic quality checks and evaluation are essential, but manual quality control is still needed to achieve high quality.

    Additionally, we explore improving the automatically created annotation for one language, using information available from the annotation of the other languages. This leads us to the second of the two questions in this thesis.

    • Can we improve automatic annotation by projecting information available in the other languages?

    Experiments with automatic alignment, which is projected from two language pairs, L1–L2 and L1–L3, onto the third pair, L2–L3, show an improvement in precision, in particular if the projected alignment is intersected with the system alignment. We also construct a test collection for experiments on annotation projection to resolve prepositional phrase attachment ambiguities. While majority vote projection improves the annotation, compared to the basic automatic annotation, using linguistic clues to correct the annotation before majority vote projection is even better, although more laborious. However, some structural errors cannot be corrected by projection at all, as different languages have different wording, and thus different structures.
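The pivot projection described in this abstract can be illustrated with a small sketch. It assumes that a word alignment is a set of (source index, target index) pairs, and that projecting L1–L2 and L1–L3 onto L2–L3 means composing links through a shared L1 token. The function names and toy data are hypothetical, and this is not the thesis's actual implementation.

```python
from collections import defaultdict

# A word alignment as a set of (source_token_index, target_token_index) links.
Alignment = set[tuple[int, int]]


def project_via_pivot(a12: Alignment, a13: Alignment) -> Alignment:
    """Project L2-L3 links by composing L1-L2 and L1-L3 links through L1.

    Hypothetical helper: an L2 token and an L3 token are linked whenever
    both are aligned to the same L1 token.
    """
    l3_by_l1: dict[int, set[int]] = defaultdict(set)
    for l1, l3 in a13:
        l3_by_l1[l1].add(l3)
    return {(l2, l3) for l1, l2 in a12 for l3 in l3_by_l1.get(l1, ())}


def precision(predicted: Alignment, gold: Alignment) -> float:
    """Share of predicted links that also appear in the gold alignment."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


if __name__ == "__main__":
    # Toy three-token sentence (invented data, not from SMULTRON).
    a12 = {(0, 0), (1, 2), (2, 1)}      # L1-L2 alignment
    a13 = {(0, 0), (1, 1), (2, 2)}      # L1-L3 alignment
    system = {(0, 0), (1, 2), (2, 0)}   # automatic L2-L3 alignment
    gold = {(0, 0), (1, 2), (2, 1)}     # reference L2-L3 alignment

    projected = project_via_pivot(a12, a13)
    combined = projected & system       # intersect projection with system output
    print(precision(system, gold), precision(combined, gold))
```

On this toy data, the intersection keeps only the two links that the projection and the system agree on. Its precision therefore rises from 2/3 to 1.0, at the cost of dropping a correct link that the system missed. This mirrors the precision-oriented effect reported in the abstract.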

  • 2.
    Samuelsson, Yvonne
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Dickinson, Markus
    Department of Linguistics, Indiana University.
    Consistency Checking for Treebank Alignment. 2010. In: Proceedings of the Fourth Linguistic Annotation Workshop / [ed] Nianwen Xue and Massimo Poesio, Association for Computational Linguistics, 2010, p. 38-46. Conference paper (Refereed)
    Abstract [en]

    This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Although the problem is set up on a par with grammatical annotation, we demonstrate crucial differences in separating errors from legitimate variation. The second method examines phrase nodes that are predicted to be aligned, based on the alignment of their yields. The two methods are effective in complementary ways.
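Both checks can be sketched roughly as follows. The first function follows the abstract: each aligned source string is treated as a key and its target strings as labels, and any source string seen with more than one distinct label is flagged for inspection. For the second check, the abstract does not spell out how a phrase alignment is predicted from the yields. The sketch therefore assumes the common phrase-consistency criterion, under which every word link touching either yield must stay inside both yields. All names and the data layout are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Word links as (source_token_index, target_token_index) pairs.
WordLinks = set[tuple[int, int]]


def alignment_variations(
    links: Iterable[tuple[str, Optional[str]]],
) -> dict[str, set[Optional[str]]]:
    """Method 1 (sketch): treat the aligned target string as a label of the source string.

    `links` holds (source_string, target_string) pairs collected over the corpus;
    None marks a source string that was left unaligned. Source strings that
    receive more than one distinct label are returned as candidates for review.
    """
    labels: dict[str, set[Optional[str]]] = defaultdict(set)
    for src, tgt in links:
        labels[src].add(tgt)
    return {src: tgts for src, tgts in labels.items() if len(tgts) > 1}


def yields_consistent(src_yield: set[int], tgt_yield: set[int], word_links: WordLinks) -> bool:
    """Assumed criterion: at least one word link touches the yields, and none leaves them."""
    touching = [(s, t) for s, t in word_links if s in src_yield or t in tgt_yield]
    return bool(touching) and all(s in src_yield and t in tgt_yield for s, t in touching)


def unaligned_predicted_nodes(
    src_nodes: dict[str, set[int]],
    tgt_nodes: dict[str, set[int]],
    word_links: WordLinks,
    node_links: set[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Method 2 (sketch): node pairs predicted aligned from their yields but not linked."""
    return [
        (s_id, t_id)
        for s_id, s_yield in src_nodes.items()
        for t_id, t_yield in tgt_nodes.items()
        if yields_consistent(s_yield, t_yield, word_links) and (s_id, t_id) not in node_links
    ]
```

As the abstract stresses, the items flagged by either function are only candidates. Some will be legitimate translation variation rather than annotation errors, so a human still decides which ones to correct.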

  • 3.
    Samuelsson, Yvonne
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Volk, Martin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Alignment Tools for Parallel Treebanks. 2007. In: Data Structures for Linguistic Resources and Applications: Proceedings of the Biennial GLDV Conference 2007, 2007. Conference paper (Refereed)
    Abstract [en]

    This paper reports on our efforts in creating a trilingual parallel treebank. The focal points are consistency checking and all aspects of sub-sentential alignment. We discuss the alignment guidelines, the importance of quality checks, and special alignment problems. We then look at alignment algorithms and alignment visualization tools, and compare our own TreeAligner with other alignment tools. Our constituent structure treebanks contain just over 1,000 sentences and around 18,000 tokens in each language.

  • 4.
    Samuelsson, Yvonne
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Volk, Martin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Automatic Phrase Alignment: Using Statistical N-Gram Alignment for Syntactic Phrase Alignment. 2007. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007) / [ed] Koenraad De Smedt, Jan Hajič and Sandra Kübler, Northern European Association for Language Technology (NEALT), 2007, p. 139-150. Conference paper (Refereed)
    Abstract [en]

    A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated documents. These parallel sentences are linked through alignment. This paper explores the use of word n-gram alignment, computed for statistical machine translation, to create syntactic phrase alignment. We achieve a weighted F0.5-score of over 65%.
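For reference, an F-score with β = 0.5 weights precision more heavily than recall, using F_β = (1 + β²)·P·R / (β²·P + R). The abstract does not say how the per-link weighting behind the "weighted" score was done. The sketch below therefore shows only the standard F_β formula applied to given precision and recall values. The function name is hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    print(round(f_beta(0.8, 0.5), 3))  # high precision, lower recall
    print(round(f_beta(0.5, 0.8), 3))  # same values with P and R swapped
```

Swapping precision and recall changes the score (0.714 versus 0.541 here). This asymmetry is the reason for choosing β = 0.5 when precise alignment links matter more than complete coverage.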
