Digitala Vetenskapliga Arkivet

1 - 4 of 4
  • 1.
    Adesam, Yvonne
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    The Multilingual Forest: Investigating High-quality Parallel Corpus Development. 2012. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    This thesis explores the development of parallel treebanks, collections of language data consisting of texts and their translations, with syntactic annotation and alignment, linking words, phrases, and sentences to show translation equivalence. We describe the semi-manual annotation of the SMULTRON parallel treebank, consisting of 1,000 sentences in English, German and Swedish. This description is the starting point for answering the first of two questions in this thesis.

    • What issues need to be considered to achieve a high-quality, consistent, parallel treebank?

    The units of annotation and the choice of annotation schemes are crucial for quality, and some automated processing is necessary to increase the size. Automatic quality checks and evaluation are essential, but manual quality control is still needed to achieve high quality.

    Additionally, we explore improving the automatically created annotation for one language, using information available from the annotation of the other languages. This leads us to the second of the two questions in this thesis.

    • Can we improve automatic annotation by projecting information available in the other languages?

    Experiments with automatic alignment, which is projected from two language pairs, L1–L2 and L1–L3, onto the third pair, L2–L3, show an improvement in precision, in particular if the projected alignment is intersected with the system alignment. We also construct a test collection for experiments on annotation projection to resolve prepositional phrase attachment ambiguities. While majority vote projection improves the annotation, compared to the basic automatic annotation, using linguistic clues to correct the annotation before majority vote projection is even better, although more laborious. However, some structural errors cannot be corrected by projection at all, as different languages have different wording, and thus different structures.
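
    The alignment projection described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the representation of an alignment as a set of index pairs, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def project_alignment(l1_l2, l1_l3):
    """Project L1-L2 and L1-L3 links onto the L2-L3 pair.

    Each alignment is a set of (index, index) pairs (an assumed format).
    Two items in L2 and L3 are linked if both are linked to the same L1 item.
    """
    l2_by_l1 = defaultdict(set)
    for l1, l2 in l1_l2:
        l2_by_l1[l1].add(l2)

    projected = set()
    for l1, l3 in l1_l3:
        for l2 in l2_by_l1.get(l1, ()):
            projected.add((l2, l3))
    return projected


def precision(predicted, gold):
    """Share of predicted links that also occur in the gold alignment."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


# Toy data (hypothetical): token indices in one English/German/Swedish sentence.
en_de = {(0, 0), (1, 1), (2, 2)}
en_sv = {(0, 0), (1, 1), (2, 3)}
system_de_sv = {(0, 0), (1, 1), (2, 2)}  # output of some automatic aligner
gold_de_sv = {(0, 0), (1, 1), (2, 2)}    # manually checked links

projected = project_alignment(en_de, en_sv)
# Keep only links that both the projection and the system agree on.
intersected = projected & system_de_sv

print(precision(projected, gold_de_sv))
print(precision(intersected, gold_de_sv))
```

    In this toy case the intersection drops the one projected link that the system alignment does not support, which is the kind of precision gain the abstract reports. Recall can fall at the same time, since intersecting only ever removes links.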

  • 2.
    Samuelsson, Yvonne
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Dickinson, Markus
    Department of Linguistics, Indiana University.
    Consistency Checking for Treebank Alignment. 2010. In: Proceedings of the Fourth Linguistic Annotation Workshop / [ed] Nianwen Xue and Massimo Poesio, Association for Computational Linguistics, 2010, pp. 38-46. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
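
    The first method can be illustrated with a short sketch. It assumes alignments are available as (source string, target string) pairs and simply reports source strings that occur with more than one target "label". The function name and the sample pairs are hypothetical, and the paper's own procedure may differ in detail.

```python
from collections import Counter, defaultdict


def find_alignment_variation(aligned_pairs):
    """Return source strings aligned to more than one distinct target string.

    The target string is treated as a label on the source string; a source
    string with several labels is a candidate inconsistency.
    """
    labels_by_source = defaultdict(Counter)
    for source, target in aligned_pairs:
        labels_by_source[source][target] += 1
    return {
        source: dict(labels)
        for source, labels in labels_by_source.items()
        if len(labels) > 1
    }


# Hypothetical English-German phrase alignments.
pairs = [
    ("the house", "das Haus"),
    ("the house", "das Haus"),
    ("the house", "Haus"),
    ("green", "grün"),
]

variation = find_alignment_variation(pairs)
print(variation)
```

    As the abstract notes, variation of this kind is not always an error, because translations legitimately differ. The flagged items are therefore best read as candidates for manual inspection.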

  • 3.
    Samuelsson, Yvonne
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Volk, Martin
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Alignment Tools for Parallel Treebanks. 2007. In: Data Structures for Linguistic Resources and Applications: Proceedings of the Biennial GLDV Conference 2007, 2007. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper reports on our efforts in creating a tri-lingual parallel treebank. The focal points are consistency checking and all aspects of sub-sentential alignment. We discuss the alignment guidelines, the importance of quality checks, and special alignment problems. Then we look at alignment algorithms and alignment visualization tools, and we compare our own TreeAligner with other alignment tools. Our constituent structure treebanks contain just over 1,000 sentences and around 18,000 tokens in each language.

  • 4.
    Samuelsson, Yvonne
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Volk, Martin
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Automatic Phrase Alignment: Using Statistical N-Gram Alignment for Syntactic Phrase Alignment. 2007. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007) / [ed] Koenraad De Smedt, Jan Hajič and Sandra Kübler, Northern European Association for Language Technology (NEALT), 2007, pp. 139-150. Conference paper (Peer-reviewed)
    Abstract [en]

    A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated documents. These parallel sentences are linked through alignment. This paper explores the use of word n-gram alignment, computed for statistical machine translation, to create syntactic phrase alignment. We achieve a weighted F0.5-score of over 65%.
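
    For readers unfamiliar with the measure, the sketch below computes an F-score with beta = 0.5, which weights precision more heavily than recall. It assumes the standard F-beta definition. The abstract does not spell out how the score was computed, so the formula and the example values are illustrative assumptions only.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, not results from the paper.
high_precision = f_beta(0.8, 0.4)
high_recall = f_beta(0.4, 0.8)
print(high_precision, high_recall)
```

    With beta = 0.5, a system with precision 0.8 and recall 0.4 scores higher than one with the two values swapped.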
