Digitala Vetenskapliga Arkivet

Results 301–350 of 1846
  • 301.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    An Annotational Approach to Compositional Semantics, 2003. In: Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003), Växjö University Press, 2003, p. 33-44. Conference paper (Refereed)
  • 302.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    An Implementation of Token Dependency Semantics for a Fragment of English, 2003. Report (Other academic)
  • 303.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Automatic prediction of gender, political affiliation, and age in Swedish politicians from the wording of their speeches: A comparative study of classifiability, 2012. In: Literary & Linguistic Computing, ISSN 0268-1145, E-ISSN 1477-4615, Vol. 27, no. 2, p. 139-153. Article in journal (Refereed)
    Abstract [en]

    The present study explores automatic classification of Swedish politicians and their speeches into classes based on personal traits (gender, age, and political affiliation) as a means of measuring and analyzing how these traits influence language use. Support Vector Machines classified 200-word passages, represented by binary bag-of-word-forms vectors. Different feature selections were tried. The performance of the classifiers was assessed using test data from authors unseen in the training data. Author-level predictions derived from twenty-one text-level predictions reached an accuracy rate of 81.2% for gender, 89.4% for political affiliation, and 78.9% for age. Classification concerning each basic distinction was applied to general populations of politicians and to cohorts defined by the other classes. The outcomes suggest that the extent to which these personal traits are expressed in language use varies considerably among the different cohorts and that different traits affect different layers of the vocabulary. The accuracy rates for gender classification were higher for the right-wing and older cohorts than for the opposite ones. Age prediction gave higher accuracy for the right-wing cohort. Political classification gave the highest accuracy rates when all forms were included in the feature sets, whereas feature sets restricted to verbs or function words gave the highest scores for gender prediction and the lowest ones for political classification.

  • 304.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Automatic Scribe Attribution for Medieval Manuscripts, 2018. In: Digital Medievalist, ISSN 1715-0736, E-ISSN 1715-0736, Vol. 11, no. 1, p. 1-26, article id 6. Article in journal (Refereed)
    Abstract [en]

    We propose an automatic method for attributing manuscript pages to scribes. The system uses digital images as published by libraries. The attribution process involves extracting from each query page approximately letter-size components. This is done by means of binarization (ink-background separation), connected component labelling, and further segmentation, guided by the estimated typical stroke width. Components are extracted in the same way from the pages of known scribal origin. This allows us to assign a scribe to each query component by means of nearest-neighbour classification. Distance (dissimilarity) between components is modelled by simple features capturing the distribution of ink in the bounding box defined by the component, together with Euclidean distance. The set of component-level scribe attributions, which typically includes hundreds of components for a page, is then used to predict the page scribe by means of a voting procedure. The scribe who receives the largest number of votes from the 120 strongest component attributions is proposed as its scribe. The scribe attribution process allows the argument behind an attribution to be visualized for a human reader. The writing components of the query page are exhibited along with the matching components of the known pages. This report is thus open to inspection and analysis using the methods and intuitions of traditional palaeography. The present system was evaluated on a data set covering 46 medieval scribes, writing in Carolingian minuscule, Bastarda, and a few other scripts. The system achieved a mean top-1 accuracy of 98.3% as regards the first scribe proposed for each page, when the labelled data comprised one randomly selected page from each scribe and nine unseen pages for each scribe were to be attributed in the validation procedure. The experiment was repeated 50 times to even out random variation effects.

    Download full text (pdf)
  • 305.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters, 2020. In: DHN 2020 Digital Humanities in the Nordic Countries: Proceedings of the Digital Humanities in the Nordic Countries 5th Conference / [ed] Sanita Reinsone, Inguna Skadiņa, Anda Baklāne, and Jānis Daugavietis, 2020, p. 12-23. Conference paper (Refereed)
  • 306.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Clustering Writing Components from Medieval Manuscripts, 2018. In: COMHUM 2018: Book of Abstracts for the Workshop on Computational Methods in the Humanities 2018 / [ed] Piotrowski, Michael, Lausanne, 2018, p. 11-13. Conference paper (Refereed)
  • 307.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Clustering writing components from medieval manuscripts, 2019. In: Proceedings of the Workshop on Computational Methods in the Humanities 2018 / [ed] Michael Piotrowski, 2019, p. 23-32. Conference paper (Refereed)
    Abstract [en]

    This article explores a minimally supervised method for extracting components, mostly letters, from historical manuscripts and clustering them into classes capturing linguistic equivalence. The clustering uses the DBSCAN algorithm and an additional classification step. Combined with human annotation, this pipeline gives us cheap, but partial, manuscript transcription. Experiments with different parameter settings suggest that a system like this should be tuned separately for different categories, rather than relying on one-pass application of algorithms partitioning the same components into non-overlapping clusters. The method could also be used to extract features for manuscript classification, e.g. dating and scribe attribution, as well as to extract data for further palaeographic analysis.

  • 308.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Code and Data for “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters”, 2020. Dataset
    Abstract [en]

    Code and data for the article Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters (to appear in DHN 2020 Digital Humanities in the Nordic Countries, Riga, 17–20 March 2020).

    The study based on this code and dataset is a comparative exploration of different classification tasks for Swedish medieval charters (transcriptions from the SDHK collection) and different classifier setups. In particular, we explore the identification of the issuer, place of issue, and decade of production. The experiments used features based on lowercased words and character 3- and 4-grams. We evaluated the performance of two learning algorithms: linear discriminant analysis and decision trees. For evaluation, five-fold cross-validation was performed. We report accuracy and macro-averaged F1 score. The validation made use of six labeled subsets of SDHK combining the three tasks with Old Swedish and Latin. Issuer identification for the Latin dataset (595 charters from 12 issuers) reached the highest scores, above 0.9, for the decision tree classifier using word features. The best corresponding accuracy for Old Swedish was 0.81. Place and decade identification produced lower performance scores for both languages. Which classifier design is the best one seems to depend on peculiarities of the dataset and the classification task. The present study does however support the idea that text classification is useful also for medieval documents characterized by extreme spelling variation.

    Download full text (zip): dhn2020supplement
  • 309.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Här kommer vi! [Here we come!], 2012. In: Språktidningen, ISSN 1654-5028, no. 7, p. 44-49. Article in journal (Other (popular science, discussion, etc.))
  • 310.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Predicting the Scribe Behind a Page of Medieval Handwriting, 2014. Conference paper (Refereed)
  • 311.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Prolog-Embedding Typed Feature Structure Grammar (PETFSG-II.2) and Grammar Tool, 2003. Report (Other academic)
  • 312.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Scribe attribution for early medieval handwriting by means of letter extraction and classification and a voting procedure for larger pieces, 2014. In: 22nd International Conference on Pattern Recognition (ICPR), 2014, p. 1910-1915. Conference paper (Refereed)
    Abstract [en]

    The present study investigates a method for the attribution of scribal hands, inspired by traditional palaeography in being based on comparison of letter shapes. The system was developed for and evaluated on early medieval Caroline minuscule manuscripts. The generation of a prediction for a page image involves writing identification, letter segmentation, and letter classification. The system then uses the letter proposals to predict the scribal hand behind a page. Letters and sequences of connected letters are identified by means of connected component labeling and split into letter-size pieces. The hand (and character) prediction makes use of a dataset containing instances of the letters b, d, p, and q, cut out from manuscript pages whose scribal origin is known. Letters are represented by features capturing the distribution of foreground. Cosine similarity is used for nearest neighbor classification. The hand behind a page is finally predicted by means of a voting procedure taking the highest scoring letter-level hits as its input. This hand prediction method was evaluated on pages from five different hands and reached an accuracy above 99% for four of them and 87% for a fifth significantly more difficult one. The hand behind single toplisted letters was correctly predicted in 83% of the cases.

    Download full text (pdf)
  • 313.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Three papers on computational syntax and semantics, 1999. Report (Other academic)
  • 314.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Token Dependency Semantics and the Paratactic Analysis of Intensional Constructions, 2002. In: Journal of Semantics, ISSN 0167-5133, E-ISSN 1477-4593, Vol. 19, no. 4, p. 333-368. Article in journal (Refereed)
    Abstract [en]

    This article introduces Token Dependency Semantics (TDS), a surface‐oriented and token‐based framework for compositional truth‐conditional semantics. It is motivated by Davidson's ‘paratactic’ analysis of semantic intensionality (‘On Saying That’, 1968, Synthèse 19: 130–146), which has been much discussed in philosophy. This is the first fully‐fledged formal implementation of Davidson's proposal. Operator‐argument structure and scope are captured by means of relations among tokens. Intensional constituent tokens represent ‘propositional’ contents directly. They serve as arguments to the words introducing intensional contexts, rather than being ‘ordinary’ constituents. The treatment of de re readings involves the use of functions (‘anchors’) assigning entities to argument positions of lexical tokens. Quantifiers are thereby allowed to bind argument places on content tokens. This gives us a simple underspecification‐based account of scope ambiguity. The TDS framework is applied to indirect speech reports, mental attitude sentences, control verbs, and modal and agent‐relative sentence adverbs in English. This semantics is compatible with a traditional view of syntax. Here, it is integrated into a Head‐driven Phrase Structure Grammar (HPSG). The result is a straightforward and ontologically parsimonious analysis of truth‐conditional meaning and semantic intensionality.

  • 315.
    Dahllöf, Mats
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Berglund, Karl
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Literature, Sociology of Literature.
    Faces, Fights, and Families: Topic Modeling and Gendered Themes in Two Corpora of Swedish Prose Fiction, 2019. In: DHN 2019 Copenhagen, Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries Copenhagen, March 6-8 2019 / [ed] Constanza Navaretta et al., 2019, p. 92-111. Conference paper (Refereed)
    Abstract [en]

    This paper explores topic modeling (TM) as a tool for “distant reading” of two Swedish literary corpora. We investigate what kinds of insight and knowledge a TM-based approach can provide to Swedish literary history, and which methodological difficulties are associated with this endeavour. The TM is based on 12- and 24-term chunks of selected verb and common noun lemmas. We generate models with 20, 40, and 100 topics. We also propose a method for a quantitative and qualitative gendered thematic analysis by combining TM with a study of how the topics relate to gender in characters and authors. The two corpora contain, respectively, Swedish classics (1821–1941) and recent bestsellers (2004–2017). We find that most of the topics proposed by the TM are easy to interpret as conceptual themes, and that the “same” themes appear for the two corpora and for different TM settings. The study allows us to make interesting observations concerning different aspects of gender and topic distribution.

    Download full text (pdf)
  • 316.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    A Swedish Text Corpus for Generating Dictionaries, 1999. Report (Other academic)
  • 317.
    Dahlqvist, Bengt
    Uppsala University.
    Identifiering av snedfördelade ord, något om Chi2-metoden [Identifying words with skewed distributions: a note on the Chi2 method], 1990. Report (Other academic)
  • 318.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Kort om optisk inläsning av text (Introduction to OCR), 1997. Report (Other academic)
  • 319.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Sökbarhet i digitaliserade dokument: Metoder och överväganden [Searchability in digitized documents: Methods and considerations], 2010. Report (Other academic)
  • 320.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Text Processing Procedures for Analysing a Corpus with Medieval Marian Miracle Tales in Old Swedish, 2020. In: Proceedings of the 12th International Conference on Agents and Artificial Intelligence / [ed] Ana Rocha, Luc Steels, Jaap van der Herik, Setúbal, Portugal, 2020, Vol. 1, p. 452-458. Conference paper (Refereed)
  • 321.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    The Distribution of Characters, Bi- and Trigrams in the Uppsala 70 Million Words Swedish Newspaper Corpus, 1999. Report (Other academic)
  • 322.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    The SCARRIE Swedish Newspaper Corpus, 1999. Report (Other academic)
  • 323.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Tidningskorpus SvD, basbearbetningar: från text till ordlista (The SvD newspaper corpus, basic results), 1997. Other (Other academic)
  • 324.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Tidningskorpus UNT, basbearbetningar: från text till ordlista (The UNT newspaper corpus, basic results), 1997. Other (Other academic)
  • 325.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    TSS, Ett program för textsegmentering och ordsortering: En rapport från projektet Textpack på PC [TSS, a program for text segmentation and word sorting: a report from the project Textpack on PC], 1991. Report (Other academic)
  • 326.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    TSSA 2.1, A PC Program for Text Segmentation and Sorting, 1995. Report (Other academic)
  • 327.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    UNT 92 - textmaterial från ett tidningsarkiv, en kortfattad deskriptiv översikt (a descriptive overview of a text corpus from the UNT 92 newspaper archive), 1996. Report (Other academic)
  • 328.
    Dahlqvist, Bengt
    et al.
    Megyesi, Beata
    Changing the tokenization in Talbanken to SUC2.0, 2007. Report (Other academic)
  • 329.
    Dahlqvist, Bengt
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nordenfors, Mikael
    Department of Swedish, University of Gothenburg.
    Using the Text Processing Tool Textin to Examine Developmental Aspects of School Texts, 2008. In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein / [ed] Nivre, Joakim, Dahllöf, Mats, Megyesi, Beáta, Uppsala: Uppsala University, 2008, p. 61-76. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    The purpose of this article is first to briefly present the functions of the web-based text processing tool Textin 1.2, and then to illustrate these functions by putting the program to use within an ongoing research project concerning developmental aspects of texts written by Swedish pupils in school years 5 to 9. The text begins with a brief description of Textin's main functions and then moves on to previous research on school texts where computational linguistic methods either were used or could have been used had the technology been available at the time. The article then presents the results that Textin delivers and ends with a discussion of these findings.

  • 330.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Pseudonymisation of Swedish Electronic Patient Records Using a Rule-based Approach, 2019. In: Proceedings of the Workshop on NLP and Pseudonymisation / [ed] Lars Ahrenberg, Beáta Megyesi, Linköping: Linköping University Electronic Press, 2019, p. 16-23. Conference paper (Refereed)
    Abstract [en]

    This study describes a rule-based pseudonymisation system for Swedish clinical text and its evaluation. The pseudonymisation system replaces already tagged Protected Health Information (PHI) with realistic surrogates. There are eight types of manually annotated PHI in the electronic patient records: personal first and last names, phone numbers, locations, dates, ages, and healthcare units. Two evaluators, both computer scientists, one junior and one senior, judged whether each of a set of 98 electronic patient records was pseudonymised or not. Only 3.5 percent of the records were correctly judged as pseudonymised and 1.5 percent of the real ones were wrongly judged as pseudonymised, meaning that on average 91 percent of the pseudonymised records were judged as real.

  • 331.
    Dalianis, Hercules
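    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Slutrapport KVALPA: Vilka KVaLitetsindikatorer i PAtientjournalens fria text behövs för att kunna mäta kvalitén på vården? Skapandet av en automatisk metod genom maskininlärning [Final report KVALPA: Which quality indicators in the free text of the patient record are needed to measure the quality of care? Creating an automatic method through machine learning], 2019. Report (Other academic)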
    Abstract [sv]

    This is a pilot study on automatically finding quality indicators in the free text of electronic patient records from Karolinska University Hospital. The quality indicators studied indicate urinary tract infections, sepsis, fall injuries, pressure ulcers, nutrition, and adverse drug reactions. An interview study was carried out to understand the problem, and a rule-based system was implemented in the programming language Python. The system, called KVALPA, uses trigger words and was applied to 100 patient records from five different clinical units. 102 quality indicators were found, of which 26 were negated, and additional indicators were found through manual analysis. The negated indicators show that indicators of poor quality are absent, except in the case of nutrition. Future work includes extending the trigger list with automatically derived synonyms, and annotating a gold standard that can be used to evaluate the precision and recall of the system.

  • 332.
    Dalianis, Hercules
    et al.
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Knutsson, Ola
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Cerratto Pargman, Teresa
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Using human language technology to support the handling officers at the Swedish Social Insurance Agency, 2009. In: Design and Evaluation of e-Government Applications and Services: Proceedings of the 2nd International Workshop on Design and Evaluation of e-Government Applications and Services (DEGAS'2009) in conjunction with INTERACT'2009, Uppsala, Sweden, August 24th 2009, 2009, p. 30-32. Conference paper (Refereed)
    Abstract [en]

    The Swedish Social Insurance Agency (Försäkringskassan) receives 40 000 per month as well as phone calls from citizens, which are handled by almost 500 handling officers. To start making their work more efficient, we carried out two user-centered design workshops with the handling officers at Försäkringskassan, with the objective of finding out in what ways human language technology might facilitate their work. One outcome of the workshops was that the handling officers needed a support tool for handling and answering e-mails from their customers. Three main requirements were identified: finding the correct template to use in e-mail answers, support for automatically creating templates, and an automatic e-mail answering function. Over two years, we will focus on these design challenges within the IMAIL project.

  • 333.
    Dalianis, Hercules
    et al.
    Dept of Computer and System Sciences, Stockholm Univ, Sweden.
    Rimka, Martin
    Dept of Computer and System Sciences, Stockholm Univ, Sweden.
    Kann, Viggo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian, 2009. Conference paper (Refereed)
    Abstract [en]

    This paper presents how we adapted a website search engine for cross-language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by visitors to the website of the Nordic Council, which contains five different languages. To compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary built from the news corpus on the website using Uplug. The precision and recall of the automatically constructed Swedish-English dictionary using Uplug were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only 41 of the top 100 search queries on the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary, consisting of about 2 300 words, covers 36 of the top search queries.

  • 334.
    Dalianis, Hercules
    et al.
    KTH, School of Information and Communication Technology (ICT).
    Xing, Haochun
    KTH, School of Information and Communication Technology (ICT).
    Xin, Z.
    Creating a reusable English-Chinese parallel corpus for bilingual dictionary construction, 2010. In: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, European Language Resources Association (ELRA), 2010, p. 1700-1705. Conference paper (Refereed)
    Abstract [en]

    This paper first describes an experiment to construct an English-Chinese parallel corpus, then the application of the Uplug word alignment tool to the corpus, and finally the production and evaluation of an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese website containing law texts that have been manually translated from Chinese to English. The parallel corpus contains 104 563 Chinese characters, equivalent to 59 918 Chinese words, and the corresponding English corpus contains 75 766 English words. However, Chinese writing does not use delimiters to mark word boundaries, so we had to carry out word segmentation on the Chinese corpus as a preprocessing step. Moreover, since the parallel corpus was downloaded from the Internet, it is noisy with regard to the alignment between corresponding translated sentences. We therefore spent 60 hours of manual work aligning the sentences in the English and Chinese parallel corpus before performing automatic word alignment using Uplug. The word alignment with Uplug was carried out from English to Chinese. Nine respondents evaluated the entries of the resulting English-Chinese word list with a frequency of three or above, and we obtained an accuracy of 73.1 percent.

  • 335.
    Dalianis, Hercules
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Division of Computational Linguistics.
    Weegar, Rebecka
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Division of Computational Linguistics.
    Special Issue of Selected Contributions from the Seventh Swedish Language Technology Conference (SLTC 2018), 2019. Proceedings (editorship) (Other academic)
    Abstract [en]

    This Special Issue contains three papers that are extended versions of abstracts presented at the Seventh Swedish Language Technology Conference (SLTC 2018), held at Stockholm University 8–9 November 2018. SLTC 2018 received 34 submissions, of which 31 were accepted for presentation. The number of registered participants was 113, including both attendees at SLTC 2018 and two co-located workshops that took place on 7 November. Thirty-two participants were internationally affiliated, of which 14 were from outside the Nordic countries. Overall participation was thus on a par with previous editions of SLTC, but international participation was higher.

  • 336.
    Danielsson, Benjamin
    Linköping University, Department of Computer and Information Science.
    A Study on Text Classification Methods and Text Features, 2019. Independent thesis Basic level (degree of Bachelor), 12 points / 18 HE credits. Student thesis
    Abstract [en]

    In classification, the training data is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features used, the type of classes to classify, and the optimization of the classifiers. The classifiers of interest are support-vector machines (SMO) and multilayer perceptrons (MLP). The features tested are word vector spaces and text complexity measures, along with principal component analysis (PCA) on the complexity measures. The features are created from the Stockholm-Umeå Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset, the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset, the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed the best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed that performance was largely unchanged between the two, although not using PCA gave slightly better performance.

  • 337.
    Darman, Muhammad Ammar Shadiq AD
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Automatic Bug Report Assignment Using Multilevel Recurrent Neural Networks. 2018. Independent thesis, advanced level (degree of Master), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    In any software system or project, a continuous inflow of bug reports is an integral part of its upkeep and development. These bug reports, which can be very numerous in a large system, are typically handled by several layers of human experts who assign each report to the corresponding developers. With advances in machine-learning techniques for document classification, this task could be automated with enough accuracy to vastly reduce the human expert time required.

    In this thesis, we study automatic bug report assignment in the context of the telecom industry. In particular, we study current state-of-the-art document representation and classification methods applied to bug reports, with an emphasis on word embeddings and multilevel recurrent neural networks (RNNs). The model we emphasize is a two-level RNN that incorporates document structure in its design: the first level reads a sequence of words to represent a sentence, and the second level reads the sequence of those sentence representations to construct the document representation.

    A bug report differs from a general text document in that it often contains boilerplate, software source code, error codes or machine-generated output that can only be understood by the system developers or maintainers and does not follow common English writing conventions. This unusual vocabulary, with many unrelated symbols, can degrade classifier accuracy. Therefore, in addition to document classification, we develop a boilerplate removal system based on a stacked generalization ensemble classifier with shallow text features, which separates templates, human-generated text and machine-generated text.

    We conducted our automatic bug report assignment experiments on a sub-collection of eight years of bug reports from our industrial partner. Our experiments show that: (1) the multilevel RNN model performs better than the standard RNN model; (2) bug report assignment is currently best handled by the stacked generalization ensemble method; (3) when the boilerplate removal system is used to extract only the human-generated text from the bug reports, various classifiers perform relatively well with only one tenth of the data, compared with handcrafted preprocessing rules.

  • 338.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Wittek, Peter
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Dobreva, Milena
    Toward a 5M Model of Digital Libraries. 2010. Conference paper (refereed).
    Abstract [en]

    Whereas the DELOS DRM and the 5S model of digital libraries (DL) address the formal side of DL, we argue that a parallel 5M model is emerging as best practice worldwide, integrating multicultural, multilingual, multimodal digital objects with multivariate statistics-based document indexing, categorization and retrieval methods. The fifth M stands for modeling both the information-searching behavior of users and collection development. We show how an extension of the 5S model to Hilbert space (a) points toward the integration of several Ms, (b) makes the tracking of evolving semantic content feasible, and (c) leads to a field interpretation of word and sentence semantics underlying language change. First experimental results from the Strathprints e-repository verify the mathematical foundations of the 5M model.

  • 339.
    Darányi, Sándor
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Examples of Formulaity in Narratives and Scientific Communication. 2010. In: Proceedings of the 1st International AMICUS Workshop, October 21, 2010, Vienna, Austria / [ed] Sándor Darányi, Piroska Lendvai, University of Szeged, Hungary, 2010, pp. 29-35. Conference paper (refereed).
    Abstract [en]

    The AMICUS project was designed to promote scholarly networking in a topical area, motif recognition in texts, including its automation. Before doing so, however, it is necessary to show the theoretical underpinnings of the research idea. My argument is that evidence from different disciplines amounts to fragmented pieces of a bigger picture. By assembling them like pieces of a puzzle, one can see how the concept of formulaity applies to folklore texts and scholarly communication alike. Regardless of the actual name of the concept (e.g., motif, function, canonical form), what matters is that document parts and whole documents can be characterized by standard sequences of content elements, and such formulaic expressions enable higher-level document indexing and classification by machine learning, as well as document retrieval. Information filtering plays a key role in the proposed technology.

  • 340.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Forró, László
    Detecting Multiple Motif Co-occurrences in the Aarne-Thompson-Uther Tale Type Catalog: A Preliminary Survey. 2011. In: Anales de Documentación, ISSN 1575-2437, E-ISSN 1697-7904. Article in journal (other academic).
  • 341.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Forró, László
    Toward Sequencing Multiple Motif Co-Occurrences. 2011. In: Tanulmányok az örökségmenedzsmentröl 2. Kulturális örökségek kezelése [Studies in Heritage Management 2: The Management of Cultural Heritage] / [ed] L. Bassa, Információs Társadalomért Alapítvány, 2011, pp. 247-260. Chapter in book, part of anthology (refereed).
    Abstract [en]

    Catalogs project subject field experience onto a multidimensional map, which is then converted to a hierarchical list. In the case of the Aarne-Thompson-Uther Tale Type Catalog (ATU), this subject field is the global pattern of tale content defining tale types as canonical motif sequences. To extract and visualize such a map, we considered ATU as a corpus and analysed two segments of it: “Supernatural adversaries” (types 300-399) in particular and “Tales of magic” (types 300-749) in general. The two corpora were scrutinized for multiple motif co-occurrences and visualized by two-mode clustering of a bag-of-motif co-occurrences matrix. Findings indicate the presence of canonical content units above the motif level as well. The organization scheme of folk narratives utilizing motif sequences is reminiscent of nucleotide sequences in the genetic code.

  • 342.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Lendvai, Piroska
    Proceedings of the First AMICUS Workshop, October 21, 2010, Vienna, Austria. 2010. Collection (editorship) (other academic).
    Abstract [en]

    In cultural heritage objects, digitized or not, content indicators occurring above the word level are often called motifs or an equivalent term. Their recognition for document classification and retrieval is largely unresolved. Work on identifying rhetorical, narrative and persuasive elements in scientific texts has been progressing in several, largely unconnected tracks. The AMICUS project (running between 2009 and 2012) set out to test a possible way to resolve these issues, starting with the identification of Proppian functions in folk tale corpora and adapting the solution to the identification of tale motifs or their functional counterparts. AMICUS devoted its first project year to listing the corpora, tools, methods and contacts available to address these issues. The initiators of the project identified a common need in processing texts from both the cultural heritage (CH) and scientific communication (SC) domains: to perform automated, large-scale, higher-order text analytics, i.e., to reach an advanced level of text understanding so that structured knowledge can be extracted from unstructured text. The four research groups propose to tackle an important aspect of this complex issue by investigating how linguistic elements convey motifs in texts from the CH and SC domains. Our shared working hypothesis is that the identity of higher-order content-bearing elements, i.e., textual units typically designated for document indexing, classification, enrichment and the like, strongly depends on community perception.

  • 343.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Wittek, Peter
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    The gravity of meaning: Physics as a metaphor to model semantic changes. 2012. Conference paper (refereed).
    Abstract [en]

    Based on a computed toy example, we offer evidence that by plugging in similarity of word meaning as a force, plus a small modification of Newton’s second law, one can acquire specific “mass” values for index terms in a Saltonesque dynamic library environment. The model can describe two types of change that affect the semantic composition of document collections: the expansion of a corpus due to its update, and fluctuations of the gravitational potential energy field generated by normative language use as an attractor, juxtaposed with actual language use yielding time-dependent term frequencies. By using the evolving semantic potential of a vocabulary and concatenating the respective term “mass” values, one can model sentences or longer strings of symbols as vector-valued functions. Since the line integral of such functions is used to express the work done on a particle in a gravitational field, the work equivalent of strings can be calculated.

  • 344.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Wittek, Peter
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Dobreva, Milena
    Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints. 2011. In: International Journal on Digital Libraries, ISSN 1432-5012, E-ISSN 1432-1300. Article in journal (refereed).
    Abstract [en]

    Digital libraries increasingly benefit from research on automated text categorization for improved access. Such research is typically carried out using standard test collections. In this paper we present a pilot experiment replacing such test collections with a set of 6000 objects from a real-world digital repository, indexed by Library of Congress Subject Headings, and test support vector machines in a supervised learning setting for their ability to reproduce the existing classification. To augment the standard approach, we introduce a combination of two novel elements: using functions for document content representation in Hilbert space, and adding extra semantics from lexical resources to the representation. Results suggest that wavelet-based kernels slightly outperformed traditional kernels on classification reconstruction from abstracts, and vice versa on full-text documents, the latter outcome being due to word sense ambiguity. The practical implementation of our methodological framework enhances the analysis and representation of specific knowledge relevant to large-scale digital collections, in this case the thematic coverage of the collections. Representation of specific knowledge about digital collections is one of the basic elements of persistent archives, and the less studied one (compared to representations of digital objects and collections). Our research is an initial step in this direction, developing the methodological approach further and demonstrating that text categorization can be applied to analyze the thematic coverage of digital repositories.

  • 345.
    Darányi, Sándor
    et al.
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Wittek, Peter
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Konstantinidis, K
    CERTH.
    Papadopoulos, S
    CERTH.
    A Potential Surface Underlying Meaning? 2015. Conference paper (other academic).
    Abstract [en]

    Machine learning algorithms that use gradient descent to identify concepts, or more general learnables, hint at a so-far ignored possibility, namely that local and global minima represent any vocabulary as a landscape against which evaluation of the results can take place. A simple example to illustrate this idea would be a potential surface underlying gravitation. However, to construct a gravitation-based representation of, e.g., word meaning, only the distance between localized items is given in the vector space, whereas the equivalents of mass or charge are unknown in semantics. Clearly, the working hypothesis that physical fields could be a useful metaphor for studying word and sentence meaning is an option, but our current representations are incomplete in this respect.

    As a starting point, consider that an RBF kernel can generate a potential surface and hence create the impression of gravity, providing distance-based decay of interaction strength plus a scalar scaling factor for the interaction, but of course no term masses. We are working on an experimental design to change that. Therefore, since certain mechanisms in neural networks could host such quasi-physical fields, a novel approach to modeling mind content seems plausible, subject to scrutiny.

    Work in progress in another direction of the same idea indicates that, by using certain algorithms, already emerged vs. still emerging content is clearly distinguishable, in line with Aristotle’s Metaphysics. The implications are that a model completed by “term mass” or “term charge” would enable computing the specific work equivalent of sentences or documents, and that, by replacing semantics with other modalities, vector fields of more general symbolic content could exist as well. Also, the perceived hypersurface generated by the dynamics of language use may be a step toward more advanced models, for example addressing the Hamiltonian of expanding semantic systems, or the relationship between reaction paths in quantum chemistry vs. sentence construction by gradient descent.

  • 346.
    De Bona, Fabio
    et al.
    Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany.
    Riezler, Stefan
    Hall, Keith
    Ciaramita, Massimiliano
    Herdagdelen, Amac
    University of Trento, Rovereto, Italy.
    Holmqvist, Maria
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Learning dense models of query similarity from user click logs. 2010. In: HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 474-482. Conference paper (refereed).
  • 347.
    de Lhoneux, Miryam
    et al.
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Bjerva, Johannes
    University of Copenhagen.
    Augenstein, Isabelle
    University of Copenhagen.
    Søgaard, Anders
    University of Copenhagen.
    Parameter sharing between dependency parsers for related languages. 2018. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing / [ed] Association for Computational Linguistics, Brussels, 2018, pp. 4992-4997. Conference paper (refereed).
    Abstract [en]

    Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on which parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a monolingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, but that, for unrelated languages, sharing too many parameters does not help.

  • 348.
    de Lhoneux, Miryam
    et al.
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Stymne, Sara
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Nivre, Joakim
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Arc-Hybrid Non-Projective Dependency Parsing with a Static-Dynamic Oracle. 2017. In: IWPT 2017 15th International Conference on Parsing Technologies: Proceedings of the Conference, Pisa, Italy: Association for Computational Linguistics, 2017, pp. 99-104. Conference paper (refereed).
    Abstract [en]

    We extend the arc-hybrid transition system for dependency parsing with a SWAP transition that enables reordering of the words and construction of non-projective trees. Although this extension potentially breaks the arc-decomposability of the transition system, we show that the existing dynamic oracle can be modified and combined with a static oracle for the SWAP transition. Experiments on five languages with different degrees of non-projectivity show that the new system gives competitive accuracy and is significantly better than a system trained with a purely static oracle.

  • 349.
    de Lhoneux, Miryam
    et al.
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Stymne, Sara
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Nivre, Joakim
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions? 2019. In: CoRR, Vol. abs/1907.07950. Article in journal (other academic).
  • 350.
    de Lhoneux, Miryam
    et al.
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Yan, Shao
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Basirat, Ali
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Kiperwasser, Eliyahu
    Bar-Ilan University.
    Stymne, Sara
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Goldberg, Yoav
    Bar-Ilan University.
    Nivre, Joakim
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    From raw text to Universal Dependencies: look, no tags! 2017. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 207-217. Conference paper (refereed).
    Abstract [en]

    We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech tagging, but uses word embeddings based on universal tag distributions. We achieved a macro-averaged LAS F1 of 65.11 in the official test run and obtained the second-best result for sentence segmentation, with a score of 89.03. After fixing two bugs, we obtained an unofficial LAS F1 of 70.49.

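Entry 336 (Danielsson, 2019) compares word-vector features with text complexity measures, but the abstract does not say which complexity measures were used. Purely as an illustrative assumption, one widely used Swedish complexity measure is the LIX readability index: words per sentence plus the percentage of long words (more than six letters). The minimal sketch below uses deliberately naive tokenization (letter runs as words, `.`, `!` or `?` as sentence ends), which is also an assumption.

```python
import re


def lix(text: str) -> float:
    """LIX readability index: words per sentence + percentage of long words.

    Tokenization is deliberately naive (an assumption of this sketch):
    words are runs of letters, sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[^\W\d_]+", text)  # letter runs, Unicode-aware (å, ä, ö)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))  # at least one sentence
    long_words = sum(1 for word in words if len(word) > 6)  # "long" = more than six letters
    return len(words) / sentences + 100.0 * long_words / len(words)


print(lix("Detta är en kort mening. Den här meningen innehåller betydligt längre ord."))  # → 31.0
```

Each text would then yield one number per measure, and such vectors could optionally be reduced with PCA before classification, as in the thesis.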
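Entry 337 (Darman, 2018) describes a two-level RNN: a word-level RNN turns each sentence into a vector, and a sentence-level RNN turns the sequence of sentence vectors into a document vector. The sketch below shows only that forward pass in plain Python. The cell type (a simple tanh, or Elman, cell), the dimensions, the random untrained weights, the zero vector for unknown words, and using the final hidden state as the summary are all assumptions made for illustration. The thesis's actual cells, training procedure and the classifier that assigns reports to developers are not reproduced.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ElmanRNN:
    """Minimal tanh RNN: h_t = tanh(W x_t + U h_{t-1} + b). Weights are untrained."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.W = random_matrix(hidden_dim, input_dim, rng)
        self.U = random_matrix(hidden_dim, hidden_dim, rng)
        self.b = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def encode(self, inputs: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        for x in inputs:
            # The comprehension reads the previous h; h is rebound only afterwards.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.W[i], x))
                    + sum(u * hj for u, hj in zip(self.U[i], h))
                    + self.b[i]
                )
                for i in range(self.hidden_dim)
            ]
        return h  # final hidden state summarizes the whole sequence


class TwoLevelEncoder:
    """Words -> sentence vectors (level 1), sentence vectors -> document vector (level 2)."""

    def __init__(self, vocab: List[str], emb_dim: int = 8, sent_dim: int = 6,
                 doc_dim: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # Hypothetical random embeddings; a real system would use trained word embeddings.
        self.embeddings: Dict[str, Vector] = {
            w: [rng.uniform(-1.0, 1.0) for _ in range(emb_dim)] for w in vocab
        }
        self.unknown = [0.0] * emb_dim
        self.word_rnn = ElmanRNN(emb_dim, sent_dim, rng)
        self.sent_rnn = ElmanRNN(sent_dim, doc_dim, rng)

    def encode(self, document: List[List[str]]) -> Vector:
        sentence_vectors = [
            self.word_rnn.encode([self.embeddings.get(w, self.unknown) for w in sentence])
            for sentence in document
        ]
        return self.sent_rnn.encode(sentence_vectors)
```

Keeping the two levels as separate RNN instances mirrors the abstract's description: sentence boundaries are preserved, so the document vector is built from sentence representations rather than from one flat word sequence.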
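Entry 341 (Darányi et al., 2011) treats the ATU catalog as a corpus, looks for multiple motif co-occurrences, and applies two-mode clustering to a bag-of-motif co-occurrence matrix. The minimal sketch below covers only the matrix-building step, assuming each tale type is given as a list of motif codes and that co-occurrence means a pair of motifs appearing in the same tale type. The type IDs and motif codes are made up, pairs stand in for the paper's possibly larger motif sets, and the clustering itself is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def motif_cooccurrences(tale_types: Dict[str, List[str]]) -> Counter:
    """Count, for each unordered motif pair, how many tale types contain both motifs."""
    counts: Counter = Counter()
    for motifs in tale_types.values():
        # set(): bag-of-motifs view, so order and repetition inside a type are ignored
        counts.update(combinations(sorted(set(motifs)), 2))
    return counts


def type_by_pair_matrix(tale_types: Dict[str, List[str]], min_count: int = 1):
    """Two-mode 0/1 matrix: rows are tale types, columns are co-occurring motif pairs."""
    counts = motif_cooccurrences(tale_types)
    pairs: List[Pair] = sorted(pair for pair, n in counts.items() if n >= min_count)
    rows = sorted(tale_types)
    matrix = []
    for tale in rows:
        present = set(combinations(sorted(set(tale_types[tale])), 2))
        matrix.append([1 if pair in present else 0 for pair in pairs])
    return rows, pairs, matrix


tales = {"T1": ["A", "B", "C"], "T2": ["A", "B"], "T3": ["C", "A"]}  # hypothetical data
print(motif_cooccurrences(tales)[("A", "B")])  # → 2
```

The `min_count` threshold is a hypothetical knob for keeping only pairs that recur across several tale types, which is where canonical units above the motif level would show up.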
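Entry 343 (Darányi et al., 2012) computes the work equivalent of a string as a line integral, W = ∫_C F · dr. The abstract does not spell out the paper's force (word-meaning similarity combined with a modified Newton's second law), so this sketch is generic. It approximates the line integral of any user-supplied field along a polyline, such as a path through a term space, by applying the midpoint rule to each straight segment. Both fields in the example are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Field = Callable[[Vector], Vector]


def work_along_path(force: Field, path: Sequence[Vector]) -> float:
    """Approximate W = ∫_C F · dr along a polyline, midpoint rule on each straight segment."""
    total = 0.0
    for start, end in zip(path, path[1:]):
        midpoint = [(a + b) / 2.0 for a, b in zip(start, end)]
        step = [b - a for a, b in zip(start, end)]  # displacement dr for this segment
        total += sum(f * d for f, d in zip(force(midpoint), step))
    return total


# Hypothetical fields for illustration only:
def uniform(_: Vector) -> List[float]:
    return [0.0, -1.0]  # constant pull in the -y direction


def linear(p: Vector) -> List[float]:
    return [p[0], p[1]]  # equals -grad(phi) for phi = -(x^2 + y^2) / 2


print(work_along_path(uniform, [[0.0, 0.0], [0.0, -2.0]]))  # → 2.0
print(work_along_path(linear, [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]))  # → 1.0
```

For a conservative field F = -∇φ the result depends only on the endpoints, φ(start) - φ(end). This gives a quick check: in the second example that difference is 0 - (-1) = 1.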
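Entry 344 (Darányi et al., 2011) reports that wavelet-based SVM kernels slightly outperformed traditional kernels on abstracts. The abstract does not define the kernel, so the following is an assumption: it uses a Morlet-type, translation-invariant wavelet kernel known from the wavelet-SVM literature, K(x, y) = Π_i cos(1.75·u_i)·exp(-u_i²/2) with u_i = (x_i - y_i)/a and dilation a > 0. It is not necessarily the kernel used in the paper. The paper's representation of documents as functions in Hilbert space is also not reproduced.

```python
import math
from typing import List, Sequence


def wavelet_kernel(x: Sequence[float], y: Sequence[float], a: float = 1.0) -> float:
    """Morlet-type wavelet kernel: product over dimensions of cos(1.75 u) * exp(-u^2 / 2),
    with u = (x_i - y_i) / a."""
    if a <= 0:
        raise ValueError("dilation a must be positive")
    value = 1.0
    for xi, yi in zip(x, y):
        u = (xi - yi) / a
        value *= math.cos(1.75 * u) * math.exp(-u * u / 2.0)
    return value


def gram_matrix(docs: Sequence[Sequence[float]], a: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values between document vectors (e.g., term-weight vectors)."""
    return [[wavelet_kernel(d1, d2, a) for d2 in docs] for d1 in docs]
```

A Gram matrix like this can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.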
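Entry 345 (Darányi et al., 2015) notes that an RBF kernel can generate a potential surface, with distance-based decay and a scalar scaling factor but no term masses. The minimal sketch below assumes the surface is the negative sum of Gaussian RBF wells centred on word vectors, phi(x) = -Σ_i m_i·exp(-gamma·||x - w_i||²). The per-term weights m_i are a hypothetical stand-in for the missing "term mass" and default to 1. Gradient descent on phi then slides a point into a nearby well, the kind of minimum the abstract reads as a landscape over the vocabulary.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _masses(centers: Sequence[Vector], masses: Optional[Sequence[float]]) -> List[float]:
    # Hypothetical "term masses"; uniform 1.0 when none are given.
    return [1.0] * len(centers) if masses is None else list(masses)


def rbf_potential(x: Vector, centers: Sequence[Vector], gamma: float = 1.0,
                  masses: Optional[Sequence[float]] = None) -> float:
    """phi(x) = -sum_i m_i * exp(-gamma * ||x - w_i||^2): one Gaussian well per center."""
    total = 0.0
    for center, mass in zip(centers, _masses(centers, masses)):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total -= mass * math.exp(-gamma * d2)
    return total


def rbf_gradient(x: Vector, centers: Sequence[Vector], gamma: float = 1.0,
                 masses: Optional[Sequence[float]] = None) -> List[float]:
    """Analytic gradient of phi: sum_i 2 * gamma * m_i * exp(-gamma * d_i^2) * (x - w_i)."""
    grad = [0.0] * len(x)
    for center, mass in zip(centers, _masses(centers, masses)):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        weight = 2.0 * gamma * mass * math.exp(-gamma * d2)
        for k, (xk, ck) in enumerate(zip(x, center)):
            grad[k] += weight * (xk - ck)
    return grad


def descend(x: Vector, centers: Sequence[Vector], gamma: float = 1.0,
            masses: Optional[Sequence[float]] = None, lr: float = 0.1,
            steps: int = 50) -> List[float]:
    """Plain gradient descent on phi: the point slides toward a nearby well."""
    point = list(x)
    for _ in range(steps):
        g = rbf_gradient(point, centers, gamma, masses)
        point = [p - lr * gk for p, gk in zip(point, g)]
    return point
```

`gamma` plays the role of the scalar interaction-scaling factor the abstract mentions. Larger values give narrower, more local wells.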
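Entry 347 (de Lhoneux et al., 2018) proposes always sharing the transition classifier between the parsers for two related languages, with the sharing of word and character LSTM parameters controlled by a setting tuned on validation data. The sketch below models only that bookkeeping. `ParamGroup` is a hypothetical stand-in for a block of weights (no tensors), and the language codes and scoring callback are placeholders. A shared group is the same object for every language, so its weights would be updated by training data from both languages.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

_next_id = itertools.count()


@dataclass(frozen=True)
class ParamGroup:
    """Hypothetical stand-in for a block of trainable weights."""
    name: str
    owner: str  # "shared" or a language code
    uid: int = field(default_factory=lambda: next(_next_id))


Params = Dict[str, Dict[str, ParamGroup]]


def build_parameters(languages: List[str], share_word: bool, share_char: bool) -> Params:
    # The transition classifier is always shared across languages.
    shared = {"transition_classifier": ParamGroup("transition_classifier", "shared")}
    if share_word:
        shared["word_lstm"] = ParamGroup("word_lstm", "shared")
    if share_char:
        shared["char_lstm"] = ParamGroup("char_lstm", "shared")
    params: Params = {}
    for lang in languages:
        groups = dict(shared)  # same objects, hence same weights, for every language
        for name in ("word_lstm", "char_lstm"):
            if name not in groups:
                groups[name] = ParamGroup(name, lang)  # language-specific copy
        params[lang] = groups
    return params


def choose_sharing(languages: List[str],
                   validation_score: Callable[[Params], float]) -> Tuple[bool, bool]:
    """Try the four word/char sharing settings and keep the best validation score."""
    settings = [(w, c) for w in (False, True) for c in (False, True)]
    return max(settings, key=lambda s: validation_score(build_parameters(languages, *s)))
```

In a real setup `validation_score` would train and evaluate a parser for each setting. Here it is just a callback, so the selection logic can be checked without any training.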
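Entry 348 (de Lhoneux, Stymne and Nivre, 2017) extends the arc-hybrid transition system with SWAP. The minimal sketch below covers the transitions only and assumes the common arc-hybrid definitions: LEFT-ARC attaches the stack top s0 to the first buffer word b0, and RIGHT-ARC attaches s0 to the stack item below it, s1. It also assumes that SWAP moves s0 back into the buffer directly behind b0 and is allowed only when s0 precedes b0 in the original word order. Placing the root at the bottom of the stack is another choice of this sketch. The paper's static-dynamic oracle and neural scorer are not reproduced, so transition sequences are given by hand.

```python
from typing import Dict, List


class Configuration:
    """Arc-hybrid configuration: stack, buffer and arcs stored as dependent -> head."""

    def __init__(self, n_words: int):
        self.stack: List[int] = [0]  # token 0 is an artificial root
        self.buffer: List[int] = list(range(1, n_words + 1))
        self.heads: Dict[int, int] = {}

    def is_terminal(self) -> bool:
        return not self.buffer and self.stack == [0]

    def apply(self, transition: str) -> None:
        s0 = self.stack[-1] if self.stack else None
        b0 = self.buffer[0] if self.buffer else None
        if transition == "SHIFT":
            if b0 is None:
                raise ValueError("SHIFT needs a non-empty buffer")
            self.stack.append(self.buffer.pop(0))
        elif transition == "LEFT_ARC":
            # Arc b0 -> s0, then pop s0; the root may not take a head.
            if b0 is None or s0 in (None, 0):
                raise ValueError("LEFT_ARC needs b0 and a non-root s0")
            self.heads[self.stack.pop()] = b0
        elif transition == "RIGHT_ARC":
            # Arc s1 -> s0, then pop s0.
            if len(self.stack) < 2:
                raise ValueError("RIGHT_ARC needs two stack items")
            dependent = self.stack.pop()
            self.heads[dependent] = self.stack[-1]
        elif transition == "SWAP":
            # Move s0 back into the buffer behind b0 (reordering for non-projective trees).
            if b0 is None or s0 in (None, 0) or not s0 < b0:
                raise ValueError("SWAP needs a non-root s0 preceding b0")
            self.buffer.insert(1, self.stack.pop())
        else:
            raise ValueError(f"unknown transition {transition!r}")


def parse(n_words: int, transitions: List[str]) -> Dict[int, int]:
    config = Configuration(n_words)
    for transition in transitions:
        config.apply(transition)
    if not config.is_terminal():
        raise ValueError("sequence did not reach a terminal configuration")
    return config.heads


print(parse(3, ["SHIFT", "SWAP", "SHIFT", "SHIFT", "LEFT_ARC", "SHIFT", "RIGHT_ARC", "RIGHT_ARC"]))
# → {1: 3, 3: 2, 2: 0}
```

The example parses the non-projective tree with arcs 0→2, 2→3 and 3→1. SWAP lets word 2 be shifted before word 1, so that words 1 and 3 become adjacent on the stack and buffer.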
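Entry 350 (de Lhoneux et al., 2017) reports LAS F1 for parsing from raw text, where the system's word segmentation may differ from the gold segmentation. The simplified sketch below rests on several assumptions. Words are aligned only when their character spans match exactly. Each word stores its head's span (None for the root) and its dependency label. Precision is correct words over system words, recall is correct words over gold words, and F1 is their harmonic mean. The official shared-task evaluation script is more involved (for example, it also handles multiword tokens), and this is not that script.

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of a word in the raw text
Analysis = Dict[Span, Tuple[Optional[Span], str]]  # word -> (head span or None for root, label)


def las_f1(gold: Analysis, system: Analysis) -> float:
    """Simplified LAS F1: a system word is correct only if a gold word has the same span,
    the same head span and the same label."""
    if not gold or not system:
        return 0.0
    correct = sum(1 for span, analysis in system.items() if gold.get(span) == analysis)
    precision = correct / len(system)
    recall = correct / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold analysis of the raw text "Dogs bark loudly"
gold = {(0, 4): ((5, 9), "nsubj"), (5, 9): (None, "root"), (10, 16): ((5, 9), "advmod")}
print(las_f1(gold, dict(gold)))  # → 1.0
```

Because words are keyed by character spans, a segmentation error (for example, merging "bark loudly" into one word) makes the affected words unalignable. It also invalidates every arc pointing at them, which is why segmentation quality feeds directly into LAS F1.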