Linguistically Informed Neural Dependency Parsing for Typologically Diverse Languages
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics) ORCID iD: 0000-0001-8844-2126
2019 (English) Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis presents several studies in neural dependency parsing for typologically diverse languages, using treebanks from Universal Dependencies (UD). The focus is on informing models with linguistic knowledge. We first extend a parser to work well on typologically diverse languages, including morphologically complex languages and languages whose treebanks have a high ratio of non-projective sentences, a notorious difficulty in dependency parsing. We propose a general methodology where we sample a representative subset of UD treebanks for parser development and evaluation. Our parser uses recurrent neural networks which construct information sequentially, and we study the incorporation of a recursive neural network layer in our parser. This follows the intuition that language is hierarchical. This layer turns out to be superfluous in our parser and we study its interaction with other parts of the network. We subsequently study transitivity and agreement information learned by our parser for auxiliary verb constructions (AVCs). We suggest that a parser should learn similar information about AVCs as it learns for finite main verbs. This is motivated by work in theoretical dependency grammar. Our parser learns different information about these two if we do not augment it with a recursive layer, but similar information if we do, indicating that there may be benefits from using that layer and we may not yet have found the best way to incorporate it in our parser. We finally investigate polyglot parsing. Training one model for multiple related languages leads to substantial improvements in parsing accuracy over a monolingual baseline. We also study different parameter sharing strategies for related and unrelated languages. Sharing parameters that partially abstract away from word order appears to be beneficial in both cases but sharing parameters that represent words and characters is more beneficial for related than unrelated languages.
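To make the notion of a non-projective sentence concrete, here is a minimal sketch that is not taken from the thesis. It assumes each sentence is given as a list of head indices in the style of the CoNLL-U HEAD column (1-based, with 0 for the root) and that the input is a well-formed tree. The function names `is_projective` and `non_projective_ratio` are hypothetical. An arc counts as projective when every token between the head and the dependent is dominated by the head; arcs from the artificial root are always treated as projective here, which is one of several conventions.

```python
def is_projective(heads):
    """Return True if the dependency tree is projective.

    heads[i] is the 1-based head index of token i+1; 0 marks the root.
    Assumes a well-formed tree (no cycles); this is an illustrative sketch.
    """
    n = len(heads)

    def dominated_by(token, ancestor):
        # Walk up the head chain from `token` until we reach `ancestor` or the root.
        while token != 0:
            if token == ancestor:
                return True
            token = heads[token - 1]
        # Every token is dominated by the artificial root (index 0).
        return ancestor == 0

    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = min(head, dep), max(head, dep)
        # Every token strictly between head and dependent must descend from head.
        for between in range(lo + 1, hi):
            if not dominated_by(between, head):
                return False
    return True


def non_projective_ratio(treebank):
    """Fraction of sentences (lists of heads) that are non-projective."""
    if not treebank:
        return 0.0
    return sum(not is_projective(heads) for heads in treebank) / len(treebank)
```

For example, `[2, 0, 2]` describes a projective three-token tree, while `[3, 4, 0, 1]` is non-projective because the arc from token 1 to token 4 spans token 3, which token 1 does not dominate.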

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 178
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 24
Keywords [en]
Dependency parsing, multilingual NLP, Universal Dependencies, Linguistically informed NLP
National Category
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-394133
ISBN: 978-91-513-0767-1 (print)
OAI: oai:DiVA.org:uu-394133
DiVA, id: diva2:1357373
Public defence
2019-11-25, Bertil Hammer, Blåsenhus, von Kraemers Allé 1, Uppsala, 13:15 (English)
Opponent
Supervisor
Available from: 2019-10-28 Created: 2019-10-03 Last updated: 2019-11-12

Open Access in DiVA

fulltext (1299 kB), 286 downloads
File information
File name: FULLTEXT01.pdf
File size: 1299 kB
Checksum (SHA-512): 24fef4fcc9436b53dfea47284dcda7e18282f5f41454f668606297392ef37f9105e09568998a291378882207fe599cb2349eb5e65e0844fd6739055611e74f00
Type: fulltext
Mimetype: application/pdf

Author/editor
de Lhoneux, Miryam
Total: 286 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.
Total: 1827 hits