Domain Adaptation for Hypernym Discovery via Automatic Collection of Domain-Specific Training Data
Linköping University, Department of Computer and Information Science.
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title
Domänanpassning för identifiering av hypernymer via automatisk insamling av domänspecifikt träningsdata (Swedish)
Abstract [en]

Identifying semantic relations in natural language text is an important component of many knowledge extraction systems. This thesis studies the task of hypernym discovery, i.e. discovering terms that are related by the hypernymy (is-a) relation. Specifically, it explores how state-of-the-art methods for hypernym discovery perform when applied in specific language domains. Current state-of-the-art methods for hypernym discovery mostly consist of supervised machine learning models that leverage distributional word representations such as word embeddings. These models require labeled training data in the form of term pairs that are known to be related by hypernymy, and such labeled data is often unavailable when working with a specific language domain. This thesis presents experiments with an automatic training data collection algorithm that leverages a pre-defined domain-specific vocabulary and the lexical resource WordNet to extract training pairs automatically. It contributes experimental results on leveraging such automatically collected domain-specific training data for the purpose of domain adaptation. Experiments are conducted in two domains: one with a large amount of text data, and another with a much smaller amount of text data. Results show that the automatically collected training data has a positive impact on performance in both domains. The performance boost is most significant in the domain with a large amount of text data, with mean average precision increasing by up to 8 points.
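The collection idea described above — intersecting a domain-specific vocabulary with a lexical resource to harvest (hyponym, hypernym) training pairs — can be sketched roughly as follows. This is a minimal illustration only: the thesis's actual algorithm is not detailed in this record, and the small `TAXONOMY` dictionary below is a hand-built stand-in for WordNet hypernym lookups, with all terms chosen purely for illustration.

```python
# Stand-in for WordNet: maps a term to the set of its hypernyms.
# In a real setting these lookups would come from WordNet synsets.
TAXONOMY = {
    "aspirin": {"drug", "medication"},
    "ibuprofen": {"drug", "medication"},
    "stethoscope": {"instrument", "device"},
    "guitar": {"instrument"},
}

def collect_training_pairs(domain_vocab, taxonomy):
    """Harvest (hyponym, hypernym) training pairs whose hyponym is a domain term."""
    pairs = []
    for term in domain_vocab:
        # Only terms found in the lexical resource yield pairs; sorting makes
        # the output deterministic.
        for hypernym in sorted(taxonomy.get(term, ())):
            pairs.append((term, hypernym))
    return pairs

# A hypothetical medical domain vocabulary.
medical_vocab = ["aspirin", "stethoscope"]
pairs = collect_training_pairs(medical_vocab, TAXONOMY)
# -> [('aspirin', 'drug'), ('aspirin', 'medication'),
#     ('stethoscope', 'device'), ('stethoscope', 'instrument')]
```

Note how restricting extraction to the domain vocabulary keeps out-of-domain pairs such as ('guitar', 'instrument') out of the training data, which is the point of the domain adaptation setup.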

Place, publisher, year, edition, pages
2019, p. 43.
Keywords [en]
NLP, natural language processing, domain adaptation, hypernym, hyponym
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:liu:diva-157693
ISRN: LIU-IDA/LITH-EX-A--2019/039--SE
OAI: oai:DiVA.org:liu-157693
DiVA, id: diva2:1327273
Subject / course
Computer Engineering
Available from: 2019-06-19. Created: 2019-06-19. Last updated: 2019-06-19. Bibliographically approved.

Open Access in DiVA

fulltext (563 kB), 25 downloads
File information
File name: FULLTEXT01.pdf
File size: 563 kB
Checksum: SHA-512
3a54af858e14f2a643f0bb421e7ae63f91196ad7e70a3192ebf070f5018ae2796de74e9592e19e3e92da720739ef6b2894e4e1dfb95c34205bf1cc7f4c7b89e0
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Palm Myllylä, Johannes
By organisation
Department of Computer and Information Science
Computer Engineering

Total: 25 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.