A comparative study of word embedding methods for early risk prediction on the Internet
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2019 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

We built a system to participate in the eRisk 2019 T1 Shared Task. The aim of the task was to evaluate systems for early risk prediction on the internet, in particular to identify users suffering from eating disorders as accurately and quickly as possible given their history of Reddit posts in chronological order. In the controlled setting of this task, we also evaluated the performance of three different word representation methods: random indexing, GloVe, and ELMo. We discuss our system’s performance, also in light of the scores obtained by other teams in the shared task. Our results show that our two-step learning approach was quite successful, and we obtained good scores on the early risk prediction metric ERDE across the board. Contrary to our expectations, we did not observe a clear-cut advantage of contextualized ELMo vectors over the commonly used and much more lightweight GloVe vectors. Our best model in terms of F1 score turned out to be a model with GloVe vectors as input to the text classifier and a multi-layer perceptron as user classifier. The best ERDE scores were obtained by the model with ELMo vectors and a multi-layer perceptron. The model with random indexing vectors struck a good balance between precision and recall in the early processing stages but was eventually surpassed by the models with GloVe and ELMo vectors. We put forward some possible explanations for the observed results and propose some improvements to our system.

Place, publisher, year, edition, pages
2019. p. 46
Keywords [en]
machine learning, deep neural networks, eRisk, early risk prediction, mental health
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-385052
OAI: oai:DiVA.org:uu-385052
DiVA, id: diva2:1322438
External cooperation
Gavagai AB
Educational program
Master Programme in Language Technology
Available from: 2019-06-12
Created: 2019-06-10
Last updated: 2019-06-12
Bibliographically approved

Open Access in DiVA

fulltext (497 kB), 59 downloads
File information
File name: FULLTEXT01.pdf
File size: 497 kB
Checksum (SHA-512): e8992931abc20c59c1b61b94e5e61c44a50bd922179da9ec47c0a6fa5e11916fe0c7c898627be38e50478cef1826464ccaec419c39cd8de717841b1cf20bcd53
Type: fulltext
Mimetype: application/pdf
