Digitala Vetenskapliga Arkivet

Word Clustering in an Interactive Text Analysis Tool
Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system.
2019 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Klustring av ord i ett interaktivt textanalysverktyg (Swedish)
Abstract [en]

A central operation of users of the text analysis tool Gavagai Explorer is to look through a list of words and arrange them in groups. This thesis explores the use of word clustering to automatically arrange the words in groups intended to help users. A new word clustering algorithm is introduced, which attempts to produce word clusters tailored to be small enough for a user to quickly grasp the common theme of the words. The proposed algorithm computes similarities among words using word embeddings, and clusters them using hierarchical graph clustering. Multiple variants of the algorithm are evaluated in an unsupervised manner by analysing the clusters they produce when applied to 110 data sets previously analysed by users of Gavagai Explorer. A supervised evaluation is performed to compare clusters to the groups of words previously created by users of Gavagai Explorer. Results show that it was possible to choose a set of hyperparameters deemed to perform well across most data sets in the unsupervised evaluation. These hyperparameters also performed among the best on the supervised evaluation. It was concluded that the choice of word embedding and graph clustering algorithm had little impact on the behaviour of the algorithm. Rather, limiting the maximum size of clusters and filtering out similarities between words had a much larger impact on behaviour.
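
The pipeline outlined in the abstract (embedding similarities, filtering of weak similarities, hierarchical graph clustering, a cap on cluster size) can be illustrated with a minimal sketch. This is not the algorithm from the thesis. It assumes cosine similarity, connected components as a stand-in for the graph clustering step, and a simple rule that re-splits oversized clusters at a stricter threshold. The toy two-dimensional embeddings and the names `cluster_words`, `min_sim`, `max_size` and `step` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def components(words, edges):
    """Connected components of an undirected graph given as an edge list."""
    neighbours = {w: set() for w in words}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for w in words:
        if w in seen:
            continue
        seen.add(w)
        stack, comp = [w], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in neighbours[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        result.append(sorted(comp))
    return result


def cluster_words(embeddings, min_sim=0.5, max_size=3, step=0.1):
    """Group words; hypothetical parameters, not taken from the thesis."""
    # Pairwise similarities between all words.
    sims = {(a, b): cosine(embeddings[a], embeddings[b])
            for a, b in combinations(sorted(embeddings), 2)}

    def split(words, threshold):
        members = set(words)
        # Filter out similarities below the current threshold.
        edges = [(a, b) for (a, b), s in sims.items()
                 if a in members and b in members and s >= threshold]
        clusters = []
        for comp in components(words, edges):
            # Oversized clusters are split again at a stricter threshold.
            # Once the threshold reaches 1.0 a cluster is kept as it is.
            if len(comp) > max_size and threshold < 1.0:
                clusters.extend(split(comp, threshold + step))
            else:
                clusters.append(comp)
        return clusters

    return split(sorted(embeddings), min_sim)


# Toy embeddings, invented for illustration only.
embeddings = {
    "cheap": [1.0, 0.0],
    "pricey": [0.9, 0.1],
    "slow": [0.0, 1.0],
    "late": [0.1, 0.9],
}
print(cluster_words(embeddings))  # [['cheap', 'pricey'], ['late', 'slow']]
```

In this sketch, lowering `min_sim` merges more words into one component, while `max_size` forces large components to be broken up again, which mirrors the abstract's observation that similarity filtering and the size limit shape the result.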

Place, publisher, year, edition, pages
2019. p. 49
Keywords [en]
word clustering, word embedding, distributional semantics, hierarchical clustering, text analytics, language technology, natural language processing, gavagai
National subject category
Identifiers
URN: urn:nbn:se:liu:diva-157497
ISRN: LIU-IDA/LITH-EX-A--19/028--SE
OAI: oai:DiVA.org:liu-157497
DiVA, id: diva2:1324935
External cooperation
Gavagai AB
Subject / course
Computer Engineering
Supervisors
Examiner
Available from: 2019-06-14 Created: 2019-06-14 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

fulltext (670 kB), 579 downloads
File information
File name: FULLTEXT01.pdf
File size: 670 kB
Checksum (SHA-512): cf12cdfde68835cd8a5a16478fe77f67406900b60021a41bb6a91643ce131a61a68f6bd6a6ea1a930f528a14068b0341f8449baffd3904d32a9aac9dff42a1bf
Type: fulltext
Mimetype: application/pdf

Author
Gränsbo, Gustav
Total: 579 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 1281 hits