Digitala Vetenskapliga Arkivet

Clustering Semantically Related Questions
Örebro University, School of Science and Technology.
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 10 credits / 15 HE credits. Student thesis
Abstract [en]

The number of people who use the internet to communicate and interact has grown enormously, and the amount of data they create has grown with it, to the point where handling that data becomes overwhelming. Users are often invited to submit questions on topics that interest them, and the resulting volume is difficult to organize and process. This research addresses the problem of extracting the information contained in a large set of questions by selecting the most representative ones. The proposed framework finds semantic similarities between questions, groups them into clusters, and then selects the most relevant question from each cluster, so that the selected questions represent the full set of submissions. Semantic similarity is obtained with two sentence embedding approaches, Universal Sentence Encoder (USE) and InferSent, and the clusters are produced with the k-means algorithm. The framework is evaluated on two large labelled data sets, SQuAD and House of Commons Written Questions, whose ground-truth labels are used to evaluate the effectiveness of the approach. On both data sets, the clusters produced with USE match the class labels better than those produced with InferSent.
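
The pipeline described in the abstract (embed, cluster with k-means, pick one question per cluster) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the thesis: the two-dimensional vectors are hand-made stand-ins for real USE or InferSent embeddings, the example questions are invented, and "most relevant" is interpreted as "closest to the cluster centroid", which the abstract does not specify. The function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [None] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no vector changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return assignments, centroids


def representatives(questions, vectors, assignments, centroids):
    """Pick, per cluster, the question nearest to the centroid (an assumption)."""
    best = {}
    for question, vector, cluster in zip(questions, vectors, assignments):
        distance = squared_distance(vector, centroids[cluster])
        if cluster not in best or distance < best[cluster][0]:
            best[cluster] = (distance, question)
    return [best[cluster][1] for cluster in sorted(best)]


# Invented questions with hand-made 2-D "embeddings" (stand-ins for USE output).
questions = [
    "How do I reset my password?",
    "I forgot my password, what now?",
    "Can I change my login password?",
    "When does the library open?",
    "What are the library opening hours?",
    "Is the library open on Sundays?",
]
vectors = [
    [0.9, 0.1], [1.0, 0.0], [0.8, 0.3],
    [0.1, 0.9], [0.0, 1.0], [0.3, 0.8],
]

assignments, centroids = kmeans(vectors, k=2)
selected = representatives(questions, vectors, assignments, centroids)
for question in selected:
    print(question)
```

In a real setting the `vectors` list would come from a sentence encoder, and a library implementation of k-means would normally replace the hand-written loop; the sketch only shows how the three steps fit together.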

Place, publisher, year, edition, pages
2019, p. 74
Identifiers
URN: urn:nbn:se:oru:diva-76604
OAI: oai:DiVA.org:oru-76604
DiVA, id: diva2:1353058
Subject / course
Computer Engineering
Available from: 2019-09-20. Created: 2019-09-20. Last updated: 2019-09-20. Bibliographically approved.

Open Access in DiVA

fulltext (7250 kB), 1433 downloads
File information
File name: FULLTEXT01.pdf
File size: 7250 kB
Checksum (SHA-512): 0636cca910f7693501bce8e8ef34186769cb944e29e61fa42cd0d705954b64d18a20849043f5fec710a5ff7e23790712ce1d34b0d573c429f866692cc2465787
Type: fulltext
Mimetype: application/pdf

Total: 1433 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 695 hits