Digitala Vetenskapliga Arkivet

Clustering Semantically Related Questions
Örebro University, School of Science and Technology.
2019 (English). Independent thesis, advanced level (degree of Master), 10 credits / 15 HE credits. Student thesis (degree project)
Abstract [en]

The number of people who use the internet to communicate and interact has grown enormously, and the amount of data they create has grown with it, making data handling overwhelming. Users are often asked to submit questions on topics that interest them, which itself creates an information overload that is difficult to organize and process. This research addresses the problem of extracting the information contained in a large set of questions by selecting the most representative ones from all the questions asked. The proposed framework attempts to find semantic similarities between questions and group them into clusters. It then selects the most relevant question from each cluster, so that the selected questions are the most representative of all those submitted. To obtain the semantic similarities between questions, two sentence embedding approaches are applied: Universal Sentence Encoder (USE) and InferSent. The clusters are produced with the k-means algorithm. The framework is evaluated on two large labelled data sets, SQuAD and House of Commons Written Questions. Both include ground truth information that is used to evaluate the effectiveness of the proposed approach. On both data sets, USE produces clusters that match the class labels better than those produced by InferSent.
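The pipeline described in the abstract (embed each question, cluster the embeddings with k-means, pick one representative per cluster) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the two-dimensional vectors in `toy_embeddings` are hand-made stand-ins for real USE or InferSent embeddings, the farthest-point seeding is an assumption chosen only to keep the example deterministic, and "most relevant" is interpreted here as "closest to the cluster centroid", which the abstract does not confirm.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def farthest_point_init(vectors, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first vector, then repeatedly add the vector farthest from all seeds."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors,
                  key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    centroids = farthest_point_init(vectors, k)
    assignments = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return centroids, assignments


def representatives(questions, vectors, centroids, assignments):
    """For each cluster, return the question whose vector is nearest the centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignments) if a == c]
        if members:
            best = min(members,
                       key=lambda i: squared_distance(vectors[i], centroid))
            reps[c] = questions[best]
    return reps


# Hypothetical questions with hand-made 2-D "embeddings" (illustration only).
questions = [
    "What is the capital of France?",
    "Which city is the French capital?",
    "Where is France's capital located?",
    "How do I reset my password?",
    "How can I change my password?",
    "I forgot my password, what should I do?",
]
toy_embeddings = [
    [0.9, 0.1], [1.0, 0.3], [0.8, 0.0],
    [0.1, 0.9], [0.0, 0.7], [0.3, 1.0],
]

centroids, assignments = kmeans(toy_embeddings, k=2)
reps = representatives(questions, toy_embeddings, centroids, assignments)
for cluster_id, question in sorted(reps.items()):
    print(cluster_id, question)
```

In a real setting the toy vectors would be replaced by the output of a sentence encoder, and the number of clusters would have to be chosen for the data set at hand.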

Place, publisher, year, edition, pages
2019. p. 74
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-76604; OAI: oai:DiVA.org:oru-76604; DiVA, id: diva2:1353058
Subject / course
Computer Engineering
Supervisors
Examiners
Available from: 2019-09-20; Created: 2019-09-20; Last updated: 2019-09-20; Bibliographically approved

Open Access in DiVA

fulltext (7250 kB), 1433 downloads
File information
File name: FULLTEXT01.pdf; File size: 7250 kB; Checksum: SHA-512
0636cca910f7693501bce8e8ef34186769cb944e29e61fa42cd0d705954b64d18a20849043f5fec710a5ff7e23790712ce1d34b0d573c429f866692cc2465787
Type: fulltext; Mimetype: application/pdf
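The published SHA-512 checksum can be used to confirm that a downloaded copy of the full text is intact. The sketch below assumes the file has been saved locally under the record's file name, FULLTEXT01.pdf; the function names are hypothetical helpers, and only Python's standard `hashlib` module is used.

```python
import hashlib

# Checksum as published in the file information above.
PUBLISHED_SHA512 = (
    "0636cca910f7693501bce8e8ef34186769cb944e29e61fa42cd0d705954b64d1"
    "8a20849043f5fec710a5ff7e23790712ce1d34b0d573c429f866692cc2465787"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA-512 digest with an expected hex string (case-insensitive)."""
    return sha512_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the PDF was saved in the current directory:
# print(matches_checksum("FULLTEXT01.pdf", PUBLISHED_SHA512))
```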

Total: 1433 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 695 hits