Digitala Vetenskapliga Arkivet

Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Clustering Semantically Related Questions
Örebro University, School of Science and Technology.
2019 (English)Independent thesis Advanced level (degree of Master (Two Years)), 10 credits / 15 HE creditsStudent thesis
Abstract [en]

There has been a vast increase of users that use the internet in order to communicate and interact, and as a result, the amount of data created follows the same upward trend making data handling overwhelming. Users are often asked to submit their questions on various topics of their interest, and usually, that itself creates an information overload that is difficult to organize and process. This research addresses the problem of extracting information contained in a large set of questions by selecting the most representative ones from the total number of questions asked. The proposed framework attempts to find semantic similarities between questions and group them in clusters. It then selects the most relevant question from each cluster. In this way, the questions selected will be the most representative questions from all the submitted ones. To obtain the semantic similarities between the questions, two sentence embedding approaches, Universal Sentence Encoder (USE) and InferSent, are applied. Moreover, to achieve the clusters, k-means algorithm is used. The framework is evaluated on two large labelled data sets, called SQuAD and House of Commons Written Questions. These data sets include ground truth information that is used to distinctly evaluate the effectiveness of the proposed approach. The results in both data sets show that Universal Sentence Encoder (USE) achieves better outcomes in the produced clusters, which match better with the class labels of the data sets, compared to InferSent.

Place, publisher, year, edition, pages
2019. , p. 74
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-76604OAI: oai:DiVA.org:oru-76604DiVA, id: diva2:1353058
Subject / course
Computer Engineering
Supervisors
Examiners
Available from: 2019-09-20 Created: 2019-09-20 Last updated: 2019-09-20Bibliographically approved

Open Access in DiVA

fulltext(7250 kB)1433 downloads
File information
File name FULLTEXT01.pdfFile size 7250 kBChecksum SHA-512
0636cca910f7693501bce8e8ef34186769cb944e29e61fa42cd0d705954b64d18a20849043f5fec710a5ff7e23790712ce1d34b0d573c429f866692cc2465787
Type fulltextMimetype application/pdf

By organisation
School of Science and Technology
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 1433 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 693 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf