Unsupervised text clustering using survey answers
KTH, School of Engineering Sciences (SCI).
KTH, School of Engineering Sciences (SCI).
2017 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Text data mining is a growing research field in which machine learning and NLP are important technologies. There are multiple applications concerning the categorization of large sets of documents. The methods differ depending on the size of the documents; in short text documents, the information in each individual document is scant. The aim of this paper is to show how well unsupervised text clustering reflects existing class assignments and how sensitive clustering is when comparing different text representations and feature selection methods. The raw data was collected from several national health surveys. Evaluation was made with a conditional entropy-based method called V-measure, which connects the clusters to the categories. We show that some methods perform significantly better on raw data than others.
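As a rough illustration of the evaluation measure named in the abstract, the following sketch computes V-measure as the harmonic mean of homogeneity and completeness, both derived from conditional entropies between class labels and cluster labels. It is not the thesis authors' implementation; the function names and the toy labels are assumptions made up for this example, and the equal weighting of the two components is assumed as well.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels), natural log."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given)."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items()
    )


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness (equal weighting assumed)."""
    if not classes or len(classes) != len(clusters):
        raise ValueError("classes and clusters must be non-empty and equally long")
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - _conditional_entropy(classes, clusters) / h_classes
    )
    # Completeness: all members of a class land in the same cluster.
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - _conditional_entropy(clusters, classes) / h_clusters
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy labels, not taken from the survey data.
true_classes = ["diet", "diet", "sleep", "sleep"]
perfect = v_measure(true_classes, [1, 1, 0, 0])      # clusters match classes
unrelated = v_measure(true_classes, [0, 1, 0, 1])    # clusters ignore classes
```

The score depends only on how the two labelings partition the items, so renaming cluster ids does not change it. For real experiments, scikit-learn provides `sklearn.metrics.v_measure_score`, which would normally be preferred over a hand-written version.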

Place, publisher, year, edition, pages
2017. 36 p.
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-210853
OAI: oai:DiVA.org:kth-210853
DiVA: diva2:1120495
Available from: 2017-07-06 Created: 2017-07-06 Last updated: 2017-07-06 Bibliographically approved

Open Access in DiVA

fulltext (1505 kB), 87 downloads
File information
File name: FULLTEXT01.pdf
File size: 1505 kB
Checksum (SHA-512): d0d3f7c4626c04ee30bcb366599af70f625416d00fb22b498a1d309a9ae1cee8d9e33f5edc17efc513529e02ac37c9afa7c6fd512aaac5a499de8940f71c6371
Type: fulltext
Mimetype: application/pdf
