Clustering the Web: Comparing Clustering Methods in Swedish
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Arts and Sciences. ORCID iD: 0000-0002-9271-7687
2013 (English). Independent thesis, Basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis.
Alternative title
Webbklustring : En jämförelse av klustringsmetoder på svenska (Swedish)
Abstract [en]

Clustering -- automatically sorting -- web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms -- k-means, agglomerative hierarchical clustering, and bisecting k-means -- on a total of 32 corpora, as well as whether clustering web search previews, called snippets, instead of full texts can achieve reasonably decent results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; however, results are inconclusive regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. The effects of stop word and stemmer usage are not significant, and neither appears to affect the clustering to any considerable degree.

Place, publisher, year, edition, pages
2013. 37 p.
Keyword [en]
clustering, web, search results, snippets, k-means, agglomerative hierarchical clustering, bisecting k-means, Swedish
National Category
Language Technology (Computational Linguistics); Human Computer Interaction
URN: urn:nbn:se:liu:diva-95228
ISRN: LIU-IDA/KOGVET-G--13/025--SE
OAI: diva2:635095
Subject / course
Cognitive science programme
Available from: 2013-07-03. Created: 2013-07-02. Last updated: 2013-07-03. Bibliographically approved.

Open Access in DiVA

fulltext (678 kB), 226 downloads
File information
File name: FULLTEXT01.pdf. File size: 678 kB. Checksum: SHA-512
Type: fulltext. Mimetype: application/pdf

By author/editor
Hinz, Joel