Clustering User Behavior in Scientific Collections
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2014 (English). Master's thesis (student thesis)
Abstract [en]

This master's thesis examines how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) is analyzed and preprocessed. Based on the findings of this analysis and a review of the current state of the art, three clustering methods are selected for further work: Simple k-Means, Hierarchical Agglomerative Clustering (HAC), and Graph Partitioning. In addition, a custom agglomerative clustering algorithm is developed in an attempt to tackle some of the problems encountered during the experiments with k-Means and HAC. The results from k-Means and HAC are poor, but the graph partitioning method yields some promising results. The main conclusion of this thesis is that the inherent clusters within the user-record relationship of a scientific collection are nebulous but do exist. Furthermore, the most common clustering algorithms are not suitable for this type of clustering.

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2014. 114 p.
Keyword [no]
ntnudaim:12121, MTDT Datateknologi, Data- og informasjonsforvaltning
URN: urn:nbn:no:ntnu:diva-27340
Local ID: ntnudaim:12121
OAI: diva2:769314
Available from: 2014-12-07 Created: 2014-12-07

Open Access in DiVA

fulltext (1503 kB), 355 downloads
File name: FULLTEXT01.pdf
File size: 1503 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

cover (184 kB), 0 downloads
File name: COVER01.pdf
File size: 184 kB
Checksum: SHA-512
Type: cover
Mimetype: application/pdf
