Word Space Models for Web User Clustering and Page Prefetching
Independent thesis Basic level (degree of Bachelor), 12 credits / 18 HE credits
Student thesis
This study evaluates methods for clustering web users via vector space models, for the purpose of web page prefetching with possible applications in server optimization. An experiment using Latent Semantic Analysis (LSA) is deployed to investigate whether LSA can reproduce the encouraging results obtained in previous research with Random Indexing (RI) and a chaos-based optimization algorithm (CAS-C). This is motivated not only by LSA being yet another vector space model, but also by a study indicating that LSA outperforms RI in a task similar to the web user clustering and prefetching task. The prefetching task, where both RI and CAS-C have shown promising results, was used to verify the applicability of LSA. The original data set from the RI web user clustering and prefetching task was modeled using weighted (tf-idf) LSA. Clusters were defined using a common clustering algorithm (k-means). The least scattered cluster configuration for the model was identified by combining an internal validity measure (SSE) and a relative criterion validity measure (SD index). The assumed optimal cluster configuration was used for the web page prefetching task. Precision and recall of the LSA-based method are found to be on par with RI and CAS-C, inasmuch as it solves the web user clustering and prefetching task with characteristics similar to those of unweighted RI. The hypothesized inherent gains in precision and recall from using LSA were neither confirmed nor conclusively disproved. The effects of different weighting functions for RI are discussed, and a number of methodological factors are identified for further research concerning LSA-based clustering and prefetching.
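The weighting, clustering and SSE steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy user-by-page matrix, the function names (tfidf, kmeans, sse) and the particular tf-idf variant are assumptions made for illustration, and the dimensionality reduction (SVD) that makes the model LSA, as well as the SD index, are left out to keep the example self-contained.

```python
import math

def tfidf(counts):
    """Weight a user-by-page count matrix with one common tf-idf variant (assumed)."""
    n_users, n_pages = len(counts), len(counts[0])
    # Document frequency: number of users who requested each page.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_pages)]
    idf = [math.log(n_users / d) if d else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_pages)] for row in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))

# Hypothetical toy data: rows are users, columns are pages, cells are request counts.
counts = [
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 2],
]
weighted = tfidf(counts)

# SSE for each candidate number of clusters.
sse_by_k = {}
for k in range(1, len(weighted) + 1):
    k_labels, k_centroids = kmeans(weighted, k)
    sse_by_k[k] = sse(weighted, k_labels, k_centroids)

labels, _ = kmeans(weighted, 2)
print(labels, sse_by_k)
```

SSE can only fall as the number of clusters grows, reaching zero when every user is its own cluster, so it cannot select a configuration by itself. This is presumably why the abstract pairs it with a relative criterion such as the SD index.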
Place, publisher, year, edition, pages
2012. 44 p.
LSA, RI, CAS-C, Clustering, Prefetching, Web mining
Computer Science, Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-82012
ISRN: LIU-IDA/KOGVET-G--12/029--SE
OAI: oai:DiVA.org:liu-82012
DiVA: diva2:557437
Subject / course
Cognitive science programme