Evaluating recommendation systems for a sparse boolean dataset
KTH, School of Computer Science and Communication (CSC).
2016 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Evaluering av rekommendationssystem för ett glest booleskt dataset (Swedish)
Abstract [en]

Recommendation systems are an area within machine learning that has become increasingly relevant as everyday use of technology has expanded. The most popular approaches to building a recommendation system are collaborative filtering and content-based filtering. Collaborative filtering in turn comprises two major sub-approaches: memory-based and model-based. This thesis explores both content-based and collaborative filtering as recommendation systems for a sparse boolean dataset. For the content-based approach, the term frequency-inverse document frequency (TF-IDF) algorithm was implemented. For the memory-based approach, the K-nearest neighbours (K-NN) method was used. For the model-based approach, two different algorithms were implemented: singular value decomposition (SVD) and alternating least squares (ALS). To evaluate the methods against each other, a cross-approach evaluator was used that treats the recommendations as a search, one that the users were not aware they had made. Key values such as the number of test users who could receive a recommendation, time consumption, F1 score (precision and recall) and dataset size were used to compare the methods and reach conclusions. The finding of the study was that collaborative filtering was the most accurate choice for sparse datasets. Among the model-based collaborative filtering algorithms implemented, the one that performed most accurately was singular value decomposition without any regularization against overfitting. A further step beyond this thesis would be to evaluate the different methods in an online environment with active users giving feedback in real time.
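The F1 score named in the abstract combines precision and recall into a single number. The following is a minimal sketch of that calculation, assuming each test user's recommendations and held-out items are available as collections of item IDs. The function name `precision_recall_f1` and the sample IDs are hypothetical and do not come from the thesis.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F1) for one user's recommendation list.

    recommended: item IDs the system suggested (hypothetical input format)
    relevant:    held-out item IDs the user actually interacted with
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)

    # Guard against empty sets so users without recommendations score 0.
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: 4 recommendations, 3 held-out items, 2 in common.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Returning zero for a user with no recommendations is one possible convention. The abstract reports the number of test users who could receive a recommendation as a separate key value, so such users could equally be excluded from the average.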

Abstract [sv]

Rekommendationssystem är ett område inom maskininlärning som har blivit allt vanligare i och med expansionen av den dagliga användningen av teknik. Det mest populära metoder när du gör ett rekommendationssystemet, “collaborative filtering” och “content-based filtering”. Collaborative filtering innehåller också två sub kategorier, “memory-based” och “model-based”. Denna avhandling kommer att undersöka både “content-based” och “collaborative filtering” för användning som ett rekommendationssystem för ett glest boolesk dataset. Som “content-based” strategi implementerades term frekvens omvänd dokument frekvens (TF-IDF) algoritmen. Som en “memory-based” strategi implementerades K-närmast grannarna (K-NN) metoden. För “model-based” angripsättet implementerades två olika algoritmer, singulärvärdesuppdelning (SVD) och altenerande minsta kvadrat metoden (ALS). För att kunna utvärdera metoderna mot varandra sågs rekommendationer som en sökning, en sökning som användarna inte var medvetna om att det gjort. Viktiga värden som antalet testanvändare som kunde fått en rekommendation, tidsåtgång, “F1 score” (precision och recall) och dataset storlek användes för att jämföra det olika metoderna och dra slutsatser. Resultatet av studien visar att “collaborative filtering” var den högst presterande när det gäller en gles datamängd. Den implementerade algoritmen för “model-based collaborative filtering“ som visat sig vara den mest exakta var SVD utan reglering mot “overfitting”. En framtida påbyggnad av denna rapport är att utvärdera olika metoder i en online-miljö med aktiva användare som kan ge respons i realtid.

Place, publisher, year, edition, pages
2016.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-189850
OAI: oai:DiVA.org:kth-189850
DiVA: diva2:949210
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2016-08-18
Created: 2016-07-18
Last updated: 2016-08-18
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 980 kB
Checksum (SHA-512): b917632531c10cb1429b79516654acec05f0842a1a4d2cc8b54e25b21b0d3f77e5d103af32676cf16193565c2c6a9ece3c15348e304e73c83b475d97ebb00741
Type: fulltext
Mimetype: application/pdf

