Large-Scale User Click Analysis in News Recommendation
The Internet and the World Wide Web have become the standard way of reading and finding news. This lets readers choose the news that interests them most. However, because of the large number of articles, finding the desired information can be challenging and time-consuming. Simplifying this process would benefit news readers.
This thesis explores the idea of filtering out unwanted news articles and serving the useful ones to the reader through mobile platforms. It is part of a larger project named SmartMedia, which focuses on using complex strategies for delivering news to users. While the overall strategy is based on using the total context of users to serve news, the specific scope of this thesis is creating user profiles from user actions logged by the system. The motivation is to use these profiles together with information filtering techniques to help reach the overall goal.
A large part of this thesis focuses on implementing Hadoop jobs that summarize the user logs into profiles. In the solution, each user profile consists of two vectors: a category vector that describes the user's interests in the different news categories, and a keyword vector that exploits entities defined in news articles to analyse interests at a finer level of granularity. The results are evaluated and discussed at the end.
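The two-vector profile described above can be illustrated with a minimal sketch. It is written in plain Python in a map/reduce style rather than as an actual Hadoop job, and it is not the thesis implementation: the log record layout, the sample data, the function names, and the choice to normalise click counts into relative weights are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click-log records: (user_id, article_category, article_entities).
# The field layout and the sample values are assumptions, not the thesis data.
CLICKS = [
    ("u1", "sports", ["Rosenborg", "Trondheim"]),
    ("u1", "sports", ["Rosenborg"]),
    ("u1", "politics", ["Stortinget"]),
    ("u2", "culture", ["Trondheim"]),
]


def map_click(record):
    """Map step: emit one count per category and per entity of a clicked article."""
    user_id, category, entities = record
    yield (user_id, "category", category), 1
    for entity in entities:
        yield (user_id, "keyword", entity), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (user, vector kind, term) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def normalise(counts):
    """Turn raw click counts into relative weights that sum to 1 (an assumed weighting)."""
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def build_profiles(clicks):
    """Build one category vector and one keyword vector per user."""
    totals = reduce_counts(pair for record in clicks for pair in map_click(record))
    raw = defaultdict(lambda: {"category": Counter(), "keyword": Counter()})
    for (user_id, kind, term), count in totals.items():
        raw[user_id][kind][term] = count
    return {
        user_id: {kind: normalise(counts) for kind, counts in vectors.items()}
        for user_id, vectors in raw.items()
    }


profiles = build_profiles(CLICKS)
```

In this sketch, `profiles["u1"]["category"]` weights the hypothetical user's interest per news category, while `profiles["u1"]["keyword"]` does the same per entity. A deployed system would likely also need to account for factors such as recency of clicks, which this sketch ignores.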
Evaluating the effectiveness and accuracy of the user profiles is difficult. Little real data was available during this research, and data that replicates real users is hard to forge, yet such data is needed for both evaluation and calibration of the implementation. Thus, the discussion focuses on how to perform these two tasks once the system is deployed.
Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2013. 85 p.
Identifiers
URN: urn:nbn:no:ntnu:diva-23004
Local ID: ntnudaim:9824
OAI: oai:DiVA.org:ntnu-23004
DiVA: diva2:655638
Gulla, Jon Atle, Professor