This thesis is dedicated to graph-based methods, applied to ranking problems on the Web-graph and to applications in natural language processing and biology.
Chapters 2-4 of this thesis are about PageRank and its use in the ranking of home pages on the Internet for use in search engines. PageRank is based on the assumption that a web page should be ranked highly if it is linked to by many other pages and/or by other important pages. This is modelled as the stationary distribution of a random walk on the Web-graph.
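As an illustration of the random-walk model, the following minimal Python sketch approximates the stationary distribution by power iteration. It is not taken from the thesis: the function name `pagerank`, the damping factor 0.85, the uniform redistribution of rank from pages without out-links, and the four-page toy graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly used value.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                # Each page passes its rank on in equal shares along its links.
                new_rank[t] += damping * rank[p] / len(targets)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy Web-graph: page "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In this toy graph, page "c" receives the highest rank because all other pages link to it, and page "d" receives the lowest because no page links to it.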
Due to the large size and quick growth of the Internet, it is important to be able to calculate this ranking very efficiently. One of the main topics of this thesis is how this can be done more efficiently, mainly by considering specific types of subgraphs and how PageRank can be calculated or updated for those types of graph structures. In particular, we will consider the graph partitioned into strongly connected components and how this partitioning can be utilized.
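To make the partitioning concrete, the sketch below finds the strongly connected components of a small directed graph in Python. It uses Kosaraju's two-pass depth-first search, which is a standard technique and not necessarily the one used in the thesis. The function name `strongly_connected_components` and the toy graph are assumptions made for the example.

```python
def strongly_connected_components(links):
    """Partition a directed graph into strongly connected components.

    links maps each vertex to the list of vertices it links to.
    Uses Kosaraju's two-pass depth-first search (an assumed choice).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}

    # Pass 1: record vertices in order of DFS completion (iterative DFS).
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(links.get(start, [])))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(links.get(nxt, []))))
                    break
            else:
                # All successors explored: this vertex is finished.
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing completion order.
    reverse = {v: [] for v in nodes}
    for src, targets in links.items():
        for t in targets:
            reverse[t].append(src)

    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return components


# Hypothetical toy graph: a cycle a -> b -> c -> a that feeds a cycle d <-> e.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
components = strongly_connected_components(graph)
```

In this toy graph the two cycles form the two components. Links between components run in one direction only, from the first cycle into the second, and this one-way structure is what allows a ranking computation to treat the components in sequence.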
Chapters 5-7 are dedicated to graph-based methods and their application to problems in natural language processing. Specifically, given a collection of texts (a corpus), we compare different clustering methods applied to pharmacovigilance terms (5), graph-based models for the identification of semantic relations between biomedical words (6), and modifications of CValue for the annotation of terms in a corpus (7).
In Chapters 8-9 we look at biological networks and the application of graph centrality measures to the identification of cancer genes. Specifically, in (8) we give a review of different centrality measures and their application to finding cancer genes in biological networks, and in (9) we look at how well the centrality of vertices in the true network is preserved in networks generated from experimental data.
Västerås: Mälardalen University, 2016.