Change search
ReferencesLink to record
Permanent link

Direct link
PageRank in Evolving Networks and Applications of Graphs in Natural Language Processing and Biology
Mälardalen University, School of Education, Culture and Communication, Educational Sciences and Mathematics. (MAM)ORCID iD: 0000-0002-1624-5147
2016 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis is dedicated to the use of graph based methods applied to ranking problems on the Web-graph and applications in natural language processing and biology.

Chapter 2-4 of this thesis is about PageRank and its use in the ranking of home pages on the Internet for use in search engines. PageRank is based on the assumption that a web page should be high ranked if it is linked to by many other pages and/or by other important pages. This is modelled as the stationary distribution of a random walk on the Web-graph.

Due to the large size and quick growth of the Internet it is important to be able to calculate this ranking very efficiently. One of the main topics of this thesis is how this can be made more efficiently, mainly by considering specific types of subgraphs and how PageRank can be calculated or updated for those type of graph structures. In particular we will consider the graph partitioned into strongly connected components and how this partitioning can be utilized.

Chapter 5-7 is dedicated to graph based methods and their application to problems in Natural language processing. Specifically given a collection of texts (corpus) we will compare different clustering methods applied to Pharmacovigilance terms (5), graph based models for the identification of semantic relations between biomedical words (6) and modifications of CValue for the annotation of terms in a corpus.

In Chapter 8-9 we look at biological networks and the application of graph centrality measures for the identification of cancer genes. Specifically in (8) we give a review over different centrality measures and their application to finding cancer genes in biological networks and in (9) we look at how well the centrality of vertices in the true network is preserved in networks generated from experimental data.

Place, publisher, year, edition, pages
Västerås: Mälardalen University , 2016.
Series
Mälardalen University Press Dissertations, ISSN 1651-4238 ; 217
National Category
Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
URN: urn:nbn:se:mdh:diva-33459ISBN: 978-91-7485-298-1OAI: oai:DiVA.org:mdh-33459DiVA: diva2:1039463
Public defence
2016-12-08, Kappa, Mälardalens högskola, Västerås, 13:15 (English)
Opponent
Supervisors
Available from: 2016-10-24 Created: 2016-10-24 Last updated: 2016-11-23Bibliographically approved
List of papers
1. Non-normalized PageRank and random walks on N-partite graphs
Open this publication in new window or tab >>Non-normalized PageRank and random walks on N-partite graphs
2014 (English)In: SMTDA 2014 Proceedings / [ed] H. Skiadas (Ed), 2014, 193-202 p.Conference paper (Refereed)
Abstract [en]

In this article we will look at a variation of the PageRank algorithmoriginally used by L. Page and S. Brin to rank home pages on the Web. Wewill look at a non-normalized variation of PageRank and show how this version ofPageRank relates to a random walk on a graph. The article has its main focus inunderstanding the behavior of the ranking depending on the structure of the graphand how the ranking changes as the graph change. More specic we will look atN-partite graphs and see that by considering a random walk on the graph we cannd explicit formulas for PageRank of the vertices in the graph. Both the case withuniform and non-uniform personalization vector are considered.

Keyword
PageRank, N-partite graph, random walk.
National Category
Mathematics Computational Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-30003 (URN)
Conference
3rd Stochastic Modelling Techniques and Data Analysis International Conference (SMTDA 2014), 11-14 June 2014, Lisbon, Portugal
Available from: 2015-12-18 Created: 2015-12-18 Last updated: 2016-10-24Bibliographically approved
2. PageRank, a Look at Small Changes in a Line of Nodes and the Complete Graph
Open this publication in new window or tab >>PageRank, a Look at Small Changes in a Line of Nodes and the Complete Graph
2016 (English)In: Engineering Mathematics II: Algebraic, Stochastic and Analysis Structures for Networks, Data Classification and Optimization / [ed] Sergei Silvestrov; Milica Rancic, Springer, 2016Chapter in book (Refereed)
Abstract [en]

In this article we will look at the PageRank algorithm used as part of the ranking process of different Internet pages in search engines by for example Google. This article has its main focus in the understanding of the behavior of PageRank as the system dynamically changes either by contracting or expanding such as when adding or subtracting nodes or links or groups of nodes or links. In particular we will take a look at link structures consisting of a line of nodes or a complete graph where every node links to all others. We will look at PageRank as the solution of a linear system of equations and do our examination in both the ordinary normalized version of PageRank as well as the non-normalized version found by solving corresponding linear system. We will show that using two different methods we can find explicit formulas for the PageRank of some simple link structures.

Place, publisher, year, edition, pages
Springer, 2016
Series
, Springer Proceedings in Mathematics and Statistics, ISSN 2194-1009 ; 179
Keyword
PageRank algorithm, graphs, linear system
National Category
Computational Mathematics Discrete Mathematics Probability Theory and Statistics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-16355 (URN)10.1007/978-3-319-42105-6 (DOI)978-3-319-42104-9 (ISBN)978-3-319-42105-6 (ISBN)
Available from: 2016-10-11 Created: 2012-12-02 Last updated: 2016-12-05Bibliographically approved
3. PageRank, Connecting a Line of Nodes with a Complete Graph
Open this publication in new window or tab >>PageRank, Connecting a Line of Nodes with a Complete Graph
2016 (English)In: Engineering Mathematics II: Algebraic, Stochastic and Analysis Structures for Networks, Data Classification and Optimization / [ed] Sergei Silvestrov; Milica Rancic, Springer, 2016Chapter in book (Refereed)
Abstract [en]

The focus of this article is the PageRank algorithm originally defined by S. Brin and L. Page as the stationary distribution of a certain random walk on a graph used to rank homepages on the Internet. We will attempt to get a better understanding of how PageRank changes after you make some changes to the graph such as adding or removing edge between otherwise disjoint subgraphs. In particular we will take a look at link structures consisting of a line of nodes or a complete graph where every node links to all others and different ways to combine the two. Both the ordinary normalized version of PageRank as well as a non-normalized version of PageRank found by solving corresponding linear system will be considered. We will see that it is possible to find explicit formulas for the PageRank in some simple link structures and using these formulas take a more in-depth look at the behavior of the ranking as the system changes.

Place, publisher, year, edition, pages
Springer, 2016
Series
, Springer Proceedings in Mathematics and Statistics, ISSN 2194-1009 ; 179
Keyword
PageRank, random walk, graph, linear system, subgraph
National Category
Computational Mathematics Discrete Mathematics Probability Theory and Statistics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-33379 (URN)10.1007/978-3-319-42105-6 (DOI)978-3-319-42104-9 (ISBN)978-3-319-42105-6 (ISBN)
Available from: 2016-10-11 Created: 2016-10-11 Last updated: 2016-12-05Bibliographically approved
4. A componentwise PageRank algorithm
Open this publication in new window or tab >>A componentwise PageRank algorithm
2015 (English)In: ASMDA 2015 Proceedings: 16th Applied Stochastic Models and Data Analysis International Conference with 4th Demographics 2015 Workshop / [ed] Christos H Skiadas, ISAST: International Society for the Advancement of Science and Technology , 2015, 185-198 p.Conference paper (Refereed)
Abstract [en]

In this article we will take a look at a variant of the PageRank algorithminitially used by S. Brinn and L. Page to rank homepages on the Internet. The aim ofthe article is to see how we can use the topological structure of the graph to speed upcalculations of PageRank without doing any additional approximations. We will seethat by considering a non-normalized version of PageRank it is easy to see how wecan handle dierent types of vertices or strongly connected components in the graphmore eciently. Using this we propose two PageRank algorithms, one similar to theLumping algorithm proposed by Qing et al which handles certain types of verticesfaster and last another PageRank algorithm which can handle more types of verticesas well as strongly connected components more eectively. In the last sections we willlook at some specic types of components as well as verifying the time complexity ofthe algorithm.

Place, publisher, year, edition, pages
ISAST: International Society for the Advancement of Science and Technology, 2015
Keyword
PageRank, strongly connected component, random walk .
National Category
Mathematics Computational Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-30004 (URN)978-618-5180-05-8 (ISBN)
Conference
16th Applied Stochastic Models and Data Analysis International Conference (ASMDA2015) with Demographics 2015 Workshop, 30 June – 4 July 2015, University of Piraeus, Greece
Available from: 2015-12-18 Created: 2015-12-18 Last updated: 2016-10-24Bibliographically approved
5. Using graph partitioning to calculate PageRank in a changing network
Open this publication in new window or tab >>Using graph partitioning to calculate PageRank in a changing network
(English)Manuscript (preprint) (Other academic)
Abstract [en]

PageRank was first defined by S. Brinn and L. Page in 1998 in order to rank homepages on the Internet by ranking pages according to the stationary distribution of a random walk on the web graph. While the original way to calculate PageRank is fast, due to the huge size and growth of the web there have been many attempts at improving upon the calculation speed of PageRank through various means. In this article we will look at a slightly different but equally important problem, namely how to improve the calculation of PageRank in a changing network where PageRank of an earlier stage of the network is available. In particular we consider two types of changes in the graph, the change in rank after changing the personalization vector used in calculating PageRank as well as added or removed edges between different strongly connected components in the network.

Keyword
PageRank, random walk, strongly connected component, network
National Category
Computational Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-33456 (URN)
Note

To appear in 4thStochastic Modeling Techniques and Data Analysis International Conference (SMTDA2016)

Available from: 2016-10-24 Created: 2016-10-24 Last updated: 2016-10-24
6. Using graph partitioning to calculate PageRank in a changing network
Open this publication in new window or tab >>Using graph partitioning to calculate PageRank in a changing network
(English)Manuscript (preprint) (Other academic)
Abstract [en]

PageRank was first defined by S. Brinn and L. Page in 1998 in order to rank homepages on the Internet by ranking pages according to the stationary distribution of a random walk on the web graph. While the original way to calculate PageRank is fast, due to the huge size and growth of the web there have been many attempts at improving upon the calculation speed of PageRank through various means. In this article we will look at a slightly different but equally important problem, namely how to improve the calculation of PageRank in a changing network where PageRank of an earlier stage of the network is available. In particular we consider two types of changes in the graph, the change in rank after changing the personalization vector used in calculating PageRank as well as added or removed edges between different strongly connected components in the network.

Keyword
PageRank, random walk, strongly connected component, network
National Category
Computational Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-33456 (URN)
Note

To appear in 4thStochastic Modeling Techniques and Data Analysis International Conference (SMTDA2016)

Available from: 2016-10-24 Created: 2016-10-24 Last updated: 2016-10-24
7. Calculating PageRank in a Changing Network With Added or Removed Edges
Open this publication in new window or tab >>Calculating PageRank in a Changing Network With Added or Removed Edges
(English)Manuscript (preprint) (Other academic)
Abstract [en]

PageRank was initially developed by S. Brinn and L. Page in 1998 to rank homepages on the Internet using the stationary distribution of a Markov chain created using the web graph. Due to the large size of the web graph and many other real worldnetworks fast methods to calculate PageRank is needed and even if the original way of calculating PageRank using a Power iterations is rather fast, many other approaches have been made to improve the speed further. In this paper we will consider the problem of recalculating PageRank of a changing network where the PageRank of a previous version of the network is known. In particular we will consider the special case of adding or removing edges to a single vertex in the graph or graph component

Keyword
PageRank, Random walk, graph
National Category
Computational Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-33457 (URN)
Note

To appear in Proceeding sof ICNPAA2016

Available from: 2016-10-24 Created: 2016-10-24 Last updated: 2016-10-24
8. Comparison of Clustering Approaches through Their Application to Pharmacovigilance Terms
Open this publication in new window or tab >>Comparison of Clustering Approaches through Their Application to Pharmacovigilance Terms
Show others...
2013 (English)In: Artificial Intelligence in Medicine. Lecture Notes in Computer Science, vol. 7885 / [ed] Niels Peek, Roque Marín Morales, Mor Peleg, Berlin Heidelberg: Springer, 2013, 58-67 p.Chapter in book (Refereed)
Abstract [en]

In different applications (i.e., information retrieval, filteringor analysis), it is useful to detect similar terms and to provide the possibilityto use them jointly. Clustering of terms is one of the methods whichcan be exploited for this. In our study, we propose to test three methodsdedicated to the clustering of terms (hierarchical ascendant classification,Radius and maximum), to combine them with the semantic distance algorithmsand to compare them through the results they provide whenapplied to terms from the pharmacovigilance area. The comparison indicatesthat the non disjoint clustering (Radius and maximum) outperformthe disjoint clusters by 10 to up to 20 points in all the experiments.

Place, publisher, year, edition, pages
Berlin Heidelberg: Springer, 2013
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7885
National Category
Language Technology (Computational Linguistics) Computational Mathematics Probability Theory and Statistics Other Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-21680 (URN)10.1007/978-3-642-38326-7_9 (DOI)2-s2.0-84887296545 (ScopusID)978-3-642-38325-0 (ISBN)978-3-642-38326-7 (ISBN)
Conference
14th Conference on Artificial Intelligence in Medicine
Note

14th Conference on Artificial Intelligence in Medicine, AIME 2013; Murcia; Spain; 29 May 2013 through 1 June 2013

Available from: 2013-10-04 Created: 2013-09-27 Last updated: 2016-10-24Bibliographically approved
9. Combining Compositionality and Pagerank for the Identification of Semantic Relations between Biomedical Words
Open this publication in new window or tab >>Combining Compositionality and Pagerank for the Identification of Semantic Relations between Biomedical Words
Show others...
2012 (English)In: BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, 2012, 109-117 p.Conference paper (Refereed)
Abstract [en]

The acquisition of semantic resources and relations is an important task for several applications, such as query expansion, information retrieval and extraction, machine translation. However, their validity should also be computed and indicated, especially for automatic systems and applications. We exploit the compositionality based methods for the acquisition of synonymy relations and of indicators of these synonyms. We then apply pagerank-derived algorithm to the obtained semantic graph in order to filter out the acquired synonyms. Evaluation performed with two independent experts indicates that the quality of synonyms is systematically improved by 10 to 15% after their filtering.

National Category
Algebra and Logic Bioinformatics (Computational Biology) Other Computer and Information Science Language Technology (Computational Linguistics) Computational Mathematics
Research subject
Mathematics/Applied Mathematics; Computer Science
Identifiers
urn:nbn:se:mdh:diva-17938 (URN)978-1-937284-20-6 (ISBN)1-937284-20-4 (ISBN)
Conference
Workshop on Biomedical Natural Language Processing (BioNLP 2012), Montreal, Canada, June 8, 2012
Available from: 2013-01-17 Created: 2013-01-17 Last updated: 2016-10-24Bibliographically approved
10. Term Ranking Adaptation to the Domain: Genetic Algorithm-Based Optimisation of the C-Value
Open this publication in new window or tab >>Term Ranking Adaptation to the Domain: Genetic Algorithm-Based Optimisation of the C-Value
2014 (English)In: Advances in Natural Language Processing: 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014. Proceedings, Springer International Publishing , 2014, Vol. 8686, 71-83 p.Conference paper (Refereed)
Abstract [en]

Term extraction methods based on linguistic rules have been proposed to help the terminology building from corpora. As they face the difficulty of identifying the relevant terms among the noun phrases extracted, statistical measures have been proposed. However, the term selection results may depend on corpus and strong assumptions reflecting specific terminological practice. We tackle this problem by proposing a parametrised C-Value which optimally considers the length and the syntactic roles of the nested terms thanks to a genetic algorithm. We compare its impact on the ranking of terms extracted from three corpora. Results show average precision increased by 9% above the frequency-based ranking and by 12% above the C-Value-based ranking.

Place, publisher, year, edition, pages
Springer International Publishing, 2014
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8686
Series
, Lecture Notes in Computer Science, 8686
Keyword
Terminology; term extraction; term ranking; genetic algorithm
National Category
Mathematics Computational Mathematics Other Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-27266 (URN)10.1007/978-3-319-10888-9_8 (DOI)2-s2.0-84921774624 (ScopusID)978-3-319-10888-9 (ISBN)
Conference
9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014.
Available from: 2015-01-02 Created: 2015-01-02 Last updated: 2016-10-24Bibliographically approved
11. Graph Centrality Based Prediction of Cancer Genes
Open this publication in new window or tab >>Graph Centrality Based Prediction of Cancer Genes
Show others...
2016 (English)In: Engineering Mathematics II: Algebraic, Stochastic and Analysis Structures for Networks, Data Classification and Optimization / [ed] Sergei Silvestrov; Milica Rancic, Springer, 2016Chapter in book (Refereed)
Abstract [en]

Current cancer therapies including surgery, radiotherapy and chemotherapy are often plagued by high failure rates. Designing more targeted and personalized treatment strategies requires a detailed understanding of druggable tumor drivergenes. As a consequence, the detection of cancer driver genes has evolved to a critical scientific field integrating both high-through put experimental screens as well as computational and statistical strategies. Among such approaches, network based prediction tools have recently been accentuated and received major focus due to their potential to model various aspects of the role of cancer genes in a biological system. In this chapter, we focus on how graph centralities obtained from biological networks have been used to predict cancer genes. Specifically, we start by discussing the current problems in cancer therapy and the reasoning behind using network based cancer gene prediction, followed by an outline of biological networks, their generation and properties. Finally, we review major concepts, recent results as well as future challenges regarding the use of graph centralities in cancer gene prediction.

Place, publisher, year, edition, pages
Springer, 2016
Series
, Springer Proceedings in Mathematics and Statistics, ISSN 2194-1009 ; 179
Keyword
graph, graph centrality, biological networks, cancer therapies, cancer driver genes, biological system
National Category
Computational Mathematics Bioinformatics (Computational Biology) Bioinformatics and Systems Biology Genetics Medical and Health Sciences
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-33381 (URN)10.1007/978-3-319-42105-6 (DOI)978-3-319-42104-9 (ISBN)978-3-319-42105-6 (ISBN)
Available from: 2016-10-11 Created: 2016-10-11 Last updated: 2016-12-05Bibliographically approved
12. Loss of conservation of graph centralities in reverse-engineered transcriptional regulatory networks
Open this publication in new window or tab >>Loss of conservation of graph centralities in reverse-engineered transcriptional regulatory networks
Show others...
2015 (English)In: ASMDA 2015 Proceedings: 16th Applied Stochastic Models and Data Analysis International ConferenceWith 4th Demographics 2015 Workshop / [ed] Christos H Skiadas, ISAST: International Society for the Advancement of Science and Technology , 2015, 1077-1091 p.Conference paper (Refereed)
Abstract [en]

Graph centralities are often used to prioritize disease genes in transcrip-tional regulatory networks. Studies on small networks of experimentally validatedinteractions emphasize the general validity of this approach and extensions of suchndings have recently also been proposed for networks inferred from expression data.However, due to the noise inherent to expression data, it is largely unknown howwell centralities are preserved in such networks. Specically, while previous stud-ies have evaluated the performance of inference methods on synthetic expression, ithas yet to be established how the choice of method can aect individual centralitiesin the network. Here we compare two centralities between reference networks andnetworks inferred from corresponding simulated expression data using a number ofrelated methods. The results indicate that there exists only a modest conservationof centrality measures for the used inference methods. In conclusion, caution shouldbe exercised when inspecting centralities in reverse-engineered networks and furtherwork will be required to establish the use of such networks for prioritizing genes.

Place, publisher, year, edition, pages
ISAST: International Society for the Advancement of Science and Technology, 2015
Keyword
Transcriptional network inference, simulated expression, centrality.
National Category
Bioinformatics and Systems Biology Mathematics Computational Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-30005 (URN)978-618-5180-05-8 (ISBN)
Conference
16th Applied Stochastic Models and Data Analysis International Conference (ASMDA2015) with Demographics 2015 Workshop, 30 June – 4 July 2015, University of Piraeus, Greece
Available from: 2015-12-18 Created: 2015-12-18 Last updated: 2016-10-24Bibliographically approved

Open Access in DiVA

fulltext(557 kB)7 downloads
File information
File name FULLTEXT03.pdfFile size 557 kBChecksum SHA-512
fb8679c02f5e0137846a29b9005848bb8b5113abae277dc12f6bdcc325e087561ac1ef81e266959b4521a0f5c123adc7ffbb5a61eccf968d376835c529150f1d
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Engström, Christopher
By organisation
Educational Sciences and Mathematics
Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 24 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 207 hits
ReferencesLink to record
Permanent link

Direct link