Performance Optimization Techniques and Tools for Distributed Graph Processing
2016 (English). Doctoral thesis, monograph (Other academic)
In this thesis, we propose optimization techniques for distributed graph processing. First, we describe a data processing pipeline that leverages an iterative graph algorithm for automatic classification of web trackers. Using this application as a motivating example, we examine how asymmetrical convergence of iterative graph algorithms can be used to reduce the amount of computation and communication in large-scale graph analysis. We propose an optimization framework for fixpoint algorithms and a declarative API for writing fixpoint applications. Our framework uses a cost model to automatically exploit asymmetrical convergence and to evaluate execution strategies at runtime. We show that our cost model achieves speedups of up to 1.7x and communication savings of up to 54%. Next, we propose to use the concepts of semi-metricity and the metric backbone to reduce the amount of data that needs to be processed in large-scale graph analysis. We provide a distributed algorithm for computing the metric backbone using the vertex-centric programming model. Using the backbone, we can reduce graph sizes by up to 88% and achieve speedups of up to 6.7x.
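The metric backbone mentioned in the abstract can be illustrated with a minimal, single-machine sketch. It assumes an undirected graph with positive distance weights and treats an edge as semi-metric when some indirect path between its endpoints is strictly shorter than the edge itself; the backbone keeps only the remaining (metric) edges. This is not the distributed vertex-centric algorithm of the thesis: the function names `shortest_distances` and `metric_backbone` are hypothetical, and plain Dijkstra is swapped in for illustration.

```python
import heapq
from collections import defaultdict


def shortest_distances(adj, source):
    """Dijkstra from `source` over an adjacency dict {node: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def metric_backbone(edges):
    """Return the edges whose weight equals the shortest-path distance
    between their endpoints (assumption: undirected, positive weights)."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    cache = {}  # one Dijkstra run per distinct source node
    backbone = []
    for u, v, w in edges:
        if u not in cache:
            cache[u] = shortest_distances(adj, u)
        # The direct edge is itself a path, so the distance is at most w;
        # it is strictly smaller only when the edge is semi-metric.
        if cache[u][v] >= w:
            backbone.append((u, v, w))
    return backbone


# Edge a-c (weight 3) is semi-metric: the path a-b-c has length 2.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 3)]
print(metric_backbone(edges))
```

In this toy graph one of three edges is dropped while every shortest-path distance is preserved, which is the property that lets analyses based on shortest paths run on the smaller graph.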
Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016. 158 p.
Graph processing, distributed systems, big data
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-192471
ISBN: 78-91-7729-101-5
OAI: oai:DiVA.org:kth-192471
DiVA: diva2:968786
2016-10-10, Sal C, Electrum, Kista, 13:00 (English)
Vlassov, Vladimir; Schulte, Christian; Haridi, Seif; Van Roy, Peter
QC 20160919
2016-09-19; 2016-09-12; 2016-09-19; Bibliographically approved