Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Boosting Vertex-Cut Partitioning For Streaming Graphs
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
(SICS Swedish ICT)
(SICS Swedish ICT)
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
Show others and affiliations
2016 (English)In: Big Data (BigData Congress), 2016 IEEE International Congress on, IEEE conference proceedings, 2016, p. 1-8Conference paper, Published paper (Refereed)
Abstract [en]

While the algorithms for streaming graph partitioning are proved promising, they fall short of creating timely partitions when applied on large graphs. For example, it takes 415 seconds for a state-of-the-art partitioner to work on a social network graph with 117 millions edges. We introduce an efficient platform for boosting streaming graph partitioning algorithms. Our solution, called HoVerCut, is Horizontally and Vertically scalable. That is, it can run as a multi-threaded process on a single machine, or as a distributed partitioner across multiple machines. Our evaluations, on both real-world and synthetic graphs, show that HoVerCut speeds up the process significantly without degrading the quality of partitioning. For example, HoVerCut partitions the aforementioned social network graph with 117 millions edges in 11 seconds that is about 37 times faster

Place, publisher, year, edition, pages
IEEE conference proceedings, 2016. p. 1-8
Keywords [en]
streaming graph, vertex-cut partitioning, graph partitioning, parallel scalability
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-189864DOI: 10.1109/BigDataCongress.2016.10ISI: 000390212200001Scopus ID: 2-s2.0-84994558691ISBN: 978-1-5090-2622-7 (print)OAI: oai:DiVA.org:kth-189864DiVA, id: diva2:949508
Conference
5th 2016 IEEE International Congress on Big Data (BigData Congress 2016)
Note

QC 20160923

Available from: 2016-07-20 Created: 2016-07-20 Last updated: 2019-02-08Bibliographically approved
In thesis
1. Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers
Open this publication in new window or tab >>Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers
2016 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, our goal is to enable and achieve effective and efficient real-time stream processing in a geo-distributed infrastructure, by combining the power of central data centers and micro data centers. Our research focus is to address the challenges of distributing the stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures.

 First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in the community network Cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements for stream processing systems to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of the stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and likewise, reduce the average latency to less than 50%. Next, we propose a parallel and distributed graph partitioner, called HoVerCut, for fast partitioning of streaming graphs. Since a lot of data can be presented in the form of graph, graph partitioning can be used to assign the graph elements to different data centers to provide data locality for efficient processing. Last, we provide an approach, called SpanEdge that enables stream processing systems to work on a geo-distributed infrastructure. SpenEdge unifies stream processing over the central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge by extending Apache Storm that enables it to run across multiple data centers.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. p. 33
Series
TRITA-ICT ; 2016:27
Keywords
geo-distributed stream processing, geo-distributed infrastructure, edge computing, edge-based analytics
National Category
Computer and Information Sciences
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-193582 (URN)978-91-7729-118-3 (ISBN)
Presentation
2016-11-14, Sal 208, Electrum, Kungl Tekniska högskolan, Kistagången 16, Kista, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20161005

Available from: 2016-10-05 Created: 2016-10-04 Last updated: 2018-01-14Bibliographically approved
2. Methods and Algorithms for Data-Intensive Computing: Streams, Graphs, and Geo-Distribution
Open this publication in new window or tab >>Methods and Algorithms for Data-Intensive Computing: Streams, Graphs, and Geo-Distribution
2019 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Struggling with the volume and velocity of Big Data has attracted lots of interest towards stream processing paradigm, a paradigm in the area of data-intensive computing that provides methods and solutions to process data in motion. Today's Big Data includes geo-distributed data sources.In addition, a major part of today's Big Data requires exploring complex and evolving relationships among data, which complicates any reasoning on the data. This thesis aims at challenges raised by geo-distributed streaming data, and the data with complex and evolving relationships.

Many organizations provide global scale applications and services that are hosted on servers and data centers that are located in different parts of the world. Therefore, the data that needs to be processed are generated in different geographical locations. This thesis advocates for distributed stream processing in geo-distributed settings to improve the performance including better response time and lower network cost compared to centralized solutions. In this thesis, we conduct an experimental study of Apache Storm, a widely used open-source stream processing system, on a geo-distributed infrastructure made of near-the-edge resources. The resources that host the system's components are connected by heterogeneous network links. Our study exposes a set of issues and bottlenecks of deploying a stream processing system on the geo-distributed infrastructure. Inspired by the results, we propose a novel method for grouping of geo-distributed resources into computing clusters, called micro data centers, in order to mitigate the effect of network heterogeneity for distributed stream processing applications. Next, we focus on the windowed aggregation of geo-distributed data streams, which has been widely used in stream analytics. We propose to reduce the bandwidth cost by coordinating windowed aggregations among near-the-edge data centers. We leverage intra-region links and design a novel low-overhead coordination algorithm that optimizes communication cost for data aggregation. Then, we propose a system, called SpanEdge, that provides an expressive programming model to unify programming stream processing applications on a geo-distributed infrastructure and provides a run-time system to manage (schedule and execute) stream processing applications across data centers. Our results show that SpanEdge can optimally deploy stream processing applications in a geo-distributed infrastructure, which significantly reduces the bandwidth consumption and response latency.

With respect to data with complex and evolving relationships, this thesis aims at effective and efficient processing of inter-connected data. There exist several domains such as social network analysis, machine learning, and web search in which data streams are modeled as linked entities of nodes and edges, namely a graph. Because of the inter-connection among the entities in graph data, processing of graph data is challenging. The inter-connection among the graph entities makes it difficult to distribute the graph among multiple machines to process the graph at scale. Furthermore, in a streaming setting, the graph structure and the graph elements can continuously change as the graph elements are streamed. Such a dynamic graph requires incremental computing methods that can avoid redundant computations on the whole graph. This thesis proposes incremental computing methods of streaming graph processing that can boost the processing time while still obtaining high quality results. In this thesis, we introduce HoVerCut, an efficient framework for boosting streaming graph partitioning algorithms. HoVerCut is Horizontally and Vertically scalable. Our evaluations show that HoVerCut speeds up the partitioning process significantly without degrading the quality of partitioning. Finally, we study unsupervised representation learning in dynamic graphs. Graph representation learning seeks to learn low dimensional vector representations for the graph elements, i.e. edges and vertices, and the whole graph.We propose novel and computationally efficient incremental algorithms. The computation complexity of our algorithms depends on the extent and rate of changes in a graph and on the graph density. The evaluation results show that our proposed algorithms can achieve competitive results to the state-of-the-art static methods while being computationally efficient.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2019. p. 58
Series
TRITA-EECS-AVL ; 2019:13
Keywords
stream processing, geo-distributed infrastructure, edge computing, streaming graph, dynamic graph
National Category
Computer and Information Sciences
Research subject
Information and Communication Technology; Computer Science
Identifiers
urn:nbn:se:kth:diva-243883 (URN)978-91-7873-094-0 (ISBN)
Public defence
2019-03-15, Ka-Sal C (Sal Sven-Olof Öhrvik), Electrum, Kista, Stockholm, 13:30 (English)
Opponent
Supervisors
Note

QC 20190208

Available from: 2019-02-08 Created: 2019-02-08 Last updated: 2019-02-08Bibliographically approved

Open Access in DiVA

Boosting Vertex-Cut Partitioning For Streaming Graphs(745 kB)229 downloads
File information
File name FULLTEXT01.pdfFile size 745 kBChecksum SHA-512
d254ed07c88daf26e3e779938104410a37f20a26ac609b34d7a4d351d834fb675f97c659b49f8d6a7ead6b769c50e820196352424c7aa57331c64256e02d4538
Type fulltextMimetype application/pdf

Other links

Publisher's full textScopushttp://www.ieeebigdata.org/2016/cfp.html

Search in DiVA

By author/editor
Peiro Sajjad, HoomanVlassov, Vladimir
By organisation
Software and Computer systems, SCS
Other Electrical Engineering, Electronic Engineering, Information EngineeringComputer Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 229 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 376 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf