Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, our goal is to enable and achieve effective and efficient real-time stream processing in a geo-distributed infrastructure, by combining the power of central data centers and micro data centers. Our research focus is to address the challenges of distributing the stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures.

First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in a Community Network Cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements for stream processing systems to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of the stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and likewise reduce the average latency to less than 50%. Next, we propose a parallel and distributed graph partitioner, called HoVerCut, for fast partitioning of streaming graphs. Since a lot of data can be represented in the form of a graph, graph partitioning can be used to assign the graph elements to different data centers to provide data locality for efficient processing. Last, we provide an approach, called SpanEdge, that enables stream processing systems to work on a geo-distributed infrastructure. SpanEdge unifies stream processing over the central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge by extending Apache Storm so that it can run across multiple data centers.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 33 p.
Series
TRITA-ICT, 2016:27
Keyword [en]
geo-distributed stream processing, geo-distributed infrastructure, edge computing, edge-based analytics
National Category
Computer and Information Science
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-193582; ISBN: 978-91-7729-118-3 (print); OAI: oai:DiVA.org:kth-193582; DiVA: diva2:1026464
Presentation
2016-11-14, Sal 208, Electrum, Kungl Tekniska högskolan, Kistagången 16, Kista, Stockholm, 13:00 (English)
Note

QC 20161005

Available from: 2016-10-05. Created: 2016-10-04. Last updated: 2016-10-12. Bibliographically approved.
List of papers
1. Stream Processing in Community Network Clouds
2015 (English). In: Future Internet of Things and Cloud (FiCloud), 2015 3rd International Conference on, IEEE conference proceedings, 2015, 800-805 p. Conference paper, Published paper (Refereed)
Abstract [en]

Community Network Cloud is an emerging distributed cloud infrastructure that is built on top of a community network. The infrastructure consists of a number of geographically distributed compute and storage resources, contributed by community members, that are linked together through the community network. Stream processing is an important enabling technology that, if provided in a Community Network Cloud, would enable a new class of applications, such as social analysis, anomaly detection, and smart home power management. However, modern stream processing engines are designed to be used inside a data center, where servers communicate over a fast and reliable network. In this work, we evaluate the Apache Storm stream processing framework in an emulated Community Network Cloud in order to identify the challenges and bottlenecks that exist in the current implementation. The community network emulation was performed using data collected from the Guifi.net community network, Spain. Our evaluation results show that, with proper configuration of the heartbeats, it is possible to run Apache Storm in a Community Network Cloud. The performance is sensitive to the placement of the Storm components in the network. The deployment of management components on well-connected nodes improves the Storm topology scheduling time, fault tolerance, and recovery time. Our evaluation also indicates that the Storm scheduler and the stream groupings need to be aware of the network topology and the location of stream sources in order to optimally place Storm spouts and bolts to improve performance.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-174851 (URN); 10.1109/FiCloud.2015.95 (DOI); 000378639200122 (); 2-s2.0-84959041836 (Scopus ID)
Conference
The 4th International Workshop on Community Networks and Bottom-up-Broadband (CNBuB 2015), 24-26 Aug. 2015, Rome, Italy
Note

QC 20151113

Available from: 2015-10-07. Created: 2015-10-07. Last updated: 2016-10-04. Bibliographically approved.
2. Smart Partitioning of Geo-Distributed Resources to Improve Cloud Network Performance
2015 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Cloud Computing systems with geo-distributed resources are becoming more popular for enabling a new family of applications, which are latency sensitive or bandwidth intensive, e.g., Internet of Things and online video gaming services. The approach is to host the cloud services at the network edges to reduce the latency and bandwidth consumption. However, the topology of the existing networks is not necessarily optimal for hosting Cloud services. Moreover, how the services are placed on the nodes can affect the performance of the applications and the whole network. Therefore, we propose a novel algorithm to partition a distributed infrastructure into a set of computing clusters, each called a Micro Data Center. Our proposed algorithm is a decentralized community detection algorithm that does not require any global knowledge of the network topology. We compare our solution with a geolocation-based clustering solution and demonstrate our preliminary results based on a real-world network data set. We show that micro data centers increase the minimum available bandwidth in the network to up to 62%. Likewise, the average latency can be reduced to 50%.

Keyword
geo-distributed cloud; community detection; cloud network performance; multiple data centers
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-174381 (URN); 10.1109/CloudNet.2015.7335292 (DOI); 000377207000022 (); 2-s2.0-84960981479 (Scopus ID)
Conference
The 2015 4th IEEE International Conference on Cloud Networking (IEEE CloudNet 2015), 5-7 October 2015, Niagara Falls, Canada
Note

QC 20160616

Available from: 2015-10-06. Created: 2015-10-06. Last updated: 2016-10-04. Bibliographically approved.
3. Boosting Vertex-Cut Partitioning For Streaming Graphs
2016 (English). In: Big Data (BigData Congress), 2016 IEEE International Congress on, IEEE conference proceedings, 2016, 1-8 p. Conference paper, Published paper (Refereed)
Abstract [en]

While algorithms for streaming graph partitioning have proved promising, they fall short of creating timely partitions when applied to large graphs. For example, it takes 415 seconds for a state-of-the-art partitioner to work on a social network graph with 117 million edges. We introduce an efficient platform for boosting streaming graph partitioning algorithms. Our solution, called HoVerCut, is Horizontally and Vertically scalable. That is, it can run as a multi-threaded process on a single machine, or as a distributed partitioner across multiple machines. Our evaluations, on both real-world and synthetic graphs, show that HoVerCut speeds up the process significantly without degrading the quality of partitioning. For example, HoVerCut partitions the aforementioned social network graph with 117 million edges in 11 seconds, which is about 37 times faster.
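
To make the term "streaming vertex-cut partitioning" concrete, the following is a minimal single-threaded sketch of a greedy heuristic of the kind such a platform could wrap. It is not HoVerCut's implementation: the function names `greedy_vertex_cut` and `replication_factor`, the tie-breaking rule, and the absence of any explicit load-balancing term are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_vertex_cut(edges, k):
    """Assign each streamed edge to one of k partitions.

    Illustrative greedy heuristic (an assumption, not HoVerCut itself):
    prefer a partition that already holds both endpoints, then one that
    holds either endpoint, then any partition; break ties by lowest load.
    """
    replicas = defaultdict(set)  # vertex -> set of partitions holding a copy
    load = [0] * k               # number of edges assigned to each partition
    assignment = []              # partition chosen for each edge, in stream order

    for u, v in edges:
        common = replicas[u] & replicas[v]
        either = replicas[u] | replicas[v]
        candidates = common or either or set(range(k))
        # Least-loaded candidate; the partition index breaks remaining ties.
        p = min(candidates, key=lambda i: (load[i], i))
        replicas[u].add(p)
        replicas[v].add(p)
        load[p] += 1
        assignment.append(p)

    return assignment, dict(replicas), load


def replication_factor(replicas):
    """Average number of partitions on which a vertex is replicated."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (3, 1), (4, 5)]
    assignment, replicas, load = greedy_vertex_cut(stream, k=2)
    print(assignment, load, replication_factor(replicas))
```

A lower replication factor means fewer vertex copies to keep synchronized, which is the usual quality measure for vertex-cut partitioning. The speed-up quoted in the abstract follows directly from the two reported timings: 415 / 11 ≈ 37.7.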

Place, publisher, year, edition, pages
IEEE conference proceedings, 2016
Keyword
streaming graph, vertex-cut partitioning, graph partitioning, parallel scalability
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering; Computer Systems
Identifiers
urn:nbn:se:kth:diva-189864 (URN); 10.1109/BigDataCongress.2016.10 (DOI); 000390212200001 (); 2-s2.0-84994558691 (Scopus ID); 978-1-5090-2622-7 (ISBN)
Conference
5th 2016 IEEE International Congress on Big Data (BigData Congress 2016)
Note

QC 20160923

Available from: 2016-07-20. Created: 2016-07-20. Last updated: 2017-02-23. Bibliographically approved.
4. SpanEdge: Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In stream processing, data is streamed as a continuous flow of data items, which are generated from multiple sources and geographical locations. The common approach for stream processing is to transfer raw data streams to a central data center, which entails communication over the wide-area network (WAN). However, this approach is inefficient and falls short for two main reasons: i) the burst in the amount of data generated at the network edge by an increasing number of connected devices, and ii) the emergence of applications with predictable and low latency requirements. In this paper, we propose SpanEdge, a novel approach that unifies stream processing across a geo-distributed infrastructure, including the central and near-the-edge data centers. SpanEdge reduces or eliminates the latency incurred by WAN links by distributing stream processing applications across the central and the near-the-edge data centers. Furthermore, SpanEdge provides a programming environment that allows programmers to specify the parts of their applications that need to be close to the data source. Programmers can develop a stream processing application regardless of the number of data sources and their geographical distribution. As a proof of concept, we implemented and evaluated a prototype of SpanEdge. Our results show that SpanEdge can optimally deploy the stream processing applications in a geo-distributed infrastructure, which significantly reduces the bandwidth consumption and the response latency.
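
The idea of letting programmers mark which parts of an application must stay close to the data source can be sketched as follows. This is a hypothetical illustration and not SpanEdge's API or scheduler: the scope labels `"local"` and `"global"`, the function `place_operators`, and the data center names are assumptions introduced for the example.

```python
def place_operators(operators, source_sites, central="central-dc"):
    """Map each operator to the data centers that should run it.

    operators:    list of (name, scope) pairs; scope is "local" for parts
                  that must run near the data sources, "global" otherwise
                  (hypothetical labels).
    source_sites: dict mapping each data source to the near-the-edge data
                  center it is attached to.
    central:      name of the central data center (hypothetical).
    """
    edge_sites = sorted(set(source_sites.values()))
    placement = {}
    for name, scope in operators:
        if scope == "local":
            # One replica in every edge data center that hosts a source,
            # so raw data does not have to cross the WAN.
            placement[name] = edge_sites
        elif scope == "global":
            # A single instance at the central data center that combines
            # the partial results produced at the edge.
            placement[name] = [central]
        else:
            raise ValueError(f"unknown scope {scope!r} for operator {name!r}")
    return placement


if __name__ == "__main__":
    operators = [("filter", "local"), ("aggregate", "global")]
    source_sites = {"sensor-a": "edge-1", "sensor-b": "edge-2", "sensor-c": "edge-1"}
    print(place_operators(operators, source_sites))
```

In this sketch the application is written once, independently of how many sources exist or where they are; only the mapping from sources to edge data centers changes the resulting placement.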

Keyword
geo-distributed stream processing, geo-distributed infrastructure, edge computing, edge-based analytics
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-193581 (URN); 10.1109/SEC.2016.17 (DOI); 000391420900036 ()
Conference
The First IEEE/ACM Symposium on Edge Computing (SEC)
Note

QC 20161005

Available from: 2016-10-04. Created: 2016-10-04. Last updated: 2017-02-03. Bibliographically approved.

Open Access in DiVA

Hooman Peiro Sajjad Licentiate Thesis (752 kB), 121 downloads
File information
File name: FULLTEXT04.pdf; File size: 752 kB; Checksum (SHA-512):
3d6734cf31f5a71760275e878af93a7939674973b00265f8fbd2341f99890facadbd02f0418fbabbbca51ef2cf26bfdee7d9602ea1be83b4c8d836fb581101c3
Type: fulltext; Mimetype: application/pdf
