Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Efficient Representation Learning Using RandomWalks for Dynamic Graphs
KTH, Skolan för elektroteknik och datavetenskap (EECS), Programvaruteknik och datorsystem, SCS.
Data61, CSIRO.
Data61, CSIRO.
(engelsk)Manuskript (preprint) (Annet vitenskapelig)
Abstract [en]

An important part of many machine learning workflows on graphs is vertex representation learning, i.e., learning a low-dimensional vector representation for each vertex in the graph. Recently, several powerful techniques for unsupervised representation learning have been demonstrated to give the state-of-the-art performance in downstream tasks such as vertex classification and edge prediction. These techniques rely on random walks performed on the graph in order to capture its structural properties. These structural properties are then encoded in the vector representation space. 

However, most contemporary representation learning methods only apply to static graphs while real-world graphs are often dynamic and change over time. Static representation learning methods are not able to update the vector representations when the graph changes; therefore, they must re-generate the vector representations on an updated static snapshot of the graph regardless of the extent of the change in the graph. In this work, we propose computationally efficient algorithms for vertex representation learning that extend random walk based methods to dynamic graphs. The computation complexity of our algorithms depends upon the extent and rate of changes (the number of edges changed per update) and on the density of the graph. We empirically evaluate our algorithms on real world datasets for downstream machine learning tasks of multi-class and multi-label vertex classification. The results show that our algorithms can achieve competitive results to the state-of-the-art methods while being computationally efficient.

Emneord [en]
dynamic graph embedding, random walk, representation learning, information network, graph stream
HSV kategori
Forskningsprogram
Datalogi
Identifikatorer
URN: urn:nbn:se:kth:diva-243880OAI: oai:DiVA.org:kth-243880DiVA, id: diva2:1287128
Merknad

QC 20190208

Tilgjengelig fra: 2019-02-08 Laget: 2019-02-08 Sist oppdatert: 2019-03-04bibliografisk kontrollert
Inngår i avhandling
1. Methods and Algorithms for Data-Intensive Computing: Streams, Graphs, and Geo-Distribution
Åpne denne publikasjonen i ny fane eller vindu >>Methods and Algorithms for Data-Intensive Computing: Streams, Graphs, and Geo-Distribution
2019 (engelsk)Doktoravhandling, med artikler (Annet vitenskapelig)
Abstract [en]

Struggling with the volume and velocity of Big Data has attracted lots of interest towards stream processing paradigm, a paradigm in the area of data-intensive computing that provides methods and solutions to process data in motion. Today's Big Data includes geo-distributed data sources.In addition, a major part of today's Big Data requires exploring complex and evolving relationships among data, which complicates any reasoning on the data. This thesis aims at challenges raised by geo-distributed streaming data, and the data with complex and evolving relationships.

Many organizations provide global scale applications and services that are hosted on servers and data centers that are located in different parts of the world. Therefore, the data that needs to be processed are generated in different geographical locations. This thesis advocates for distributed stream processing in geo-distributed settings to improve the performance including better response time and lower network cost compared to centralized solutions. In this thesis, we conduct an experimental study of Apache Storm, a widely used open-source stream processing system, on a geo-distributed infrastructure made of near-the-edge resources. The resources that host the system's components are connected by heterogeneous network links. Our study exposes a set of issues and bottlenecks of deploying a stream processing system on the geo-distributed infrastructure. Inspired by the results, we propose a novel method for grouping of geo-distributed resources into computing clusters, called micro data centers, in order to mitigate the effect of network heterogeneity for distributed stream processing applications. Next, we focus on the windowed aggregation of geo-distributed data streams, which has been widely used in stream analytics. We propose to reduce the bandwidth cost by coordinating windowed aggregations among near-the-edge data centers. We leverage intra-region links and design a novel low-overhead coordination algorithm that optimizes communication cost for data aggregation. Then, we propose a system, called SpanEdge, that provides an expressive programming model to unify programming stream processing applications on a geo-distributed infrastructure and provides a run-time system to manage (schedule and execute) stream processing applications across data centers. Our results show that SpanEdge can optimally deploy stream processing applications in a geo-distributed infrastructure, which significantly reduces the bandwidth consumption and response latency.

With respect to data with complex and evolving relationships, this thesis aims at effective and efficient processing of inter-connected data. There exist several domains such as social network analysis, machine learning, and web search in which data streams are modeled as linked entities of nodes and edges, namely a graph. Because of the inter-connection among the entities in graph data, processing of graph data is challenging. The inter-connection among the graph entities makes it difficult to distribute the graph among multiple machines to process the graph at scale. Furthermore, in a streaming setting, the graph structure and the graph elements can continuously change as the graph elements are streamed. Such a dynamic graph requires incremental computing methods that can avoid redundant computations on the whole graph. This thesis proposes incremental computing methods of streaming graph processing that can boost the processing time while still obtaining high quality results. In this thesis, we introduce HoVerCut, an efficient framework for boosting streaming graph partitioning algorithms. HoVerCut is Horizontally and Vertically scalable. Our evaluations show that HoVerCut speeds up the partitioning process significantly without degrading the quality of partitioning. Finally, we study unsupervised representation learning in dynamic graphs. Graph representation learning seeks to learn low dimensional vector representations for the graph elements, i.e. edges and vertices, and the whole graph.We propose novel and computationally efficient incremental algorithms. The computation complexity of our algorithms depends on the extent and rate of changes in a graph and on the graph density. The evaluation results show that our proposed algorithms can achieve competitive results to the state-of-the-art static methods while being computationally efficient.

sted, utgiver, år, opplag, sider
Stockholm: KTH Royal Institute of Technology, 2019. s. 58
Serie
TRITA-EECS-AVL ; 2019:13
Emneord
stream processing, geo-distributed infrastructure, edge computing, streaming graph, dynamic graph
HSV kategori
Forskningsprogram
Informations- och kommunikationsteknik; Datalogi
Identifikatorer
urn:nbn:se:kth:diva-243883 (URN)978-91-7873-094-0 (ISBN)
Disputas
2019-03-15, Ka-Sal C (Sal Sven-Olof Öhrvik), Electrum, Kista, Stockholm, 13:30 (engelsk)
Opponent
Veileder
Merknad

QC 20190208

Tilgjengelig fra: 2019-02-08 Laget: 2019-02-08 Sist oppdatert: 2019-02-08bibliografisk kontrollert

Open Access i DiVA

fulltext(1298 kB)17 nedlastinger
Filinformasjon
Fil FULLTEXT03.pdfFilstørrelse 1298 kBChecksum SHA-512
d0b9313c5d3e97077c9542be4d607802b9db252b722aaf1566e8283c1040e9f3fc03e3331dbdc274e0e8f8db9c4b80f2061a26605065f379fe4b791b09ef249f
Type fulltextMimetype application/pdf

Søk i DiVA

Av forfatter/redaktør
Peiro Sajjad, Hooman
Av organisasjonen

Søk utenfor DiVA

GoogleGoogle Scholar
Totalt: 27 nedlastinger
Antall nedlastinger er summen av alle nedlastinger av alle fulltekster. Det kan for eksempel være tidligere versjoner som er ikke lenger tilgjengelige

urn-nbn

Altmetric

urn-nbn
Totalt: 132 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf