Decentralized detection of global threshold crossings using aggregation trees
KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. ORCID iD: 0000-0001-5432-6442
KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
2008 (English). In: Computer Networks, ISSN 1389-1286, E-ISSN 1872-7069, Vol. 52, no 9, 1745-1761 p. Article in journal (Refereed), Published
Abstract [en]

The timely detection that a monitored variable has crossed a given threshold is a fundamental requirement for many network management applications. A challenge is detecting threshold crossings of network-wide variables, which are computed from device counters across the network using aggregation functions such as SUM, MAX and AVERAGE. This paper contains a detailed description and a comprehensive evaluation of TCA-GAP, a protocol for detecting threshold crossings of network-wide aggregates in a distributed way. Elements of its design include tree-based incremental aggregation for estimating the value of aggregates, a local hysteresis mechanism to reduce overhead, and dynamic recomputation of local thresholds to ensure correctness. The protocol is evaluated through extensive simulation using real traces in scenarios with network sizes up to 5232 nodes. From the measurements, we conclude that the protocol is efficient in the sense that the overhead is negligible when the aggregate is far from the threshold. It is scalable, as the protocol overhead is independent of the system size for the network sizes and scenario configurations considered. We demonstrate that the local hysteresis parameter can be used to control the tradeoff between protocol overhead and detection delay. We further report results on how node failures impact the overhead and detection quality of the protocol.
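
The two ideas named in the abstract, tree-based incremental aggregation and hysteresis, can be illustrated with a minimal Python sketch. This is not TCA-GAP itself and is not taken from the paper: the Node and HysteresisDetector classes and the upper/lower parameters are hypothetical names chosen for illustration, the aggregate is a plain SUM, and the hysteresis band is applied once at the root rather than per node with dynamically recomputed local thresholds as the abstract describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in an aggregation tree (hypothetical, for illustration only)."""
    name: str
    parent: Optional["Node"] = None
    local: int = 0     # this node's own counter value
    partial: int = 0   # SUM over the subtree rooted at this node

    def set_local(self, value: int) -> int:
        """Update the local value and push the change towards the root.

        Only the difference (delta) travels up the tree, so an unchanged
        value costs nothing. Returns the number of child-to-parent
        messages this update would cause.
        """
        delta = value - self.local
        if delta == 0:
            return 0
        self.local = value
        messages = 0
        node: Optional[Node] = self
        while node is not None:
            node.partial += delta
            if node.parent is not None:
                messages += 1
            node = node.parent
        return messages


class HysteresisDetector:
    """Raises an alarm at `upper` and clears it only at or below `lower`."""

    def __init__(self, upper: int, lower: int) -> None:
        if lower > upper:
            raise ValueError("lower must not exceed upper")
        self.upper = upper
        self.lower = lower
        self.alarm = False

    def observe(self, aggregate: int) -> bool:
        if not self.alarm and aggregate >= self.upper:
            self.alarm = True
        elif self.alarm and aggregate <= self.lower:
            self.alarm = False
        return self.alarm


if __name__ == "__main__":
    root = Node("root")
    a = Node("a", parent=root)
    c = Node("c", parent=a)
    detector = HysteresisDetector(upper=10, lower=7)
    for node, value in [(c, 5), (a, 3), (c, 8), (c, 6), (c, 4)]:
        sent = node.set_local(value)
        print(node.name, value, root.partial, sent, detector.observe(root.partial))
```

The gap between upper and lower is what keeps the alarm from flapping when the aggregate hovers near the threshold. A wider gap means fewer state changes but a later clearing of the alarm, which is a simplified analogue of the overhead versus delay tradeoff mentioned in the abstract.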

Place, publisher, year, edition, pages
Elsevier, 2008. Vol. 52, no 9, 1745-1761 p.
Keyword [en]
decentralized network management, threshold crossing alerts, real-time, monitoring, tree-based aggregation protocols
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-17634
DOI: 10.1016/j.comnet.2008.02.015
ISI: 000257012600006
Scopus ID: 2-s2.0-43449096331
OAI: oai:DiVA.org:kth-17634
DiVA: diva2:335678
Note
NOTICE: this is the author’s version of a work that was accepted for publication in . Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in PUBLICATION, VOL 52, ISSUE 9, 2008, DOI 10.1016/j.comnet.2008.02.015
QC 20100525 QC 20120213
Available from: 2012-02-13 Created: 2010-08-05 Last updated: 2017-12-12
Bibliographically approved
In thesis
1. Real-Time Monitoring of Global Variables in Large-Scale Dynamic Systems
2007 (English). Licentiate thesis, comprehensive summary (Other scientific)
Abstract [en]

Large-scale dynamic systems, such as the Internet, as well as emerging peer-to-peer networks and computational grids, require a high level of awareness of the system state in real time for proper and reliable operation. A key challenge is to develop monitoring functions that are efficient, scalable, robust and controllable. The thesis addresses this challenge by focusing on engineering protocols for distributed monitoring of global state variables. The global variables are network-wide aggregates, computed from local device variables using aggregation functions such as SUM, MAX, AVERAGE, etc. Furthermore, it addresses the problem of detecting threshold crossings of such aggregates. The design goals for the protocols are efficiency, quality, scalability, robustness and controllability. The work presented in this thesis has resulted in two novel protocols: a gossip-based protocol for continuous monitoring of aggregates called G-GAP, and a tree-based protocol for detecting threshold crossings of aggregates called TCA-GAP. The protocols have been evaluated against the design goals through three complementary evaluation methods: theoretical analysis, simulation study and testbed implementation.

Place, publisher, year, edition, pages
Stockholm: KTH, 2007. 107 p.
Series
Trita-EE, ISSN 1653-5146 ; 2007:065
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-4646
ISBN: 978-91-7178-774-3
Presentation
2007-12-04, Q22, KTH, Osquldas väg 6, Stockholm, 10:00
Note
QC 20101122
Available from: 2008-02-27 Created: 2008-02-27 Last updated: 2010-11-22
Bibliographically approved
2. Distributed Monitoring and Resource Management for Large Cloud Environments
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last decade, the number, size and complexity of large-scale networked systems have been growing fast, and this trend is expected to accelerate. The best known example of a large-scale networked system is probably the Internet, while large datacenters for cloud services are the most recent ones. In such environments, a key challenge is to develop scalable and adaptive technologies for management functions. This thesis addresses the challenge by engineering several protocols for distributed monitoring and resource management that are suitable for large-scale networked systems. First, we present G-GAP, a gossip-based protocol we developed for continuous monitoring of aggregates that are computed from device variables. We prove the robustness of this protocol to node failures and validate, through simulations, that its estimation accuracy does not change with increasing size of the monitored system under certain conditions. Second, we present TCA-GAP, a tree-based protocol, and TG-GAP, a gossip-based protocol, for the purpose of monitoring threshold crossings of aggregates. For both protocols, we prove correctness properties and show, again through simulations, that they are efficient: their overhead is at least two orders of magnitude smaller than that of a naive approach for cases where the monitored aggregate is sufficiently far from the threshold. Third, we present a gossip-based protocol for resource management in cloud environments. The protocol allocates CPU and memory resources to sites that are hosted by the cloud. We prove that the resource allocation computed by the protocol converges exponentially fast to an optimal allocation, for cases where sufficient memory is available. Through simulations, we show that the quality of the resource allocation approaches that of an ideal system when the total memory demand decreases significantly below the memory capacity of the entire system.
In addition, we validate that the quality of the allocation does not change with an increasing number of hosted sites and machines, for the case where both metrics are scaled proportionally. Finally, we compare two approaches (tree-based and gossip-based) to engineering protocols for distributed management, for the case of real-time monitoring. Results of our simulation studies indicate that, regardless of the system size and failure rates in the monitored system, gossip protocols incur a significantly larger overhead than tree-based protocols for achieving the same monitoring quality (e.g., estimation accuracy or detection delay).

Place, publisher, year, edition, pages
Stockholm: KTH, 2010. vi, 26 p.
Series
Trita-EE, ISSN 1653-5146 ; 2010:051
Keyword
decentralized management, engineering protocols, distributed monitoring, resource management
National Category
Telecommunications; Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-26207
ISBN: 978-91-7415-794-9
Public defence
2010-12-10, Q2, Osquldas väg 10, plan 2, KTH, Stockholm, 14:00 (English)
Note
QC 20101124
Available from: 2010-11-24 Created: 2010-11-21 Last updated: 2012-03-22
Bibliographically approved

Open Access in DiVA

fulltext (819 kB), 233 downloads
File information
File name: FULLTEXT01.pdf
File size: 819 kB
Checksum (SHA-512): c3f18bd31d75158ae4d64c45761d899b8fb97d65fd724d7295f3b6bf627d7fe606d7670f0c461879e8da8016458348938bc5b73406aa98032fe730a51dfeedbe
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Wuhib, Fetahi; Dam, Mads; Stadler, Rolf
By organisation
Communication Networks; ACCESS Linnaeus Centre