Allocating Compute and Network Resources under Management Objectives in Large-Scale Clouds
KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. ORCID iD: 0000-0002-2680-9065
KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
2013 (English). In: Journal of Network and Systems Management, ISSN 1064-7570, E-ISSN 1573-7705. Article in journal (Refereed). Published.
Abstract [en]

We consider the problem of jointly allocating compute and network resources in a large Infrastructure-as-a-Service (IaaS) cloud. We formulate the problem of optimally allocating resources to virtual data centers (VDCs) for four well-known management objectives: balanced load, energy efficiency, fair allocation, and service differentiation. Then, we outline an architecture for resource allocation, which centers around a set of cooperating controllers, each solving a problem related to the chosen management objective. We illustrate how a global management objective is mapped onto objectives that govern the execution of these controllers. For a key controller, the Dynamic Placement Controller, we give a detailed distributed design, which is based on a gossip protocol that can switch between management objectives. The design is applicable to a broad class of management objectives, which we characterize through a property of the objective function. The property ensures the applicability of an iterative descent method that the gossip protocol implements. We evaluate, through simulation, the dynamic placement of VDCs for a large cloud under changing load and VDC churn. Simulation results show that this controller is effective and highly scalable, up to 100,000 nodes, for the management objectives considered.
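The gossip-based iterative descent the abstract refers to can be illustrated with a toy sketch. This is not the paper's protocol: the objective functions, the unit node capacities, and the greedy pairwise move rule below are simplifying assumptions for illustration only.

```python
# Illustrative sketch: one pairwise gossip step that greedily moves a VDC
# between two nodes when the move lowers an objective-specific cost.
# Objective functions and unit capacities are assumptions, not the paper's.

OBJECTIVES = {
    # cost of a node as a function of its utilization u (capacity = 1)
    "balanced_load": lambda u: u * u,                      # convex: favors even spread
    "energy_efficiency": lambda u: 1.0 if u > 0 else 0.0,  # favors packing
}

def gossip_round(node_a, node_b, demands, objective):
    """One pairwise exchange: try moving each VDC from node_a to node_b,
    keeping a move only if the combined cost decreases."""
    cost = OBJECTIVES[objective]

    def total():
        return (cost(sum(demands[v] for v in node_a)) +
                cost(sum(demands[v] for v in node_b)))

    best = total()
    for v in sorted(node_a):      # sorted for deterministic iteration
        node_a.remove(v)
        node_b.add(v)
        if total() < best:
            best = total()        # keep the improving move
        else:
            node_b.remove(v)      # revert the move
            node_a.add(v)
    return best
```

Switching the management objective only swaps the cost function, which mirrors how a placement controller can be parameterized by the objective: as long as the objective has the structure the paper characterizes, such local pairwise moves implement an iterative descent.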

Keyword [en]
Cloud computing, distributed management, resource allocation, gossip protocols, management objectives
National Category
Computer Systems; Communication Systems
URN: urn:nbn:se:kth:diva-126965
DOI: 10.1007/s10922-013-9280-6
ISI: 000350554700005
ScopusID: 2-s2.0-84921068169
OAI: diva2:642984

QC 20130828

Available from: 2013-08-23. Created: 2013-08-23. Last updated: 2016-04-11. Bibliographically approved.
In thesis
1. Data-driven Performance Prediction and Resource Allocation for Cloud Services
2016 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Cloud services, which provide online entertainment, enterprise resource management, tax filing, etc., are becoming essential for consumers, businesses, and governments. The key functionalities of such services are provided by backend systems in data centers. This thesis focuses on three fundamental problems related to management of backend systems. We address these problems using data-driven approaches: triggering dynamic allocation by changes in the environment, obtaining configuration parameters from measurements, and learning from observations. 

The first problem relates to resource allocation for large clouds with potentially hundreds of thousands of machines and services. We developed and evaluated a generic gossip protocol for distributed resource allocation. Extensive simulation studies suggest that the quality of the allocation is independent of the system size for the management objectives considered.

The second problem focuses on performance modeling of a distributed key-value store, and we study specifically the Spotify backend for streaming music. We developed analytical models for system capacity under different data allocation policies and for response time distribution. We evaluated the models by comparing model predictions with measurements from our lab testbed and from the Spotify operational environment. We found the prediction error to be below 12% for all investigated scenarios.
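For intuition about response-time modeling of a storage backend, a textbook M/M/1 approximation can be sketched. This is a generic queueing result under exponential arrival and service assumptions, not the model developed in the thesis, and the rates below are made up.

```python
import math

def response_time_quantile(arrival_rate, service_rate, q):
    """In an M/M/1 queue, response time T ~ Exp(mu - lambda);
    return t such that P(T <= t) = q."""
    assert 0 < arrival_rate < service_rate, "queue must be stable"
    return -math.log(1.0 - q) / (service_rate - arrival_rate)

# e.g. 50 req/s arriving at a server that can serve 100 req/s:
p95 = response_time_quantile(50.0, 100.0, 0.95)
```

A model of this kind lets one compare predicted quantiles of the response-time distribution against testbed measurements, which is the style of validation the thesis reports.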

The third problem relates to real-time prediction of service metrics, which we address through statistical learning. Service metrics are learned from observing device and network statistics. We performed experiments on a server cluster running video streaming and key-value store services. We showed that feature set reduction significantly improves the prediction accuracy, while simultaneously reducing model computation time. Finally, we designed and implemented a real-time analytics engine, which produces model predictions through online learning.
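The online-learning step of such an analytics engine can be sketched as a stochastic-gradient update of a linear model: each incoming (device statistics, observed metric) sample nudges the weights against the prediction error. The features, targets, and learning rate below are synthetic illustrations, not the thesis's setup.

```python
def sgd_update(weights, features, target, lr=0.1):
    """One online step: predict the service metric, then adjust the
    weights along the negative least-squares gradient."""
    pred = sum(w * x for w, x in zip(weights, features))
    err = pred - target
    return [w - lr * err * x for w, x in zip(weights, features)]

# Synthetic stream of (device statistics, observed metric) samples.
weights = [0.0, 0.0]
for _ in range(500):
    weights = sgd_update(weights, [1.0, 0.0], 3.0)
    weights = sgd_update(weights, [0.0, 1.0], 5.0)
```

Because the model is updated one sample at a time, predictions stay current as the workload drifts, which is the point of producing predictions through online rather than batch learning.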

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 53 p.
Series: TRITA-EE, ISSN 1653-5146; 2016:020
National Category
Communication Systems; Computer Systems; Telecommunications; Computer Engineering; Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
URN: urn:nbn:se:kth:diva-184601
ISBN: 978-91-7595-876-7
Public defence
2016-05-03, F3, Lindstedtsvägen 26, KTH Campus, Stockholm, 14:00 (English)
Funder: VINNOVA, 2013-03895

QC 20160411

Available from: 2016-04-11. Created: 2016-04-01. Last updated: 2016-05-30. Bibliographically approved.

Open Access in DiVA

fulltext (2348 kB), 348 downloads
File information
File name: FULLTEXT01.pdf. File size: 2348 kB. Checksum: SHA-512.
Type: fulltext. Mimetype: application/pdf.

Other links

Publisher's full text; Scopus; Publisher's version

Search in DiVA

By author/editor
Yanggratoke, Rerngvit; Stadler, Rolf
By organisation
Communication Networks; ACCESS Linnaeus Centre
In the same journal
Journal of Network and Systems Management
Computer Systems; Communication Systems


Altmetric score: 94 hits