Efficient and Cost-effective Workflow Based on Containers for Distributed Reproducible Experiments
KTH, School of Information and Communication Technology (ICT).
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Reproducing distributed experiments is a challenging task for many researchers, and several factors make the problem harder to solve. To reproduce distributed experiments, researchers need to perform complex deployments that involve many interdependent software stacks, numerous configurations, and manual orchestration.

Further, researchers need to allocate a large amount of money for clusters of machines and then spend valuable time performing those experiments. Some researchers also spend a lot of time validating a distributed scenario in a real environment, because most pseudo-distributed systems do not provide the characteristics of a real distributed system.

Karamel addresses the inconvenience of manual orchestration by providing a comprehensive orchestration platform for deploying and running distributed experiments. However, this solution may still incur expenses similar to those of a manual distributed setup, since it uses virtual machines underneath. Further, it does not allow quick validation of a distributed setup with a fast feedback loop, as terminating and provisioning new virtual machines takes considerable time.

Therefore, we provide a solution that integrates Docker and can co-exist seamlessly with the virtual-machine-based deployment model. Our solution encapsulates the container-based deployment model so that users can reproduce distributed experiments in a cost-effective and efficient manner.

In this project, we introduce a novel deployment model with containers that is not possible with the conventional virtual-machine-based deployment model. Further, we evaluate our solution with a real deployment of the Apache Hadoop Terasort experiment, a benchmark for the Apache Hadoop MapReduce platform, to explain how this model can be used to reduce cost and improve efficiency.

Place, publisher, year, edition, pages
2016. 67 p.
Series
TRITA-ICT-EX, 2016:125
Keyword [en]
docker, orchestration, container, workflow, cloud, reproducible-experiments
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-194209
OAI: oai:DiVA.org:kth-194209
DiVA: diva2:1038806
Subject / course
Computer Science
Educational program
Master of Science - Distributed Computing
Supervisors
Examiners
Available from: 2016-11-07. Created: 2016-10-19. Last updated: 2016-11-07. Bibliographically approved.

Open Access in DiVA

fulltext (9900 kB), 43 downloads
File information
File name: FULLTEXT01.pdf
File size: 9900 kB
Checksum (SHA-512): 6fbb6baa18899addbf1243e4f3244dd73517af349813e6110bab3948efeeb300073e7fe4d1fb6c38dbd5a57e90f1635ca01d264cc5bd1392dea9bd144251c564
Type: fulltext
Mimetype: application/pdf


Total: 43 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 68 hits