Efficient and Cost-effective Workflow Based on Containers for Distributed Reproducible Experiments
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Reproducing distributed experiments is a challenging task for many researchers. There are many factors which make this problem harder to solve. In order to reproduce distributed experiments, researchers need to perform complex deployments which involve many dependent software stacks with many configurations and manual orchestrations.
Further, researchers need to allocate a larger amount of money for clusters of machines and then spend their valuable time to perform those experiments. Also, some of the researchers spend a lot of time to validate a distributed scenario in a real environment as most of the pseudo distributed systems do not provide the characteristics of a real distributed system.
Karamel provides solutions for the inconvenience caused by the manual orchestration by providing a comprehensive orchestration platform to deploy and run distributed experiments. But still, this solution may incur a similar amount of expenses as of a manual distributed setup since it uses virtual machines underneath. Further, it does not provide quick validations of a distributed setup with a quick feedback loop, as it takes considerable time to terminate and provision new virtual machines.
Therefore, we provide a solution by integrating Docker that can co-exists with virtual machine based deployment model seamlessly. Our solution encapsulates the container-based deployment model for users to reproduce distributed experiment in a cost-effective and efficient manner.
In this project, we introduce novel deployment model with containers that is not possible with the conventional virtual machine based deployment model. Further, we evaluate our solution with a real deployment of Apache Hadoop Terasort experiment which is a benchmark for Apache Hadoop map-reduce platform in order to explain how this model can be used to save the cost and improve the efficiency.
Place, publisher, year, edition, pages
2016. , 67 p.
docker, orchestration, container, workflow, cloud, reproducible-experiments
IdentifiersURN: urn:nbn:se:kth:diva-194209OAI: oai:DiVA.org:kth-194209DiVA: diva2:1038806
Subject / course
Master of Science - Distributed Computing
Peiro Sajjad, Hooman, Ph.d student
Dowling, Jim, Associate Professor