Quota based access-control for Hops: Improving cluster utilization with Hops-YARN
KTH, School of Information and Communication Technology (ICT).
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

YARN is the resource management framework for Hadoop and is, in many senses, the modern operating system for the data center. YARN clusters run at organizations such as Yahoo!, Spotify, and Twitter, with clusters of up to 3500 nodes reported in the literature. To harness so many nodes and manage them efficiently, YARN must meet requirements such as scalability, serviceability, multitenancy, reliability, high cluster utilization, and secure and auditable operation. Currently, YARN supports three schedulers for prioritizing the allocation of resources (CPU, memory) to applications. Existing schedulers have a broken incentive model for popular frameworks such as Apache Spark and Apache Flink, whose applications have gang-scheduling semantics: they need all of their nodes to be available before they can start work. Because there may be a substantial delay (at Spotify, up to 1 hour) in getting 100 or more nodes allocated to an application, users are incentivized to launch applications and hog the resources they obtain, and they are not penalized for doing so. The Capacity Scheduler, one of these schedulers, has been used as a default scheduler in YARN and is quite good at sharing resources among tenants with a degree of guaranteed resource availability. Still, there is room for improvement. In this thesis, we propose the design and implementation of a new system, the Quota-based access control system, which works as a layer over the Capacity Scheduler for Hops-YARN, a project developed on Apache YARN. The Quota-based access control system allocates a quota of resources to projects.

A project consists of a number of users who manage a number of data sets; the concept is taken from HopsWorks, a new frontend for Hadoop (www.hops.io). Project members can spend part of their quota to launch and run applications. In contrast to existing schedulers, our control system incentivizes users not to launch unnecessary applications or hog resources. In this work, we have also analyzed the operational model of the scheduler, including the Quota-based access control system, under different application scheduling scenarios. We have also investigated failure scenarios, including network partitions and the failure of different YARN components, and analyzed the consequences of these failures for the scheduling operation. Finally, we propose some future improvements to this scheduling system.
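The quota idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class names, the credit unit, and the per-second prices are all hypothetical, chosen only to show how charging a project for the resources it holds discourages hogging.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """A project with a remaining quota, in hypothetical abstract credits."""
    name: str
    quota: float
    members: set = field(default_factory=set)


class QuotaLedger:
    """Toy ledger: admits applications only while the project has quota left."""

    def __init__(self, price_per_vcore_second=1.0, price_per_gb_second=0.5):
        # Prices are illustrative assumptions, not values from the thesis.
        self.price_per_vcore_second = price_per_vcore_second
        self.price_per_gb_second = price_per_gb_second
        self.projects = {}

    def add_project(self, project):
        self.projects[project.name] = project

    def can_launch(self, project_name, user):
        """Allow a launch only for project members with positive quota."""
        project = self.projects[project_name]
        return user in project.members and project.quota > 0

    def charge(self, project_name, vcores, memory_gb, seconds):
        """Deduct the cost of holding resources for `seconds`; return the cost."""
        cost = seconds * (vcores * self.price_per_vcore_second
                          + memory_gb * self.price_per_gb_second)
        self.projects[project_name].quota -= cost
        return cost


ledger = QuotaLedger()
ledger.add_project(Project("demo", quota=100.0, members={"alice"}))
print(ledger.can_launch("demo", "alice"))   # quota is still positive
ledger.charge("demo", vcores=2, memory_gb=4, seconds=30)
print(ledger.can_launch("demo", "alice"))   # quota is now exhausted
```

In this sketch the charge depends on how long resources are held, not on how much work is done with them, so an idle application that keeps its containers still consumes the project's quota.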

Place, publisher, year, edition, pages
2016. 86 p.
Series
TRITA-ICT-EX, 2016:109
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-195647
OAI: oai:DiVA.org:kth-195647
DiVA: diva2:1044829
Subject / course
Computer Science
Educational program
Master of Science - Distributed Computing
Examiners
Available from: 2016-11-07 Created: 2016-11-07 Last updated: 2016-11-07 Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 2852 kB
Type: fulltext
Mimetype: application/pdf
Checksum (SHA-512): 2fdfb28425ef9e78830efdc2ea3dbabcfa5c14a8d416dbf05fb2623fc42f40d858fb3483043c38438d44d6f20c2d4d1418e65e0f6b3dd98267289e248dab2016
