Change search
ReferencesLink to record
Permanent link

Direct link
Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
KTH, School of Information and Communication Technology (ICT).
KTH, School of Information and Communication Technology (ICT).
2013 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

The Hadoop Distributed Filesystem (HDFS) is the storage layer of Hadoop, scaling to support tens of petabytes of data at companies such as Facebook and Yahoo. One wellknown limitation of HDFS is that its metadata has been stored inmemory on a single node, called the NameNode. To overcome NameNode’s limitations, a distributed file system concept based on HDFS, called KTHFS, was proposed in which NameNode’s metadata are stored on an inmemory replicated distributed database, MySQL Cluster.

In this thesis, we show how to store the metadata of HDFS NameNode in an external distributed database while maintaining strong consistency semantics of HDFS for both filesystem operations and primitive HDFS operations. Our implementation supports MySQL Cluster, to store the metadata, although it only supports a readcommitted transaction isolation model. As a readcommitted isolation model cannot guarantee strong consistency, we needed to carefully design how metadata is read and written in MySQL Cluster to ensure our system preserves HDFS’s consistency model and is both deadlock free and highly performant. We developed a transaction model based on taking metadata snapshotting and the careful ordering of database operations. Our model is general enough to support any database providing at least readcommitted isolation level. We evaluate our model and show how HDFS can scale, while maintaining strong consistency, to terabytes of metadata.

Place, publisher, year, edition, pages
2013. , 64 p.
Trita-ICT-EX, 2013:79
National Category
Engineering and Technology
URN: urn:nbn:se:kth:diva-127464OAI: diva2:644417
Educational program
Master of Science - Distributed Computing
Available from: 2013-08-30 Created: 2013-08-30 Last updated: 2013-08-30Bibliographically approved

Open Access in DiVA

fulltext(1720 kB)259 downloads
File information
File name FULLTEXT01.pdfFile size 1720 kBChecksum SHA-512
Type fulltextMimetype application/pdf

By organisation
School of Information and Communication Technology (ICT)
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar
Total: 259 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 200 hits
ReferencesLink to record
Permanent link

Direct link