Heterogeneous Storage in HopsFS
KTH, School of Information and Communication Technology (ICT).
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In recent years, the Apache Hadoop distributed file system (HDFS) has become increasingly popular for the storage of large data sets. Both the volume of the data and the variety of applications are unprecedented. The variety of tasks, each with its own access pattern and demands, calls for a file system that supports specialized storage for different tasks.

This thesis describes the implementation of heterogeneous storage in HopsFS, a highly available, highly scalable version of HDFS. The implementation makes the cluster aware of different storage types (e.g. hard disks and solid state drives) and allows users to specify preferred storage types for their data.

By introducing new storage types, we build in support for storage technologies like SSDs and RAID. The latter is of particular interest, since it increases both the bandwidth and the reliability of the storage on individual nodes while continuing to use commodity hardware. Since network bandwidth is increasing orders of magnitude faster than disk bandwidth, increasing the disk throughput is of vital importance to avoid local storage becoming a bottleneck.

The heterogeneous storage Application Programming Interface (API) described in this thesis offers HDFS administrators more control over their data while being compatible with the HDFS framework. Users can choose whether they want files stored on traditional disks, SSDs or more complex constructions using RAID and erasure coding.
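The idea of letting users state preferred storage types can be illustrated with a small sketch. This is not the HopsFS or HDFS API: the names `StorageType`, `Volume` and `choose_volumes`, and the placement rule itself (try storage types in preference order, one replica per node, most free space first), are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class StorageType(Enum):
    # Hypothetical labels for the storage types named in the abstract.
    DISK = "DISK"
    SSD = "SSD"
    RAID = "RAID"


@dataclass(frozen=True)
class Volume:
    node: str                  # name of the node hosting the volume
    storage_type: StorageType  # type the node reports for this volume
    free_bytes: int            # remaining capacity


def choose_volumes(volumes, preferred, block_size, replicas):
    """Pick up to `replicas` volumes on distinct nodes.

    Storage types are tried in the order given by `preferred`, so later
    entries act as fallbacks when the earlier types run out of space.
    """
    chosen = []
    used_nodes = set()
    for storage_type in preferred:
        # Only volumes of this type with room for the block are candidates;
        # volumes with the most free space are tried first.
        candidates = sorted(
            (v for v in volumes
             if v.storage_type is storage_type and v.free_bytes >= block_size),
            key=lambda v: -v.free_bytes,
        )
        for volume in candidates:
            if len(chosen) == replicas:
                return chosen
            if volume.node not in used_nodes:
                chosen.append(volume)
                used_nodes.add(volume.node)
    return chosen


if __name__ == "__main__":
    cluster = [
        Volume("n1", StorageType.SSD, 100),
        Volume("n2", StorageType.SSD, 200),
        Volume("n1", StorageType.DISK, 1000),
        Volume("n3", StorageType.DISK, 500),
    ]
    # Prefer SSDs, fall back to hard disks for the remaining replica.
    for v in choose_volumes(cluster, [StorageType.SSD, StorageType.DISK], 50, 3):
        print(v.node, v.storage_type.value)
```

In this toy cluster only two nodes have an SSD, so a request for three replicas with the preference order SSD, then DISK places the third replica on a hard disk of a node that was not already used.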

Place, publisher, year, edition, pages
2016. p. 62
Series
TRITA-ICT-EX ; 2016:123
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-202970
OAI: oai:DiVA.org:kth-202970
DiVA, id: diva2:1080713
Subject / course
Computer Science
Educational program
Degree of Master
Supervisors
Examiners
Available from: 2017-03-10. Created: 2017-03-10. Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

fulltext (1579 kB), 153 downloads
File information
File name: FULLTEXT01.pdf
File size: 1579 kB
Checksum (SHA-512): ac63010ce30e167654c5b9b89d2ea0c0374ceab225e5bd1c2077677a220a0442bdd0600d0cd69611d20caf57e3eca17eda04e563453f7dc2807a3b41c96b310c
Type: fulltext
Mimetype: application/pdf
