An approach to choosing the right distributed file system: Microsoft DFS vs. Hadoop DFS
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2015 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Context. An important goal of most IT groups is to manage server resources so that their users have fast, reliable and secure access to files. Modern organizations often distribute their resources geographically, which calls for new file system designs that remain highly available and efficient. This is where distributed file systems (DFSs) come into the picture. Unlike a "classical" local file system, a distributed file system (DFS) is accessible across a network and allows clients to access remote files as if they were stored locally.

Objectives. This paper comparatively analyzes two distributed file systems, Microsoft DFS (MSDFS) and Hadoop DFS (HDFS). The two systems come from different "worlds" (MSDFS is proprietary, HDFS is open source); the abundance of solutions and the variety of choices that exist today make such a comparison more relevant.

Methods. The comparative analysis is performed on a cluster of 4 computers running dual installations of Microsoft Windows Server 2012 R2 (the MSDFS environment) and Linux Ubuntu 14.04 (the HDFS environment). The comparison covers read and write operations on files and sets of files of increasing sizes, as well as a set of key usage scenarios.

Results. Comparative results are produced for read and write operations on files of increasing size (1 MB, 2 MB, 4 MB and so on up to 4096 MB) and on sets of small files (64 KB each) amounting to totals of 128 MB, 256 MB and so on up to 4096 MB. The results expose the behavior of the two DFSs under different types of stress: when the size of the transferred file increases, and when the quantity of data is divided into thousands or tens of thousands of small files. The behavior in the key usage scenarios is also observed and analyzed.
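
To make the scale of the two workloads concrete, the following minimal sketch enumerates the sizes listed above. It rests on two assumptions that the abstract does not state explicitly: that "and so on" means each step doubles the previous one, and that units are binary (1 MB = 1024 KB). The function names are hypothetical and are not taken from the thesis.

```python
KB = 1024          # assumption: binary units
MB = 1024 * KB


def doubling_sizes_mb(start_mb, end_mb):
    """Return sizes in MB from start_mb to end_mb, doubling at each step (assumed)."""
    sizes = []
    size = start_mb
    while size <= end_mb:
        sizes.append(size)
        size *= 2
    return sizes


def small_file_counts(start_total_mb=128, end_total_mb=4096, file_kb=64):
    """Map each total set size (MB) to the number of 64 KB files it contains."""
    return {
        total_mb: (total_mb * MB) // (file_kb * KB)
        for total_mb in doubling_sizes_mb(start_total_mb, end_total_mb)
    }


if __name__ == "__main__":
    print("Single-file sizes (MB):", doubling_sizes_mb(1, 4096))
    for total_mb, count in small_file_counts().items():
        print(f"{total_mb:>5} MB total -> {count:>6} files of 64 KB")
```

Under these assumptions the single-file series has 13 steps, and the small-file sets range from 2048 files (128 MB) to 65536 files (4096 MB), which matches the abstract's mention of thousands to tens of thousands of small files.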

Conclusions. HDFS performs better at writing large files, while MSDFS is better at writing many small files. For read operations, the two show similar performance, with a slight advantage for MSDFS. In the key usage scenarios, HDFS shows more flexibility, but MSDFS could be the better choice depending on the needs of the users (for example, most of its common functions can be configured through a graphical user interface).

Place, publisher, year, edition, pages
2015. 74 p.
Keyword [en]
DFS, MSDFS, HDFS, Microsoft, Hadoop
National Category
Computer Systems
URN: urn:nbn:se:bth-844
OAI: diva2:819604
Subject / course
PA1418 Bachelor's Thesis - Large Team Software Engineering Project
Educational program
PAGPT Software Engineering
2015-06-02, J1270, Campus Gräsvik, Karlskrona, Sweden, 09:15 (Swedish)
Available from: 2015-06-29 Created: 2015-06-10 Last updated: 2015-06-29 Bibliographically approved

Open Access in DiVA

fulltext (3115 kB), 123 downloads
File information
File name: FULLTEXT02.pdf
File size: 3115 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
