Data Analysis on Hadoop - finding tools and applications for Big Data challenges
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2015 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

With the increasing amount of data generated each day, recent developments in software provide the tools needed to tackle the challenges of the so-called Big Data era. This project introduces some of these platforms; in particular, it focuses on platforms for data analysis and query tools that work alongside Hadoop. In the first part of this project, the Hadoop framework and its main components, MapReduce, YARN and HDFS, are introduced. This is followed by an overview of seven platforms that are part of the Hadoop ecosystem. The overview presents their key features, components, programming model and architecture. The following chapter introduces 12 parameters that are used to compare these platforms side by side, and it ends with a summary and discussion in which the platforms are divided into several classes according to their usage, use cases and data environment. In the last part of this project, an analysis of web logs belonging to one of Sweden's top newspapers was carried out using Apache Spark, one of the platforms analyzed. The purpose of this analysis was to showcase some of the features of Spark while doing an exploratory data analysis.

Place, publisher, year, edition, pages
2015. 79 p.
IT, 15053
National Category
Engineering and Technology
URN: urn:nbn:se:uu:diva-260557
OAI: diva2:847616
Educational program
Master Programme in Computer Science
Available from: 2015-08-20. Created: 2015-08-20. Last updated: 2015-08-20. Bibliographically approved.

Open Access in DiVA

fulltext (1347 kB), 509 downloads
File information
File name FULLTEXT01.pdf
File size 1347 kB
Checksum SHA-512
Type fulltext
Mimetype application/pdf

Total: 509 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1415 hits