Data Analysis on Hadoop - finding tools and applications for Big Data challenges
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
With the increasing amount of data generated each day, recent developments in software provide the tools needed to tackle the challenges of the so-called Big Data era. This project introduces some of these platforms, focusing in particular on data analysis platforms and query tools that work alongside Hadoop. The first part of the project introduces the Hadoop framework and its main components: MapReduce, YARN and HDFS. This is followed by an overview of seven platforms that are part of the Hadoop ecosystem, covering their key features, components, programming model and architecture. The following chapter introduces 12 parameters used to compare these platforms side by side, and ends with a summary and discussion in which the platforms are divided into several classes according to their usage, use cases and data environment. In the last part of the project, web logs belonging to one of Sweden's top newspapers are analyzed using Apache Spark, one of the platforms studied. The purpose of this analysis is to showcase some of the features of Spark while carrying out an exploratory data analysis.
Place, publisher, year, edition, pages
2015. 79 p.
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-260557
OAI: oai:DiVA.org:uu-260557
DiVA: diva2:847616
Master Programme in Computer Science
Hellander, Andreas
Ngai, Edith