Optimizing Hadoop Parameters Based on the Application Resource Consumption
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2013 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

The interest in analyzing growing amounts of data has encouraged the deployment of large-scale parallel computing frameworks such as Hadoop. Indeed, data analytics is the main reason behind the success of distributed systems: data might not fit on a single disk, and processing can be very time-consuming, which makes parallel analysis of the input very useful. Hadoop relies on the MapReduce programming paradigm to distribute work among machines, so a good load balance ultimately influences the execution time of such applications.

This paper introduces a technique for tuning Hadoop by optimizing some of its configuration parameters using the application's CPU utilization. The theories stated and proved in this paper rely on the premise that the CPUs should be neither overutilized nor underutilized; the result is an equation that expresses the parameter to be optimized in terms of the cluster infrastructure. Future research on this topic is planned to focus on tuning other Hadoop parameters and on using more accurate tools to analyze cluster performance. It would also be interesting to investigate ways to optimize Hadoop parameters based on other consumption criteria, such as input/output statistics and network traffic.
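The record does not give the equation itself. As a purely illustrative sketch of the underlying idea (keep the CPUs neither over- nor underutilized), the function below chooses how many tasks to run concurrently on a worker node from the measured average CPU share of one task. The name `slots_per_node`, its parameters, the 0.9 default target, and the floor-based rule are hypothetical assumptions, not the equation derived in the thesis. On a Hadoop 1.x cluster, a value of this kind might be applied to a per-node slot setting such as `mapred.tasktracker.map.tasks.maximum`.

```python
import math


def slots_per_node(cores: int, task_cpu_fraction: float,
                   target_utilization: float = 0.9) -> int:
    """Hypothetical heuristic (not the thesis's equation).

    Returns how many tasks to run concurrently on one worker node so that
    their combined CPU demand stays at or just below a target utilization.

    cores: number of CPU cores on the worker node.
    task_cpu_fraction: average CPU use of one task as a fraction of one core
        (e.g. 0.5), assumed to come from a profiling run of the application.
    target_utilization: fraction of the node's total CPU to keep busy;
        values below 1.0 leave headroom against over-utilization.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 0 < task_cpu_fraction <= 1:
        raise ValueError("task_cpu_fraction must be in (0, 1]")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    # CPU budget we want to consume on this node, measured in cores.
    budget = cores * target_utilization
    # Each task consumes task_cpu_fraction of a core; flooring keeps total
    # demand at or below the budget, and at least one slot is always kept.
    return max(1, math.floor(budget / task_cpu_fraction))
```

For example, an 8-core node whose tasks each average half a core, with a 90% target, yields `slots_per_node(8, 0.5)` equal to 14 concurrent tasks. Flooring errs on the side of slight underutilization rather than oversubscribing the CPUs.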

Place, publisher, year, edition, pages
2013.
Series
IT, 13 034
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-200144
OAI: oai:DiVA.org:uu-200144
DiVA: diva2:622285
Educational program
Master Programme in Computer Science
Uppsok
Technology
Available from: 2013-05-21
Created: 2013-05-21
Last updated: 2013-12-03
Bibliographically approved