Analysis and comparison of interfacing, data generation and workload implementation in BigDataBench 4.0 and Intel HiBench 7.0
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

One of the major challenges in Big Data is the accurate and meaningful assessment of system performance. Unlike in other systems, minor differences in efficiency can escalate into large differences in cost and power consumption. While there are several tools on the market for measuring the performance of Big Data systems, few of them have been explored in depth.

This report investigated the interfacing, data generation and workload implementations of two Big Data benchmarking suites, BigDataBench and HiBench. The purpose of the study was to establish the capabilities of each tool in these three areas.

An exploratory and qualitative approach was used to gather information and analyze each benchmarking tool. Source code, documentation, and reports published by the developers were used as information sources.

The results showed that BigDataBench and HiBench were designed similarly with regard to interfacing and data flow during the execution of a workload, with the exception of streaming workloads. BigDataBench provided more realistic data generation, while data generation in HiBench was easier to control. With regard to workload design, the workloads in BigDataBench were designed to be applicable to multiple frameworks, while the workloads in HiBench were focused on the Hadoop family. In conclusion, neither of the benchmarking suites was superior to the other. They were designed for different purposes and should be applied on a case-by-case basis.

Abstract [sv]

One of the major challenges in Big Data is the accurate and meaningful assessment of system performance. Unlike in other systems, minor differences in efficiency can escalate into large differences in cost and power consumption. While there are several tools on the market for measuring the performance of Big Data systems, few of them have been examined in depth.

This report examined the interfacing, data generation and workloads of two Big Data benchmarking suites, BigDataBench and HiBench. The purpose of the study was to establish the capabilities of each tool with respect to the given criteria.

An exploratory and qualitative approach was used to gather information and analyze each benchmarking tool. Source code, documentation and reports written and published by the developers were used as information sources.

The results showed that BigDataBench and HiBench were designed similarly with respect to interfacing and data flow during the execution of a workload, with the exception of streaming workloads. BigDataBench provided more realistic data generation, while data generation for HiBench was easier to control. As for workload design, the workloads in BigDataBench were designed to be applicable to multiple frameworks, while the workloads in HiBench were focused on the Hadoop family. In conclusion, neither benchmarking suite was superior to the other. Both were designed for different purposes and should be applied on a case-by-case basis.

Place, publisher, year, edition, pages
2018. p. 61
Series
TRITA-EECS-EX ; 2018:370
Keywords [en]
Big Data, Benchmarking, BigDataBench, HiBench, Analysis, Comparison, Interfacing, Data Generation
Keywords [sv]
Big Data, Benchmarking, BigDataBench, HiBench, Analysis, Comparison, Interfacing, Data Generation
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-254332
OAI: oai:DiVA.org:kth-254332
DiVA, id: diva2:1330801
Subject / course
Information and Communication Technology
Educational program
Master of Science in Engineering - Information and Communication Technology
Supervisors
Examiners
Available from: 2019-06-26. Created: 2019-06-26. Last updated: 2019-06-26. Bibliographically approved.

Open Access in DiVA

Full text (572 kB)
File information
File name: FULLTEXT01.pdf
File size: 572 kB
Checksum (SHA-512): d1b9ede503e27ee54467c431c029df824566472922aadaa04953ef8da6aa33659669c8923991218b3dfb320e946b17d7e2a8b05cb1bb2bac4a274f981e5fe02d
Type: fulltext
Mimetype: application/pdf
