Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Processing data sources with big data frameworks
KTH, School of Technology and Health (STH), Medical Engineering, Computer and Electronic Engineering.
KTH, School of Technology and Health (STH), Medical Engineering, Computer and Electronic Engineering.
2016 (English)Independent thesis Basic level (professional degree), 10 credits / 15 HE creditsStudent thesisAlternative title
Behandla datakällor med big data-ramverk (Swedish)
Abstract [en]

Big data is a concept that is expanding rapidly. As more and more data is generatedand garnered, there is an increasing need for efficient solutions that can be utilized to process all this data in attempts to gain value from it. The purpose of this thesis is to find an efficient way to quickly process a large number of relatively small files. More specifically, the purpose is to test two frameworks that can be used for processing big data. The frameworks that are tested against each other are Apache NiFi and Apache Storm. A method is devised in order to, firstly, construct a data flow and secondly, construct a method for testing the performance and scalability of the frameworks running this data flow. The results reveal that Apache Storm is faster than Apache NiFi, at the sort of task that was tested. As the number of nodes included in the tests went up, the performance did not always do the same. This indicates that adding more nodes to a big data processing pipeline, does not always result in a better performing setup and that, sometimes, other measures must be made to heighten the performance.

Abstract [sv]

Big data är ett koncept som växer snabbt. När mer och mer data genereras och samlas in finns det ett ökande behov av effektiva lösningar som kan användas föratt behandla all denna data, i försök att utvinna värde från den. Syftet med detta examensarbete är att hitta ett effektivt sätt att snabbt behandla ett stort antal filer, av relativt liten storlek. Mer specifikt så är det för att testa två ramverk som kan användas vid big data-behandling. De två ramverken som testas mot varandra är Apache NiFi och Apache Storm. En metod beskrivs för att, för det första, konstruera ett dataflöde och, för det andra, konstruera en metod för att testa prestandan och skalbarheten av de ramverk som kör dataflödet. Resultaten avslöjar att Apache Storm är snabbare än NiFi, på den typen av test som gjordes. När antalet noder som var med i testerna ökades, så ökade inte alltid prestandan. Detta visar att en ökning av antalet noder, i en big data-behandlingskedja, inte alltid leder till bättre prestanda och att det ibland krävs andra åtgärder för att öka prestandan.

Place, publisher, year, edition, pages
2016. , 70 p.
Series
TRITA-STH, 2016:55
Keyword [en]
Big data, NiFi, Storm, HDFS, performance, real-time, HDP
Keyword [sv]
Big data, NiFi, Storm, HDFS, prestanda, realtid, HDP
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:kth:diva-188204OAI: oai:DiVA.org:kth-188204DiVA: diva2:934359
External cooperation
Granditude
Subject / course
Computer Technology, Program- and System Development
Educational program
Bachelor of Science in Engineering - Computer Engineering
Supervisors
Examiners
Available from: 2016-09-29 Created: 2016-06-08 Last updated: 2016-09-29Bibliographically approved

Open Access in DiVA

Processing data sources with big data frameworks(1201 kB)215 downloads
File information
File name FULLTEXT01.pdfFile size 1201 kBChecksum SHA-512
7e165d91b548ebe137bd9aede237aa724710b68821c2c49a350186477acd769f3e7a1836ca4f0e5ae7ffd7360df1628b8c6465d89fc439535b07ba9e1fe6764a
Type fulltextMimetype application/pdf

By organisation
Computer and Electronic Engineering
Software Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 215 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 764 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf