TupleSearch: A scalable framework based on sketches to process and store streaming temporal data for real time analytics
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2017 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
Abstract [en]

In many fields, there is a need for quick analysis of data. As the number of devices connected to the Internet grows, so does the amount of data generated. The traditional way of analyzing large amounts of data has been batch processing, where already collected data is processed. This process is time consuming, which has led to another emerging trend: stream processing, in which data is processed and stored as it arrives. Because of the velocity, volume and variation of the data, stream processing is best carried out in main memory, which makes it a big challenge. This thesis focuses on developing a framework for processing and storing streaming temporal data so that the data can be analyzed in real time. For this purpose, a server application was created consisting of approximate in-memory data synopses, called sketches, to process and store the input data. Furthermore, a client web application was created to query and analyze the data. The results show that the framework can support simple aggregate queries with constant query time regardless of the volume of data. It can also process data 6.8 times faster than a traditional database system. All this implies that the system is scalable, while at the same time it comes with a trade-off between query error and memory. For a distribution of ~3,000,000 unique items, it was concluded that the framework can provide very accurate answers, with an error rate of less than 1.1%, for the trendiest data while using about 100 times less space than the actual size of the data set.
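
The keywords for this record name the Count-Min Sketch as one of the sketches involved. As an illustration only, and not the thesis's own code, the following minimal Python sketch shows how such a structure keeps approximate counts in a fixed amount of memory and answers a frequency query by touching a fixed number of counters. The class name, the default `width` and `depth`, and the use of `hashlib.blake2b` for hashing are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch: approximate counts in width * depth counters."""

    def __init__(self, width=2000, depth=5):
        # width and depth are assumed values; larger tables lower the error.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting the item with the row number.
        data = f"{row}:{item}".encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        # Each update increments exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum over the rows is the estimate. With non-negative
        # updates it can overestimate (hash collisions) but never underestimate.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical stream of item identifiers.
sketch = CountMinSketch()
for item in ["sensor-a"] * 100 + ["sensor-b"] * 10 + ["sensor-c"]:
    sketch.add(item)

print(sketch.estimate("sensor-a"))  # at least 100
```

Both `add` and `estimate` do work proportional to `depth` only, independent of how many items have been seen, which is the kind of constant query time the abstract reports. Memory is fixed at `width * depth` counters, so a smaller table saves space at the cost of more collisions and larger overestimates.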

Place, publisher, year, edition, pages
2017. p. 70
Keywords [en]
Streaming Data, Stream Processing, Count-Min Sketch, Time Adaptive Sketches
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:miun:diva-31041; Local ID: DT-V17-A2-003; OAI: oai:DiVA.org:miun-31041; DiVA, id: diva2:1116931
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Supervisors
Examiners
Available from: 2017-06-28; Created: 2017-06-28; Last updated: 2018-01-13; Bibliographically approved

Open Access in DiVA

fulltext (2832 kB), 79 downloads
File information
File name: FULLTEXT01.pdf; File size: 2832 kB; Checksum: SHA-512
096b49ffa65fb4e8ca02fcea2f2c0dec1cb7d4e3cb11967071709f4a5a52e418c5c73ccbcfdb54b274339cc23d1a3ff994e56edc7773864ef7e0f7941d5fff9d
Type: fulltext; Mimetype: application/pdf

Search in DiVA

By author/editor
Karlsson, Henrik
By organisation
Department of Information Systems and Technology
Computer Engineering
