Scalable Splitting of Massive Data Streams
Zeitler, Erik; Risch, Tore
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (UDBL)
2010 (English). In: Database Systems for Advanced Applications: Part II / [ed] Kitagawa H., Ishikawa Y., Li Q., Watanabe C., Berlin: Springer-Verlag, 2010, p. 184-198. Conference paper, Published paper (Refereed)
Abstract [en]

Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams over which query operators are executed in parallel. Automatic stream splitting is in general very difficult, as the optimal parallelization may depend on application semantics. To enable application-specific stream splitting, we introduce splitstream functions where the user specifies non-procedural stream partitioning and replication. For high-volume streams, the stream splitting itself becomes a performance bottleneck. A cost model is introduced that estimates the performance of splitstream functions with respect to throughput and CPU usage. We implement parallel splitstream functions, and relate experimental results to cost model estimates. Based on the results, a splitstream function called autosplit is proposed, which scales well for high degrees of parallelism, and is robust for varying proportions of stream partitioning and replication. We show how user-defined parallelization using autosplit provides substantially improved scalability (L = 64) over previously published results for the Linear Road Benchmark.
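
To make the idea of partitioning and replication concrete, the following is a minimal Python sketch of a stream splitter. It is not the splitstream implementation from the paper: the names `split_stream`, `route_fn`, and `broadcast_fn` are hypothetical, and sub-streams are modelled as in-memory lists rather than network connections.

```python
from typing import Any, Callable, Iterable, List


def split_stream(
    stream: Iterable[Any],
    n_outputs: int,
    route_fn: Callable[[Any], int],
    broadcast_fn: Callable[[Any], bool],
) -> List[List[Any]]:
    """Split `stream` into `n_outputs` sub-streams (hypothetical helper).

    Tuples for which `broadcast_fn` is true are replicated to every
    sub-stream; all other tuples are partitioned using `route_fn`.
    """
    outputs: List[List[Any]] = [[] for _ in range(n_outputs)]
    for tup in stream:
        if broadcast_fn(tup):
            # Replication: every sub-stream receives a copy of the tuple.
            for out in outputs:
                out.append(tup)
        else:
            # Partitioning: exactly one sub-stream receives the tuple.
            outputs[route_fn(tup) % n_outputs].append(tup)
    return outputs


# Example input (assumed tuple layout): reports are partitioned by key,
# queries are replicated to all sub-streams.
events = [
    {"kind": "report", "key": 0},
    {"kind": "report", "key": 1},
    {"kind": "query", "key": 7},
    {"kind": "report", "key": 2},
]
subs = split_stream(
    events,
    n_outputs=2,
    route_fn=lambda t: t["key"],
    broadcast_fn=lambda t: t["kind"] == "query",
)
```

In this sketch the user supplies only the two conditions, while the loop that moves tuples is generic. A single loop of this kind processes every tuple in one place, which is where the bottleneck described in the abstract would arise for high-volume streams.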

Place, publisher, year, edition, pages
Berlin: Springer-Verlag, 2010. p. 184-198
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 5982
Keyword [en]
distributed stream systems, parallelization, query optimization
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-136403
DOI: 10.1007/978-3-642-12098-5_15
ISI: 000278934800015
ISBN: 978-3-642-12097-8 (print)
OAI: oai:DiVA.org:uu-136403
DiVA, id: diva2:376826
Conference
15th International Conference, DASFAA 2010, Tsukuba, Japan, April 1-4, 2010
Projects
iStreams, eSSENCE
Available from: 2010-12-14 Created: 2010-12-13 Last updated: 2018-01-12. Bibliographically approved
In thesis
1. Scalable Parallelization of Expensive Continuous Queries over Massive Data Streams
2011 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Numerous applications in, for example, science, engineering, and financial analysis increasingly require online analysis over streaming data. These data streams are often of such a high rate that saving them to disk is not desirable or feasible. Therefore, search and analysis must be performed directly over the data in motion. Such online search and analysis can be expressed as continuous queries (CQs) that are defined over the streams. The result of a CQ is itself a stream, which is continuously updated as new data appears in the queried stream(s). In many cases, the applications require non-trivial analysis, leading to CQs involving expensive processing. To provide scalability of such expensive CQs over high-volume streams, the execution of the CQs must be parallelized.

To investigate different approaches to parallel execution of CQs, a parallel data stream management system called SCSQ was implemented for this Thesis. Data and queries from space physics and traffic management applications are used in the evaluations, as well as synthetic data and the standard data stream benchmark, the Linear Road Benchmark. Declarative parallelization functions are introduced into the query language of SCSQ, allowing the user to specify customized parallelization. In particular, declarative stream splitting functions are introduced, which split a stream into parallel sub-streams, over which expensive CQ operators are continuously executed in parallel.

Naïvely implemented, stream splitting becomes a bottleneck if the input streams are of high volume, if the CQ operators are massively parallelized, or if the stream splitting conditions are expensive. To eliminate this bottleneck, different approaches are investigated to automatically generate parallel execution plans for stream splitting functions. This Thesis shows that by parallelizing the stream splitting itself, expensive CQs can be processed at stream rates close to network speed. Furthermore, it is demonstrated how parallelized stream splitting allows orders of magnitude higher stream rates than any previously published results for the Linear Road Benchmark.
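
As an illustration of what parallelizing the splitting itself can mean, the sketch below simulates a generic two-level plan in Python. It is an assumption made for illustration, not the execution plan generated in the Thesis: the function `two_level_split` and its parameters are hypothetical, and the splitter nodes are simulated sequentially rather than run on separate processors.

```python
from typing import Any, Callable, Iterable, List


def two_level_split(
    stream: Iterable[Any],
    n_splitters: int,
    n_outputs: int,
    route_fn: Callable[[Any], int],
) -> List[List[Any]]:
    """Simulate a two-level splitting plan (hypothetical sketch)."""
    # Level 1: round-robin distribution. No splitting condition is
    # evaluated here, so this step stays cheap per tuple.
    shares: List[List[Any]] = [[] for _ in range(n_splitters)]
    for i, tup in enumerate(stream):
        shares[i % n_splitters].append(tup)

    # Level 2: each splitter evaluates the (possibly expensive) routing
    # condition on its own share only. In a real system these loops
    # would run in parallel on different nodes.
    partial = [[[] for _ in range(n_outputs)] for _ in range(n_splitters)]
    for s, share in enumerate(shares):
        for tup in share:
            partial[s][route_fn(tup) % n_outputs].append(tup)

    # Merge: each output sub-stream collects tuples from all splitters.
    return [
        [tup for s in range(n_splitters) for tup in partial[s][o]]
        for o in range(n_outputs)
    ]


outs = two_level_split(range(8), n_splitters=2, n_outputs=2,
                       route_fn=lambda x: x // 2)
```

With this layout each splitter evaluates the routing condition on roughly 1/`n_splitters` of the tuples. Note that the merged sub-streams are not in arrival order in this sketch, so a query that depends on ordering would need an additional ordering step.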

 

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2011. p. 35
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 836
National Category
Computer Sciences
Research subject
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-152255 (URN)
978-91-554-8095-0 (ISBN)
Public defence
2011-09-20, Auditorium Minus, Museum Gustavianum, Akademigatan 3, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2011-06-10 Created: 2011-04-27 Last updated: 2018-01-12

Open Access in DiVA

fulltext (297 kB), 446 downloads
File information
File name: FULLTEXT02.pdf
File size: 297 kB
Checksum (SHA-512):
49e081e56f8e93ec70434feb24622c055b32809633c6eae2334cc5ae96685d780f06e3940ec5f903ff7c581ddebbd46faf2b1db9bad6f8b9482106eaae32204c
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Zeitler, Erik; Risch, Tore
By organisation
Computing Science
Computer Sciences
