Distributed multi-query optimization of continuous clustering queries
2014 (English)
In: Proc. VLDB 2014 PhD Workshop, 2014
Conference paper (Refereed)
This work addresses the problem of sharing execution plans among queries that continuously cluster streaming data to provide an evolving summary of the data stream. The problem is challenging because clustering is expensive, many clustering queries may run simultaneously, each continuous query has a long lifetime, and the queries' execution plans often overlap. Clustering resembles conventional grouped aggregation, but forming clusters is more expensive than forming groups, which makes incremental maintenance harder. The goal of this work is to minimize the response time of continuous clustering queries under limited resources through multi-query optimization. To that end, strategies for sharing execution plans between continuous clustering queries are investigated, and the architecture of a system that optimizes the processing of multiple such queries is outlined. Since there are many clustering algorithms, the system should be extensible so that user-defined clustering algorithms can be incorporated easily.
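To make "sharing execution plans" and "user-defined clustering algorithms" concrete, here is a minimal sketch of one simple sharing strategy, assuming that queries over the same stream with an identical count-based sliding window can share a single window buffer while each query runs its own clustering step. This is an illustrative assumption, not the strategies investigated in this work. The names `SharingOptimizer`, `SharedWindow`, `ClusterFn`, and `grid_clustering` are hypothetical, and the plug-in signature (a function from a window of 1-D points to cluster centers) is invented for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence

# Hypothetical plug-in signature for a user-defined clustering algorithm:
# it maps the current window contents to a list of cluster centers.
ClusterFn = Callable[[Sequence[float]], List[float]]


def grid_clustering(width: float) -> ClusterFn:
    """Hypothetical user-defined algorithm: put 1-D points into fixed-width
    cells and return the mean of each non-empty cell, in cell order."""
    def run(points: Sequence[float]) -> List[float]:
        cells: Dict[int, List[float]] = {}
        for p in points:
            cells.setdefault(int(p // width), []).append(p)
        return [sum(c) / len(c) for _, c in sorted(cells.items())]
    return run


class SharedWindow:
    """One count-based sliding window shared by all queries registered on it.
    Only window maintenance is shared; every query re-clusters the snapshot
    from scratch, so this sketch does not address incremental maintenance."""

    def __init__(self, size: int) -> None:
        self.buffer: Deque[float] = deque(maxlen=size)  # oldest point evicted when full
        self.queries: Dict[str, ClusterFn] = {}

    def register(self, name: str, algo: ClusterFn) -> None:
        self.queries[name] = algo

    def push(self, x: float) -> Dict[str, List[float]]:
        self.buffer.append(x)
        snapshot = list(self.buffer)
        return {name: algo(snapshot) for name, algo in self.queries.items()}


class SharingOptimizer:
    """Assumed sharing rule: queries with the same window size reuse one SharedWindow."""

    def __init__(self) -> None:
        self.windows: Dict[int, SharedWindow] = {}

    def add_query(self, name: str, window_size: int, algo: ClusterFn) -> None:
        if window_size not in self.windows:
            self.windows[window_size] = SharedWindow(window_size)
        self.windows[window_size].register(name, algo)

    def push(self, x: float) -> Dict[str, List[float]]:
        results: Dict[str, List[float]] = {}
        for win in self.windows.values():
            results.update(win.push(x))
        return results
```

For example, registering `q1` (window 3, cell width 10), `q2` (window 3, cell width 5), and `q3` (window 5, cell width 10) creates only two window buffers, because `q1` and `q2` share one. After pushing 1, 2, 12, 16, `q1` reports centers `[2.0, 14.0]`, `q2` reports `[2.0, 12.0, 16.0]`, and `q3` reports `[1.5, 14.0]`. A real optimizer would also need to detect overlap in more expensive operators, such as the clustering step itself, which is where the abstract locates the main difficulty.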
Research subject
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-302790
OAI: oai:DiVA.org:uu-302790
DiVA: diva2:967635