Massive scale-out of expensive continuous queries
2011 (English). In: 36th International Conference on Very Large Data Bases: VLDB 2010, 2011. Conference paper (Refereed)
Scalable execution of expensive continuous queries over massive data streams requires input streams to be split into parallel sub-streams. The query operators are continuously executed in parallel over these sub-streams. Stream splitting involves both partitioning and replication of incoming tuples, depending on how the continuous query is parallelized. We provide a stream splitting operator that enables such customized stream splitting. However, it is critical that the stream splitting itself keeps up with high-volume input streams. This is a problem when the stream splitting predicates are costly to evaluate. Therefore, to enable customized splitting of high-volume streams, we introduce a parallelized stream splitting operator, called parasplit. We investigate the performance of parasplit both analytically, using a cost model, and experimentally. Based on these results, a heuristic is devised to automatically parallelize the execution of parasplit. We show that the maximum stream rate of parasplit approaches network speed, and that the parallelization is resource efficient. Finally, the scalability of our approach is experimentally demonstrated on the Linear Road Benchmark, showing an order of magnitude higher stream processing rate than previously published results and supporting at least 512 expressways.
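The abstract distinguishes partitioning (a tuple is routed to exactly one sub-stream) from replication (a tuple is routed to several), and motivates parallelizing the splitter itself when the routing predicates are costly. The Python sketch that follows illustrates those two ideas under our own assumptions; it is not the paper's parasplit. The names `split`, `parallel_split` and `make_route`, the `(kind, key)` tuple shape, and the round-robin distribution of input among `k` splitters are all hypothetical. A route function returns the set of target sub-streams, so one index means partitioning and several mean replication; `parallel_split` runs `k` splitters over shares of the input and merges their outputs per sub-stream.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical routing signature: maps one tuple to the indices of the
# sub-streams it belongs to. One index = partitioning; several = replication.
Route = Callable[[tuple], Sequence[int]]


def make_route(n: int) -> Route:
    """Hypothetical route: tuples tagged 'broadcast' are replicated to all
    n sub-streams; every other tuple is partitioned by its integer key."""
    def route(t: tuple) -> Sequence[int]:
        kind, key = t[0], t[1]
        if kind == "broadcast":
            return range(n)      # replicate to every sub-stream
        return [key % n]         # partition: exactly one sub-stream
    return route


def split(tuples: Sequence[tuple], route: Route, n: int) -> List[List[tuple]]:
    """Sequential splitter: evaluates the (possibly costly) route on every tuple."""
    out: List[List[tuple]] = [[] for _ in range(n)]
    for t in tuples:
        for i in route(t):
            out[i].append(t)
    return out


def parallel_split(tuples: Sequence[tuple], route: Route, n: int, k: int) -> List[List[tuple]]:
    """Hypothetical parallel splitter: k splitters each route a round-robin
    share of the input, then their outputs are merged per sub-stream.
    This sketch does NOT preserve tuple order across splitters."""
    shares = [tuples[j::k] for j in range(k)]
    with ThreadPoolExecutor(max_workers=k) as pool:
        partials = list(pool.map(lambda share: split(share, route, n), shares))
    merged: List[List[tuple]] = [[] for _ in range(n)]
    for part in partials:
        for i in range(n):
            merged[i].extend(part[i])
    return merged


if __name__ == "__main__":
    stream = [("data", key) for key in range(10)] + [("broadcast", 0)]
    route = make_route(3)
    seq = split(stream, route, 3)
    par = parallel_split(stream, route, 3, k=2)
    print(seq[0])  # [('data', 0), ('data', 3), ('data', 6), ('data', 9), ('broadcast', 0)]
    # Same tuples reach each sub-stream, possibly in a different order.
    assert all(sorted(a) == sorted(b) for a, b in zip(seq, par))
```

In CPython, threads do not execute CPU-bound Python code in parallel because of the global interpreter lock, so a real deployment of this idea would run splitters in separate processes or on separate nodes. The sketch also leaves out tuple ordering and the cost-model-based heuristic that the abstract describes for choosing the degree of parallelism.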
Computer Science
Research subject: Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-152251
OAI: oai:DiVA.org:uu-152251
DiVA: diva2:413076