Uppsala University Publications (uu.se)
Distributed multi-query optimization of continuous clustering queries
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (UDBL)
2014 (English). In: Proc. VLDB 2014 PhD Workshop, 2014. Conference paper, Published paper (Refereed)
Abstract [en]

This work addresses the problem of sharing execution plans among queries that continuously cluster streaming data to provide an evolving summary of the data stream. This is challenging because clustering is an expensive task, many clustering queries may run simultaneously, each continuous query has a long lifespan, and the execution plans often overlap. Clustering is similar to conventional grouped aggregation, but forming clusters is more expensive than forming groups, which makes incremental maintenance more challenging. The goal of this work is to minimize the response time of continuous clustering queries under limited resources through multi-query optimization. To that end, strategies for sharing execution plans between continuous clustering queries are investigated, and the architecture of a system that optimizes the processing of multiple such queries is outlined. Since there are many clustering algorithms, the system should be extensible so that user-defined clustering algorithms can be incorporated easily.
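The observation that clustering resembles grouped aggregation suggests one way in which overlapping plans could be shared. The Python sketch below illustrates pane-based sharing, a general stream-processing technique and not necessarily the strategy this work proposes: each pane of the stream is summarized once into BIRCH-style clustering features (count, linear sum, sum of squares), which are additive, and every query then merges only the panes its own window covers. The grid-cell micro-clustering and all function names (`summarize_pane`, `answer_query`, `grid_cell`, etc.) are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# BIRCH-style clustering feature: (count, linear sum per dimension, sum of squared norms).
CF = Tuple[int, Tuple[float, ...], float]
Cell = Tuple[int, ...]


def cf_of_point(p: Point) -> CF:
    return (1, tuple(p), sum(x * x for x in p))


def cf_merge(a: CF, b: CF) -> CF:
    # CFs are additive, so partial summaries can be combined in any grouping.
    return (a[0] + b[0], tuple(x + y for x, y in zip(a[1], b[1])), a[2] + b[2])


def cf_centroid(cf: CF) -> Point:
    n, ls, _ = cf
    return tuple(x / n for x in ls)


def grid_cell(p: Point, cell_size: float) -> Cell:
    # Hypothetical micro-clustering: bucket points into grid cells, which makes
    # cluster formation look like grouped aggregation on the cell key.
    return tuple(int(x // cell_size) for x in p)


def summarize_pane(points: Sequence[Point], cell_size: float) -> Dict[Cell, CF]:
    """Shared operator: one pass over a pane, producing one CF per grid cell."""
    cells: Dict[Cell, CF] = {}
    for p in points:
        key, cf = grid_cell(p, cell_size), cf_of_point(p)
        cells[key] = cf_merge(cells[key], cf) if key in cells else cf
    return cells


def answer_query(pane_summaries: List[Dict[Cell, CF]], window_panes: int) -> Dict[Cell, CF]:
    """Per-query operator: merge the summaries of the newest `window_panes` panes."""
    if window_panes < 1:
        raise ValueError("a window must cover at least one pane")
    merged: Dict[Cell, CF] = {}
    for cells in pane_summaries[-window_panes:]:
        for key, cf in cells.items():
            merged[key] = cf_merge(merged[key], cf) if key in merged else cf
    return merged


# Usage: pane summaries are computed once and shared by two queries
# whose windows cover 1 pane and 3 panes respectively.
panes = [[(0.1, 0.2), (0.3, 0.1)], [(5.0, 5.1)], [(0.2, 0.4), (5.2, 4.9)]]
summaries = [summarize_pane(p, cell_size=1.0) for p in panes]
short_window = answer_query(summaries, window_panes=1)
long_window = answer_query(summaries, window_panes=3)
print(sorted(short_window))  # [(0, 0), (5, 4)]
print({k: cf[0] for k, cf in sorted(long_window.items())})  # {(0, 0): 3, (5, 4): 1, (5, 5): 1}
```

Because `summarize_pane` runs once per pane regardless of how many queries are registered, adding a query costs only its own merge step, and queries with different window lengths still share the same partial summaries.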

Place, publisher, year, edition, pages
2014.
Research program
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-302790; OAI: oai:DiVA.org:uu-302790; DiVA id: diva2:967635
Conference
VLDB 2014
Available from: 2016-09-09; Created: 2016-09-09; Last updated: 2018-01-10; Bibliographically approved
Part of thesis
1. Real-time data stream clustering over sliding windows
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In many applications, e.g. urban traffic monitoring, stock trading, and industrial sensor data monitoring, clustering algorithms are applied to data streams in real time to find current patterns. Sliding windows are commonly used here because they capture concept drift.

Real-time clustering over sliding windows means detecting continuously evolving clusters as soon as they occur in the stream. This requires efficient maintenance of cluster memberships, which change as the window slides.

Data stream management systems (DSMSs) provide high-level query languages for searching and analyzing streaming data. In this thesis we extend a DSMS with a real-time data stream clustering framework called the Generic 2-phase Continuous Summarization framework (G2CS). G2CS modularizes data stream clustering by taking as input clustering algorithms expressed in terms of a number of functions and indexing structures. G2CS supports real-time clustering through an efficient window-sliding mechanism and algorithm-transparent indexing. A particular challenge for real-time detection of a large number of rapidly evolving clusters is the efficiency of window slides for clustering algorithms that do not support deletion of expired data, e.g. BIRCH. To address this, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM). To further improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing, into which indexing structures suited to the clustering algorithms can be plugged.
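The abstract does not describe how SBM works, so the following is not SBM. As a point of comparison only, the Python sketch below shows the well-known two-stacks technique for sliding-window aggregation, which likewise maintains a window over summaries that can be merged but never deleted from. The class name `TwoStacksWindow` is hypothetical, and `merge` stands in for any associative combine step (for instance, merging clustering summaries); `max` is used in the usage example only because its results are easy to check.

```python
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class TwoStacksWindow(Generic[T]):
    """FIFO sliding window over values that can be merged but not deleted.

    `merge` must be associative; it is never inverted.
    """

    def __init__(self, merge: Callable[[T, T], T]) -> None:
        self.merge = merge
        # Front stack: (value, merge of this value and all newer front values);
        # the top (end of the list) is the oldest value in the window.
        self.front: List[Tuple[T, T]] = []
        # Back stack: newest values in arrival order, plus their running merge.
        self.back: List[T] = []
        self.back_agg: Optional[T] = None

    def __len__(self) -> int:
        return len(self.front) + len(self.back)

    def insert(self, value: T) -> None:
        self.back.append(value)
        self.back_agg = value if self.back_agg is None else self.merge(self.back_agg, value)

    def evict(self) -> None:
        """Drop the oldest value without any inverse operation."""
        if not self.front:
            self._flip()
        if not self.front:
            raise IndexError("evict from an empty window")
        self.front.pop()

    def _flip(self) -> None:
        # Move the back stack onto the front stack, newest first, building
        # suffix merges so each entry summarizes itself and everything newer.
        agg: Optional[T] = None
        for value in reversed(self.back):
            agg = value if agg is None else self.merge(value, agg)
            self.front.append((value, agg))
        self.back = []
        self.back_agg = None

    def query(self) -> Optional[T]:
        """Merge of all values currently in the window (None if empty)."""
        if self.front and self.back_agg is not None:
            return self.merge(self.front[-1][1], self.back_agg)
        if self.front:
            return self.front[-1][1]
        return self.back_agg


# Usage: maximum over a window of the 3 most recent values.
window: TwoStacksWindow[int] = TwoStacksWindow(max)
maxima = []
for x in [3, 1, 4, 1, 5, 9, 2, 6]:
    window.insert(x)
    if len(window) > 3:
        window.evict()
    if len(window) == 3:
        maxima.append(window.query())
print(maxima)  # [4, 4, 5, 9, 9, 9]
```

Apart from the single merge per query, each value is merged a constant number of times (once on insertion and once when the back stack is flipped), so eviction needs no inverse operation and the amortized cost per slide stays constant.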

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 33 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1431
Keywords
Data streaming; Sliding windows; Clustering
Research program
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-302799 (URN); 978-91-554-9698-2 (ISBN)
Public defence
2016-11-23, ITC 2446, Lägerhyddsvägen 2, Uppsala, 10:00 (English)
Available from: 2016-11-02; Created: 2016-09-09; Last updated: 2016-11-16

Open Access in DiVA

Full text (429 kB), 66 downloads
File information
File: FULLTEXT01.pdf; File size: 429 kB; Checksum: SHA-512
1426c0788f0c41f5529aa953caab66232e69bf5357bc387d13d8ead3a4e0e0b1a81ca8e7c0f99576fbcf3c0c70a67024856d0d3d3a479ce217596517bfdf660d
Type: fulltext; MIME type: application/pdf

Author/editor: Badiozamany, Sobhan