Grand challenge: Implementation by frequently emitting parallel windows and user-defined aggregate functions
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (UDBL)
2013 (English)
In: Proc. 7th ACM International Conference on Distributed Event-Based Systems, New York: ACM Press, 2013, 325-330 p.
Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2013. 325-330 p.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-211954
DOI: 10.1145/2488222.2488284
ISBN: 978-1-4503-1758-0 (print)
OAI: oai:DiVA.org:uu-211954
DiVA: diva2:673540
Conference
DEBS 2013
Available from: 2013-06-29 Created: 2013-12-03 Last updated: 2016-09-09. Bibliographically approved
In thesis
1. Main-Memory Query Processing Utilizing External Indexes
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many applications require storage and indexing of new kinds of data in main memory, e.g. color histograms, textures, shape features, gene sequences, sensor readings, or financial time series. Even though many domain index structures have been developed, very few of them are implemented in any database management system (DBMS); usually only B-trees and hash indexes are available. A major reason is that the manual effort to include a new index implementation in a regular DBMS is very costly and time-consuming because it requires integration with all components of the DBMS kernel. To alleviate this, there are some extensible indexing frameworks. However, they all require re-engineering the index implementations, which is a problem when the index has third-party ownership, when only binary code is available, or simply when the index implementation is complex to re-engineer. Therefore, the DBMS should allow including new index implementations without code changes and without performance degradation. Furthermore, for high performance the query processor needs knowledge of how to process queries to utilize a plugged-in index. Moreover, it is important that all functionalities of a plugged-in index implementation are correct.

The extensible main-memory database system (MMDB) Mexima (Main-memory External Index Manager) addresses these challenges. It enables transparent plugging-in of main-memory index implementations without code changes. Index-specific rewrite rules transform complex queries to utilize the indexes. Automatic test procedures validate their correctness based on user-provided index meta-data. Moreover, the same optimization framework can also optimize complex queries sent to a back-end DBMS by exposing hidden indexes to its query optimizer.

Altogether, Mexima is a complete and extensible platform for transparent index integration, utilization, and evaluation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 45 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1352
Keyword
Database indexing, query processing, index structures, main-memory, index validation
National Category
Computer Science
Research subject
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-280374
ISBN: 978-91-554-9509-1
Public defence
2016-05-04, 2446, ITC, Lägerhyddsvägen 2, Uppsala, Uppsala, 13:15 (English)
Available from: 2016-04-13 Created: 2016-03-09 Last updated: 2016-04-21
2. Scalable Validation of Data Streams
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In manufacturing industries, sensors are often installed on industrial equipment, generating high volumes of data in real time. To shorten machine downtime and reduce maintenance costs, it is critical to analyze such streams efficiently in order to detect abnormal equipment behavior.

To validate data streams and detect anomalies, a data stream management system called SVALI was developed. Based on requirements from the application domain, different stream window semantics are explored and an extensible set of window forming functions is implemented, where dynamic registration of window aggregations allows incremental evaluation of aggregate functions over windows.
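
The idea of registering window aggregations so that aggregates can be evaluated incrementally can be illustrated with a minimal Python sketch. It is not SVALI code: the registry `AGGREGATES`, the function `register_aggregate`, the `init`/`add`/`remove`/`result` callbacks, and the count-based sliding window are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical registry: aggregate name -> (init, add, remove, result) callbacks.
AGGREGATES = {}


def register_aggregate(name, init, add, remove, result):
    """Register an aggregate at run time by its incremental callbacks."""
    AGGREGATES[name] = (init, add, remove, result)


register_aggregate(
    "sum",
    init=lambda: 0.0,
    add=lambda s, x: s + x,
    remove=lambda s, x: s - x,
    result=lambda s: s,
)
register_aggregate(
    "mean",
    init=lambda: (0.0, 0),                      # state is (running sum, count)
    add=lambda s, x: (s[0] + x, s[1] + 1),
    remove=lambda s, x: (s[0] - x, s[1] - 1),
    result=lambda s: s[0] / s[1] if s[1] else None,
)


def sliding_aggregate(stream, size, name):
    """Yield the aggregate over each full count-based sliding window.

    The state is updated with one add and at most one remove per element,
    so no window is ever rescanned.
    """
    init, add, remove, result = AGGREGATES[name]
    window = deque()
    state = init()
    for x in stream:
        window.append(x)
        state = add(state, x)
        if len(window) > size:
            state = remove(state, window.popleft())
        if len(window) == size:
            yield result(state)


print(list(sliding_aggregate([1, 2, 3, 4, 5], 3, "sum")))
```

Each new element costs a constant number of callback invocations regardless of window size, which is the benefit incremental evaluation is meant to provide. Aggregates without an inverse (such as max) would need a different state representation.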

To facilitate stream validation at a high level, the system provides two second-order system validation functions, model-and-validate and learn-and-validate. Model-and-validate allows the user to define mathematical models based on physical properties of the monitored equipment, while learn-and-validate builds statistical models by sampling the stream in real time as it flows.
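
The two validation styles can be sketched as higher-order Python generators. This is only an illustration under stated assumptions, not SVALI's interface: the function signatures, the `tolerance`, `n_train`, and `k` parameters, and the choice of a mean and standard deviation learned from a fixed-size initial sample are all hypothetical.

```python
import statistics
from itertools import islice


def model_and_validate(stream, model, tolerance):
    """Flag readings that deviate from a user-supplied model.

    `stream` yields (inputs, measured) pairs; `model` maps inputs to the
    expected value. Both the pair layout and `tolerance` are assumptions.
    """
    for inputs, measured in stream:
        expected = model(inputs)
        if abs(measured - expected) > tolerance:
            yield inputs, measured, expected


def learn_and_validate(stream, n_train, k):
    """Learn a simple statistical model from the stream, then validate.

    Assumption: the first n_train values represent normal behavior; later
    values more than k standard deviations from their mean are flagged.
    """
    it = iter(stream)
    sample = list(islice(it, n_train))
    mean = statistics.fmean(sample)
    stdev = statistics.stdev(sample)
    for x in it:
        if abs(x - mean) > k * stdev:
            yield x


# Model-based: expected power is voltage * current (an illustrative model).
readings = [((2.0, 3.0), 6.0), ((2.0, 3.0), 9.0)]
print(list(model_and_validate(readings, lambda vi: vi[0] * vi[1], 0.5)))

# Learning-based: learn from the first five values, then check the rest.
print(list(learn_and_validate([10, 10.5, 9.5, 10, 10, 30, 10.2], 5, 3)))
```

Passing the model in as a function is what makes the validation function second-order: the same validation logic is reused with whatever model the user supplies or the stream sample produces.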

To validate geographically distributed equipment with short response times, SVALI is a distributed system where many SVALI instances can be started and run in parallel on board the equipment. Central analyses are made at a monitoring center where streams of detected anomalies are combined and analyzed on a cluster computer.

SVALI is an extensible system where functions can be implemented using external libraries written in C, Java, and Python without any modifications of the original code.

The system and the developed functionality have been applied in several applications, both industrial and in sports analytics.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 51 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1384
Keyword
Data Stream Management, Distributed Data Stream Processing, Data Stream Validation, Anomaly Detection
National Category
Computer Science
Research subject
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-291530
ISBN: 978-91-554-9600-5
Public defence
2016-08-17, room 2446, ITC building 2, Lägerhyddsvägen 2, Uppsala, 13:15 (English)
Available from: 2016-05-25 Created: 2016-05-03 Last updated: 2016-06-15. Bibliographically approved
3. Real-time data stream clustering over sliding windows
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In many applications, e.g. urban traffic monitoring, stock trading, and industrial sensor data monitoring, clustering algorithms are applied to data streams in real time to find current patterns. Here, sliding windows are commonly used as they capture concept drift.

Real-time clustering over sliding windows is the early detection of continuously evolving clusters as soon as they occur in the stream, which requires efficient maintenance of cluster memberships that change as windows slide.

Data stream management systems (DSMSs) provide high-level query languages for searching and analyzing streaming data. In this thesis we extend a DSMS with a real-time data stream clustering framework called Generic 2-phase Continuous Summarization framework (G2CS). G2CS modularizes data stream clustering by taking as input clustering algorithms that are expressed in terms of a number of functions and indexing structures. G2CS supports real-time clustering through an efficient window sliding mechanism and algorithm-transparent indexing. A particular challenge for real-time detection of a high number of rapidly evolving clusters is the efficiency of window slides for clustering algorithms where deletion of expired data is not supported, e.g. BIRCH. To that end, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM). To further improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing where indexing structures suitable for the clustering algorithms can be plugged in.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 33 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1431
Keyword
Data streaming, Sliding windows, Clustering
National Category
Computer Systems
Research subject
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-302799
ISBN: 978-91-554-9698-2
Public defence
2016-11-23, ITC 2446, Lägerhyddsvägen 2, Uppsala, 10:00 (English)
Available from: 2016-11-02 Created: 2016-09-09 Last updated: 2016-11-16

Authors: Badiozamany, Sobhan; Melander, Lars; Truong, Thanh; Xu, Cheng; Risch, Tore
