NoSQL approach to large scale analysis of persisted streams
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (UDBL)
2015 (English). In: Data Science, Springer, 2015, p. 152-156. Conference paper, Published paper (Refereed)
Resource type
Text
Place, publisher, year, edition, pages
Springer, 2015. p. 152-156
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9147
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-274783
DOI: 10.1007/978-3-319-20424-6_15
ISI: 000364104600015
ISBN: 978-3-319-20423-9 (print)
OAI: oai:DiVA.org:uu-274783
DiVA, id: diva2:897635
Conference
BICOD 2015, July 6–8, Edinburgh, UK
Available from: 2015-06-11 Created: 2016-01-26 Last updated: 2021-11-22 (Bibliographically approved)
In thesis
1. Main-Memory Query Processing Utilizing External Indexes
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many applications require storage and indexing of new kinds of data in main memory, e.g. color histograms, textures, shape features, gene sequences, sensor readings, or financial time series. Even though many domain index structures have been developed, very few of them are implemented in any database management system (DBMS); usually only B-trees and hash indexes are available. A major reason is that including a new index implementation in a regular DBMS is very costly and time-consuming, because it requires integration with all components of the DBMS kernel. To alleviate this, some extensible indexing frameworks exist. However, they all require re-engineering the index implementations, which is a problem when the index has third-party ownership, when only binary code is available, or simply when the index implementation is complex to re-engineer. Therefore, the DBMS should allow new index implementations to be included without code changes or performance degradation. Furthermore, for high performance the query processor needs knowledge of how to process queries so that they utilize a plugged-in index. Moreover, it is important that all functionality of a plugged-in index implementation is correct.

The extensible main-memory database system (MMDB) Mexima (Main-memory External Index Manager) addresses these challenges. It allows main-memory index implementations to be plugged in transparently, without code changes. Index-specific rewrite rules transform complex queries to utilize the indexes. Automatic test procedures validate the correctness of the plugged-in indexes based on user-provided index metadata. Moreover, the same optimization framework can also optimize complex queries sent to a back-end DBMS by exposing hidden indexes to its query optimizer.

Altogether, Mexima is a complete and extensible platform for transparent index integration, utilization, and evaluation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. p. 45
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1352
Keywords
Database indexing, query processing, index structures, main-memory, index validation
National Category
Computer Sciences
Research subject
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-280374 (URN)
978-91-554-9509-1 (ISBN)
Public defence
2016-05-04, 2446, ITC, Lägerhyddsvägen 2, Uppsala, Uppsala, 13:15 (English)
Available from: 2016-04-13 Created: 2016-03-09 Last updated: 2018-01-10
2. Scalable Data Management for Internet of Things
2021 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Internet of Things (IoT) applications often involve considerable numbers of sensors that produce large volumes of data. In this context, efficient management of data could potentially enable automatic decision making based on analytics of sensors on equipment. However, these sensors are often geographically distributed and generate diverse formats of data in the form of sensor streams at a high rate. The combination of these properties of IoT poses significant challenges for existing database management systems (DBMSs) in providing scalable data storage and analytics.

This thesis addresses the problem of providing efficient data management for distributed IoT applications using DBMS technologies. Initially, we developed a prototype system, Fused LOg database Query Processor (FLOQ), which enables general query processing over collections of relational databases that are deployed locally on distributed sites to store sensor measurement logs. Although FLOQ provides efficient query execution when scaling the number of distributed databases, it exhibits complexity and scalability issues for large IoT applications having heterogeneous data. The limitations of FLOQ are primarily inherent to its use of relational database backends for storage of sensor logs.

When a relational database is used to store large-scale IoT data, several challenges arise. The loading of massive logs produced at high rates is not fast enough due to the database's strong consistency mechanisms. Furthermore, the database can constitute a single point of failure that limits availability, and its inflexible schemas make it difficult to manage heterogeneity. In contrast to relational databases, distributed NoSQL data stores could provide scalable storage of heterogeneous data through data partitioning, replication, and high availability by sacrificing strong consistency. To understand the suitability of NoSQL databases, this thesis also investigates to what degree NoSQL DBMSs provide scalable storage and analytics for IoT applications by comparing a variety of state-of-the-art relational and NoSQL databases on real-world industrial IoT data.

The experimental evaluations reveal that distributed NoSQL data stores can provide scalability; however, supporting advanced data analytics is difficult due to their limited query processing capabilities. Furthermore, data management of distributed IoT applications often requires seamless integration between a real-time edge analytics platform, a distributed storage manager, effective data integration, and query processing techniques for handling heterogeneity. Therefore, in order to provide a holistic data management solution, this thesis developed the Extended Query Processing (EQP) system, which supports both edge and offline advanced analytics for large-scale IoT applications.

These contributions enable efficient data management of large-scale heterogeneous IoT applications and support advanced analytics.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2021. p. 44
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2095
Keywords
NoSQL, IoT, Smart Computing, MongoDB, IIoT, Data Streams, Edge Computing
National Category
Computer Sciences; Computer Systems
Research subject
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-458420 (URN)
978-91-513-1346-7 (ISBN)
Public defence
2022-01-14, Room 2446, Polacksbacken, Lägerhyddsvägen 2, Uppsala, 13:15 (English)
Funder
eSSENCE - An eScience Collaboration
Available from: 2021-12-21 Created: 2021-11-22 Last updated: 2022-01-18 (Bibliographically approved)

Open Access in DiVA

No full text in DiVA

Authority records

Mahmood, Khalid; Truong, Thanh; Risch, Tore
