Uppsala University Publications
Main-Memory Query Processing Utilizing External Indexes
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computing Science. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (UDBL)
2016 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many applications require storage and indexing of new kinds of data in main memory, e.g. color histograms, textures, shape features, gene sequences, sensor readings, or financial time series. Even though many domain index structures have been developed, very few of them are implemented in any database management system (DBMS); usually only B-trees and hash indexes are provided. A major reason is that manually including a new index implementation in a regular DBMS is very costly and time-consuming, because it requires integration with all components of the DBMS kernel. To alleviate this, some extensible indexing frameworks exist. However, they all require re-engineering the index implementations. This is a problem when the index has third-party ownership, when only binary code is available, or simply when the index implementation is complex to re-engineer. Therefore, the DBMS should allow new index implementations to be included without code changes and without performance degradation. Furthermore, for high performance, the query processor needs knowledge of how to process queries so that they utilize plugged-in indexes. Moreover, it is important to validate that all functionality of a plugged-in index implementation is correct.

The extensible main-memory database system (MMDB) Mexima (Main-memory External Index Manager) addresses these challenges. It enables main-memory index implementations to be plugged in transparently, without code changes. Index-specific rewrite rules transform complex queries to utilize the indexes. Automatic test procedures validate the correctness of the plugged-in indexes based on user-provided index meta-data. Moreover, the same optimization framework can also optimize complex queries sent to a back-end DBMS by exposing hidden indexes to its query optimizer.

Altogether, Mexima is a complete and extensible platform for transparent index integration, utilization, and evaluation.
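The Python sketch below illustrates the idea of plugging in an index without changing its code. It is not Mexima's actual interface. The class names, the generic operation names `insert` and `range`, and the dictionary standing in for index meta-data are all hypothetical. The foreign index keeps its own method names, and a wrapper resolves generic operations to those methods through the meta-data, so the wrapped class is used exactly as shipped.

```python
import bisect
from typing import Any, Callable, Dict, List


class ThirdPartySortedIndex:
    """Stand-in for a foreign index that must not be modified (hypothetical).

    Its method names (`put`, `scan_between`) are its own, not a DBMS API.
    """

    def __init__(self) -> None:
        self._keys: List[float] = []
        self._rows: List[Any] = []

    def put(self, key: float, row: Any) -> None:
        # Keep keys sorted; insert after equal keys to preserve arrival order.
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._rows.insert(i, row)

    def scan_between(self, lo: float, hi: float) -> List[Any]:
        # Closed range lo <= key <= hi.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return self._rows[i:j]


class PluggedIndex:
    """Routes generic operations to a foreign index via a meta-data table.

    `properties` maps a generic operation name to the foreign method name.
    """

    def __init__(self, impl: Any, properties: Dict[str, str]) -> None:
        self._ops: Dict[str, Callable[..., Any]] = {
            op: getattr(impl, method) for op, method in properties.items()
        }

    def supports(self, op: str) -> bool:
        return op in self._ops

    def call(self, op: str, *args: Any) -> Any:
        return self._ops[op](*args)


idx = PluggedIndex(ThirdPartySortedIndex(),
                   {"insert": "put", "range": "scan_between"})
for key, row in [(3.0, "c"), (1.0, "a"), (2.0, "b"), (5.0, "e")]:
    idx.call("insert", key, row)
if idx.supports("range"):
    print(idx.call("range", 1.5, 3.0))  # prints ['b', 'c']
```

In this sketch, a query processor would check `supports` before rewriting a predicate into an index call. That loosely mirrors the role the abstract gives to index-specific rewrite rules.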

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 45 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1352
Keyword [en]
Database indexing, query processing, index structures, main-memory, index validation
National Category
Computer Science
Research subject
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-280374
ISBN: 978-91-554-9509-1 (print)
OAI: oai:DiVA.org:uu-280374
DiVA: diva2:910664
Public defence
2016-05-04, 2446, ITC, Lägerhyddsvägen 2, Uppsala, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2016-04-13 Created: 2016-03-09 Last updated: 2016-04-21
List of papers
1. Transparent inclusion, utilization, and validation of main memory domain indexes
2015 (English). In: Proc. 27th International Conference on Scientific and Statistical Database Management, New York: ACM Press, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

Main-memory database systems (MMDBs) are viable solutions for many scientific applications. Scientific and engineering data often require special indexing methods, and a large number of domain-specific main-memory index implementations have been developed. However, adding an index structure to a database system can be challenging. Mexima (Main-memory External Index Manager) provides an MMDB into which new main-memory index structures can be plugged without modifying the index implementations. This has made it possible to plug complex and highly optimized index structures implemented in C/C++ into Mexima without code changes. To utilize new user-defined indexes in queries transparently, Mexima automatically transforms query fragments into index operations based on index property tables containing index meta-data. For scalable processing of complex numerical query expressions, Mexima includes an algebraic query transformation mechanism that reasons about numerical expressions to expose potential index utilization. The index property tables furthermore enable validating the correctness of an index implementation by executing automatically generated test queries based on the index meta-data. Experiments show that the performance penalty of using an index plugged into Mexima is low compared to using the corresponding stand-alone C/C++ implementation. They also show that the index-exposing rewrite mechanisms yield substantial performance gains.
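The validation idea can be sketched in Python: generate test queries from the meta-data an index declares, and compare the index's answers with a full scan. This is a minimal illustration, not Mexima's test generator. The `PROPERTIES` dictionary, the two index classes, and `validate` are hypothetical. `BuggyIndex` carries a deliberate off-by-one error so that the check has something to catch.

```python
import bisect
import random
from typing import Callable, List, Tuple

# Hypothetical index meta-data: the index claims to answer closed range
# queries lo <= key <= hi through a method named by "operation".
PROPERTIES = {"operation": "range", "key_domain": (0, 50)}


class SortedArrayIndex:
    """Correct range index over sorted (key, row_id) pairs."""

    def __init__(self, pairs: List[Tuple[int, int]]) -> None:
        self._pairs = sorted(pairs)
        self._keys = [k for k, _ in self._pairs]

    def range(self, lo: int, hi: int) -> List[int]:
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [rid for _, rid in self._pairs[i:j]]


class BuggyIndex(SortedArrayIndex):
    """Same index with an off-by-one bug: keys equal to hi are dropped."""

    def range(self, lo: int, hi: int) -> List[int]:
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)  # bug: should be bisect_right
        return [rid for _, rid in self._pairs[i:j]]


def validate(make_index: Callable[[List[Tuple[int, int]]], SortedArrayIndex],
             n_rows: int = 200, n_queries: int = 100, seed: int = 0) -> bool:
    """Compare index answers with a full scan on generated range queries."""
    rng = random.Random(seed)
    lo_dom, hi_dom = PROPERTIES["key_domain"]
    pairs = [(rng.randint(lo_dom, hi_dom), rid) for rid in range(n_rows)]
    query = getattr(make_index(pairs), PROPERTIES["operation"])
    for _ in range(n_queries):
        a, b = sorted(rng.randint(lo_dom, hi_dom) for _ in range(2))
        expected = sorted(rid for k, rid in pairs if a <= k <= b)
        if sorted(query(a, b)) != expected:
            return False
    return True


print(validate(SortedArrayIndex), validate(BuggyIndex))
```

The correct index should pass, and the off-by-one variant should be reported as failing. Integer keys drawn from a small domain make `key == hi` cases frequent, which is exactly the boundary this bug mishandles.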

Place, publisher, year, edition, pages
New York: ACM Press, 2015
Keyword
Domain Indexing; Extensible Databases; Query Processing; Automatic Testing
National Category
Computer Science
Research subject
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-280368 (URN)
10.1145/2791347.2791375 (DOI)
978-1-4503-3709-0 (ISBN)
Conference
SSDBM 2015, June 29–July 1, San Diego, CA
Available from: 2015-06-29 Created: 2016-03-09 Last updated: 2016-10-07
Bibliographically approved
2. Scalable Numerical Queries by Algebraic Inequality Transformations
2014 (English). In: Database Systems for Advanced Applications, DASFAA 2014, Part I, 2014, 95-109 p. Conference paper, Published paper (Refereed)
Abstract [en]

To enable historical analyses of logged data streams by SQL queries, the Stream Log Analysis System (SLAS) bulk loads data streams derived from sensor readings into a relational database system. SQL queries over such log data often involve numerical conditions containing inequalities, e.g. to find suspected deviations from normal behavior based on some function over measured sensor values. However, such queries are often slow to execute, because the query optimizer is unable to utilize ordered indexes on attributes that appear inside numerical conditions. To speed up such queries, they need to be reformulated so that available indexes can be utilized. In SLAS, the query transformation algorithm AQIT (Algebraic Query Inequality Transformation) automatically transforms SQL queries involving a class of algebraic inequalities into more scalable SQL queries that utilize ordered indexes. The experimental results show that queries execute substantially faster on a commercial DBMS when AQIT has been applied to preprocess them.
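The Python sketch below shows why isolating an indexed column helps, using a deliberately small class of conditions. In that class, the column is combined only with additions, subtractions, multiplications and divisions by constants. This is not the AQIT algorithm from the paper, which covers a broader class of algebraic inequalities. The step encoding and the function `isolate` are hypothetical. The rewrite turns a predicate such as `temp * 2 + 1 > 9` into `temp > 4`, a plain range condition that an ordered index on `temp` can serve.

```python
from fractions import Fraction
from typing import List, Tuple

# One step applied to the column, innermost first, e.g.
# [("mul", 2), ("add", 1)] encodes (col * 2) + 1.  Hypothetical encoding.
Step = Tuple[str, Fraction]

FLIP = {"<": ">", "<=": ">=", ">": "<", ">=": "<=", "=": "=", "<>": "<>"}


def isolate(col: str, steps: List[Step], op: str, bound: Fraction) -> str:
    """Rewrite  f(col) op bound  into  col op' bound'  for linear f.

    Undoes the steps from the outermost inwards; multiplying or dividing
    by a negative constant flips the direction of the comparison.
    """
    for kind, c in reversed(steps):
        if kind == "add":
            bound -= c
        elif kind == "sub":
            bound += c
        elif kind == "mul":
            if c == 0:
                raise ValueError("multiplication by zero is not invertible")
            bound /= c
            if c < 0:
                op = FLIP[op]
        elif kind == "div":
            if c == 0:
                raise ValueError("division by zero")
            bound *= c
            if c < 0:
                op = FLIP[op]
        else:
            raise ValueError(f"unsupported step: {kind}")
    return f"{col} {op} {bound}"


# temp * 2 + 1 > 9   becomes   temp > 4
print(isolate("temp", [("mul", Fraction(2)), ("add", Fraction(1))], ">", Fraction(9)))
# temp * -3 - 6 <= 0   becomes   temp >= -2
print(isolate("temp", [("mul", Fraction(-3)), ("sub", Fraction(6))], "<=", Fraction(0)))
```

`Fraction` is used instead of floats so that the rewritten bound is exact. A rounded bound could otherwise change which rows qualify.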

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8421
National Category
Computer and Information Science
Identifiers
urn:nbn:se:uu:diva-236268 (URN)
000342909200007 ()
978-3-319-05810-8; 978-3-319-05809-2 (ISBN)
Conference
19th International Conference on Database Systems for Advanced Applications (DASFAA), April 21-24, 2014, Bali, Indonesia
Available from: 2014-11-18 Created: 2014-11-17 Last updated: 2016-04-15
3. Scalable Numerical SPARQL Queries over Relational Databases
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

We present an approach for scalable processing of SPARQL queries to RDF views of numerical data stored in relational databases (RDBs). Such queries include numerical expressions, inequalities, comparisons, etc. inside FILTERs. We call such FILTERs numerical expressions, and such queries numerical SPARQL queries. For scalable execution of numerical SPARQL queries over RDBs, numerical operators should be pushed into SQL rather than executing the filters as post-processing outside the RDB. Otherwise, query execution is slowed down, since a lot of data is transported from the RDB server and, furthermore, indexes on the server are not utilized. The NUMTranslator algorithm converts numerical expressions in numerical SPARQL queries into corresponding SQL expressions. Using a benchmark that analyses numerical logs stored in an RDB, we show that NUMTranslator substantially improves the scalability of SPARQL queries. We also compare our approach with other systems that process SPARQL queries to RDF views of RDBs, and show that NUMTranslator scales substantially better on numerical queries than their approaches.
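The Python sketch below illustrates the pushdown idea. A numerical FILTER expression, already parsed into a small tree, is translated into an SQL expression that can go into the WHERE clause of the SQL query behind the RDF view. The rows therefore no longer have to be shipped out of the RDB and filtered afterwards. This is not the NUMTranslator algorithm itself. The tree encoding, the operator tables, `to_sql`, and the example column names are hypothetical. A real translator must also respect SPARQL's typing and error semantics.

```python
from typing import Dict, Tuple, Union

# A parsed FILTER expression: a variable "?v", a number, or a tuple
# (operator, left, right) / (function, arg).  Hypothetical representation.
Expr = Union[str, int, float, Tuple]

BINARY = {"+", "-", "*", "/", "<", "<=", ">", ">=", "=", "!=", "&&", "||"}
SQL_OP = {"!=": "<>", "&&": "AND", "||": "OR"}
SQL_FUNC = {"abs": "ABS", "floor": "FLOOR", "ceil": "CEIL", "round": "ROUND"}


def to_sql(expr: Expr, columns: Dict[str, str]) -> str:
    """Translate a numerical FILTER expression into an SQL expression.

    `columns` maps SPARQL variables to the relational columns they are
    bound to in the RDF view, so the result can be placed in a WHERE clause.
    """
    if isinstance(expr, str):
        if expr not in columns:
            raise KeyError(f"variable {expr} is not mapped to a column")
        return columns[expr]
    if isinstance(expr, (int, float)):
        return repr(expr)
    op, *args = expr
    if op in BINARY and len(args) == 2:
        left, right = (to_sql(a, columns) for a in args)
        return f"({left} {SQL_OP.get(op, op)} {right})"
    if op in SQL_FUNC and len(args) == 1:
        return f"{SQL_FUNC[op]}({to_sql(args[0], columns)})"
    raise ValueError(f"cannot translate operator {op!r}")


# FILTER(abs(?val - ?avg) > 3 * ?std) over hypothetical columns.
cols = {"?val": "m.value", "?avg": "s.mean", "?std": "s.stddev"}
print(to_sql((">", ("abs", ("-", "?val", "?avg")), ("*", 3, "?std")), cols))
```

Every sub-expression is parenthesized, so the SQL result keeps the grouping of the SPARQL tree without relying on operator precedence.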

National Category
Computer and Information Science
Identifiers
urn:nbn:se:uu:diva-275027 (URN)
Conference
4th International Workshop on Linked Web Data Management (LWDM 2014) in conjunction with the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, 2014
Available from: 2016-01-28 Created: 2016-01-28 Last updated: 2016-04-15
4. Grand challenge: Implementation by frequently emitting parallel windows and user-defined aggregate functions
2013 (English). In: Proc. 7th ACM International Conference on Distributed Event-Based Systems, New York: ACM Press, 2013, 325-330 p. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2013
National Category
Computer Science
Identifiers
urn:nbn:se:uu:diva-211954 (URN)
10.1145/2488222.2488284 (DOI)
978-1-4503-1758-0 (ISBN)
External cooperation:
Conference
DEBS 2013
Available from: 2013-06-29 Created: 2013-12-03 Last updated: 2016-09-09
Bibliographically approved
5. NoSQL approach to large scale analysis of persisted streams
2015 (English). In: Data Science, Springer, 2015, 152-156 p. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9147
National Category
Computer Science
Identifiers
urn:nbn:se:uu:diva-274783 (URN)
10.1007/978-3-319-20424-6_15 (DOI)
000364104600015 ()
978-3-319-20423-9 (ISBN)
Conference
BICOD 2015, July 6–8, Edinburgh, UK
Available from: 2015-06-11 Created: 2016-01-26 Last updated: 2016-04-15
Bibliographically approved

Open Access in DiVA

fulltext (925 kB), 146 downloads
File information
File name: FULLTEXT01.pdf
File size: 925 kB
Checksum (SHA-512):
d6aae70f5f4e3fe91a23f9453cea0e8090d781e50a72b08aa633c26b01ff23a5d6f2437930b8c1904f323713450d80ed7750c102671590cc4aab9b53f8c74698
Type: fulltext
Mimetype: application/pdf

Authority records

Truong, Thanh
