Uppsala University Publications (uu.se)
Utilizing a NoSQL Data Store for Scalable Log Analysis
(UDBL)
2015 (English). Conference paper, Published paper (Refereed)
Abstract [en]

A potential problem with persisting large volumes of data logs in a conventional relational database is that massive logs produced at high rates cannot be loaded fast enough, owing to the strong consistency model and the high cost of indexing. A possible alternative is a modern NoSQL data store, which sacrifices transactional consistency to achieve higher performance and scalability. In this paper, we investigate to what degree a state-of-the-art NoSQL database can achieve high-performance persisting and fundamental analysis of large-scale data logs from real-world applications. For the evaluation, a state-of-the-art NoSQL database, MongoDB, is compared with a relational DBMS from a major commercial vendor and with a popular open-source relational DBMS. MongoDB was chosen because, in contrast to other popular NoSQL systems, it provides both primary and secondary indexing. These indexing techniques are essential for scalable processing of queries over large-scale data logs. To explore the impact of parallelism on query execution, sharding was investigated for MongoDB. Our results reveal that relaxing consistency did not substantially improve performance when persisting large-scale data logs for any of the systems. However, for high-performance loading and analysis of data logs, MongoDB is shown to be a viable alternative to relational databases for queries where the choice of an optimal execution plan is not critical.
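
The role of secondary indexing in log queries can be illustrated with a minimal sketch. It is not the paper's benchmark and does not use MongoDB or either of the relational systems evaluated there: SQLite from the Python standard library stands in as an assumption, and the table `log`, its columns, the index name `idx_log_value`, and the synthetic data are all hypothetical. The sketch bulk-loads rows in one transaction, then compares the query plan for a value-range query before and after a secondary index is created.

```python
import sqlite3


def load_logs(conn, rows):
    """Bulk-load (machine_id, ts, value) rows inside a single transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log (machine_id INTEGER, ts INTEGER, value REAL)"
    )
    with conn:  # commits once after all rows are inserted
        conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)


def plan_for(conn, sql, params=()):
    """Return the 'detail' strings SQLite reports for a statement's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")

# Synthetic, hypothetical log data: 10 machines, 1000 readings each.
rows = [(m, t, float((m * 31 + t) % 100)) for m in range(10) for t in range(1000)]
load_logs(conn, rows)

query = "SELECT COUNT(*) FROM log WHERE value > ?"

# Without a secondary index the engine has to scan the whole table.
plan_before = plan_for(conn, query, (95.0,))

# A secondary index on the measured value lets the engine search a range instead.
conn.execute("CREATE INDEX idx_log_value ON log (value)")
plan_after = plan_for(conn, query, (95.0,))

matches = conn.execute(query, (95.0,)).fetchone()[0]
print(plan_before, plan_after, matches)
```

The index also has to be maintained on every insert, which is the loading cost the abstract refers to; the sketch shows only the query side of that trade-off.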

Place, publisher, year, edition, pages
2015. 49-55 p.
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:uu:diva-275028
DOI: 10.1145/2790755.2790772
ISBN: 978-1-4503-3414-3 (print)
OAI: oai:DiVA.org:uu-275028
DiVA: diva2:898481
Conference
19th International Database Engineering & Applications Symposium (IDEAS 2015), Yokohama, Japan, July 13-15, 2015
Available from: 2016-01-28 Created: 2016-01-28 Last updated: 2016-03-09
In thesis
1. Scalable Queries over Log Database Collections
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In industrial settings, machines such as trucks and hydraulic pumps are widely distributed across geographic locations, and sensors on the machines produce large volumes of data. The data produced is stored locally in autonomous databases called log databases. The collection of log databases changes dynamically as sites are added to or removed from the federation.

In this application context, an efficient way to search and analyze the past behavior of products in use is desired. To enable scalable queries over collections of distributed and autonomous log databases, we developed the FLOQ (Fused LOg database Query processor) system, which provides a global view of the working status of all machines on the sites through a meta-database that integrates the dynamic log database collection. A particular challenge in this scenario is finding a scalable way to process numerical queries that identify anomalies by joining data from the meta-database with data selected from the collection of distributed and autonomous log databases. The thesis describes the architecture of FLOQ. In particular, different strategies for executing numerical queries over log database collections are investigated. FLOQ allows both the meta-database and the log databases to be stored in multiple formats using different kinds of data managers. FLOQ provides general and extensible mechanisms for efficient processing of queries over different kinds of distributed data sources.
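
The kind of numerical query described above, joining meta-database entries with readings selected from several log databases, might be sketched as follows. This is an illustration under stated assumptions, not FLOQ's architecture or API: each log database is modelled as a separate in-memory SQLite database, the meta-database as another one, and every table, column, and function name (`machine`, `log`, `max_value`, `find_anomalies`) is hypothetical. It shows one naive strategy only, sending each machine's limit to its site and selecting the violating readings there.

```python
import sqlite3


def make_log_db(rows):
    """Create one autonomous log database holding (machine_id, ts, value) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log (machine_id INTEGER, ts INTEGER, value REAL)")
    db.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return db


def find_anomalies(meta, sites):
    """Join meta-database limits with readings selected at each site.

    `sites` maps a site name to its log database. Sites listed in the
    meta-database but missing from `sites` are skipped, which mimics a
    site that has left the collection.
    """
    anomalies = []
    limits = meta.execute(
        "SELECT site, machine_id, max_value FROM machine ORDER BY site, machine_id"
    )
    for site, machine_id, max_value in limits:
        log_db = sites.get(site)
        if log_db is None:
            continue
        # The selection runs at the site; only violating readings come back.
        cur = log_db.execute(
            "SELECT ts, value FROM log WHERE machine_id = ? AND value > ? ORDER BY ts",
            (machine_id, max_value),
        )
        anomalies.extend((site, machine_id, ts, value) for ts, value in cur)
    return anomalies


# Hypothetical meta-database: which machine is at which site, and its allowed maximum.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE machine (site TEXT, machine_id INTEGER, max_value REAL)")
meta.executemany(
    "INSERT INTO machine VALUES (?, ?, ?)",
    [("A", 1, 80.0), ("A", 2, 50.0), ("B", 1, 100.0), ("C", 7, 10.0)],
)

# Hypothetical log databases; site "C" is currently not part of the collection.
sites = {
    "A": make_log_db([(1, 1, 75.0), (1, 2, 91.5), (2, 1, 49.0)]),
    "B": make_log_db([(1, 1, 120.0), (1, 2, 99.0)]),
}

result = find_anomalies(meta, sites)
print(result)
```

Pushing the selection to each site keeps the transferred data small, but it issues one query per machine; other strategies would batch the limits per site or ship the readings to the meta-database side instead.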

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 51 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1343
National Category
Computer Science
Research subject
Computer Science with specialization in Database Technology
Identifiers
URN: urn:nbn:se:uu:diva-275044
ISBN: 978-91-554-9472-8
Public defence
2016-03-30, 2446, Department of Information Technology, Polacksbacken (Lägerhyddsvägen 2), Uppsala, 13:00 (English)
Available from: 2016-03-03 Created: 2016-01-28 Last updated: 2016-03-03 Bibliographically approved

Authority records

Zhu, Minpeng
