Utilizing a NoSQL Data Store for Scalable Log Analysis
2015 (English). Conference paper (Refereed)
A potential problem with persisting large volumes of data logs in a conventional relational database is that loading massive logs produced at high rates is not fast enough, owing to the strong consistency model and the high cost of indexing. As a possible alternative, a modern NoSQL data store, which sacrifices transactional consistency to achieve higher performance and scalability, can be used. In this paper, we investigate to what degree a state-of-the-art NoSQL database can achieve high-performance persisting and fundamental analyses of large-scale data logs from real-world applications. For the evaluation, a state-of-the-art NoSQL database, MongoDB, is compared with a relational DBMS from a major commercial vendor and with a popular open-source relational DBMS. MongoDB was chosen because, in contrast to other popular NoSQL systems, it provides both primary and secondary indexing. These indexing techniques are essential for scalable processing of queries over large-scale data logs. To explore the impact of parallelism on query execution, sharding was investigated for MongoDB. Our results revealed that relaxing consistency did not provide a substantial performance enhancement in persisting large-scale data logs for any of the systems. However, for high-performance loading and analysis of data logs, MongoDB is shown to be a viable alternative to relational databases for queries where the choice of an optimal execution plan is not critical.
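As a rough illustration of the two mechanisms the abstract refers to, the following minimal Python sketch builds a primary index, a secondary index on a timestamp field, and a simple key-based partitioning of log records into shards. It is not taken from the paper and does not use MongoDB or reflect its internals; the record layout, the names `range_query` and `shard_of`, and the byte-sum partitioning rule are all assumptions made for illustration only.

```python
import bisect
from collections import defaultdict

# Hypothetical log records: (record_id, sensor_id, timestamp, value).
logs = [
    (0, "s1", 100, 0.5),
    (1, "s2", 105, 1.7),
    (2, "s1", 110, 0.9),
    (3, "s3", 120, 2.4),
    (4, "s2", 130, 1.1),
]

# Primary index: record_id -> record, for direct key lookups.
primary = {rec[0]: rec for rec in logs}

# Secondary index on timestamp: sorted (timestamp, record_id) pairs,
# so a time-range query does not have to scan every record.
secondary = sorted((rec[2], rec[0]) for rec in logs)


def range_query(t_lo, t_hi):
    """Return records with t_lo <= timestamp <= t_hi via the secondary index."""
    lo = bisect.bisect_left(secondary, (t_lo, -1))
    hi = bisect.bisect_right(secondary, (t_hi, float("inf")))
    return [primary[rid] for _, rid in secondary[lo:hi]]


def shard_of(sensor_id, n_shards=2):
    """Illustrative partitioning rule (an assumption): byte sum of the key."""
    return sum(sensor_id.encode()) % n_shards


# Partition the records so that each shard could be queried in parallel.
shards = defaultdict(list)
for rec in logs:
    shards[shard_of(rec[1])].append(rec)

print(range_query(105, 120))
print({shard: [rec[0] for rec in recs] for shard, recs in shards.items()})
```

The sorted secondary index lets the range query locate its bounds by binary search instead of a full scan, and partitioning by key keeps all records of one sensor on the same shard; a real system would use a more uniform hash function or range-based partitioning.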
Place, publisher, year, edition, pages
2015. 49-55 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:uu:diva-275028
DOI: 10.1145/2790755.2790772
ISBN: 978-1-4503-3414-3
OAI: oai:DiVA.org:uu-275028
DiVA: diva2:898481
19th International Database Engineering & Applications Symposium (IDEAS 2015), Yokohama, Japan, July 13-15, 2015