Uppsala University Publications
  • 1. Mahmood, Khalid; Risch, Tore; Zhu, Minpeng
    Utilizing a NoSQL Data Store for Scalable Log Analysis. 2015. Conference paper (Refereed)
    Abstract [en]

    A potential problem with persisting large volumes of data logs in a conventional relational database is that loading massive logs produced at high rates is not fast enough, owing to the strong consistency model and the high cost of indexing. As a possible alternative, a modern NoSQL data store, which sacrifices transactional consistency to achieve higher performance and scalability, can be utilized. In this paper, we investigate to what degree a state-of-the-art NoSQL database can achieve high-performance persistence and fundamental analysis of large-scale data logs from real-world applications. For the evaluation, a state-of-the-art NoSQL database, MongoDB, is compared with a relational DBMS from a major commercial vendor and with a popular open-source relational DBMS. MongoDB was chosen because, in contrast to other popular NoSQL systems, it provides both primary and secondary indexing. These indexing techniques are essential for scalable processing of queries over large-scale data logs. To explore the impact of parallelism on query execution, sharding was investigated for MongoDB. Our results revealed that relaxing consistency did not provide a substantial performance enhancement in persisting large-scale data logs for any of the systems. However, for high-performance loading and analysis of data logs, MongoDB is shown to be a viable alternative to relational databases for queries where the choice of an optimal execution plan is not critical.

  • 2. Zhu, Minpeng
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computing Science. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Scalable Queries over Log Database Collections. 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    In industrial settings, machines such as trucks and hydraulic pumps are widely distributed across different geographic locations, and the sensors on these machines produce large volumes of data. The data produced is stored locally in autonomous databases called log databases. The collection of log databases changes dynamically as sites are added to or removed from the federation.

    In this application context, an efficient way to search and analyze the past behavior of products in use is desired. To enable scalable queries over collections of distributed and autonomous log databases, we developed the FLOQ (Fused LOg database Query processor) system, which provides a global view of the working status of all machines on the sites through a meta-database integrating the dynamic log database collection. A particular challenge in this scenario is finding a scalable way to process numerical queries that identify anomalies by joining data from the meta-database with data selected from the collection of distributed and autonomous log databases. The thesis describes the architecture of FLOQ. In particular, different strategies for executing numerical queries over log database collections are investigated. FLOQ allows both the meta-database and the log databases to be stored in multiple formats using different kinds of data managers. FLOQ provides general and extensible mechanisms for efficient processing of queries over different kinds of distributed data sources.
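The kind of query described above, joining a meta-database with selections from several autonomous log databases, can be illustrated with a small sketch. It is not FLOQ's implementation: the table names (`machine`, `log`), the column `max_ok`, and the helper functions are all hypothetical, in-memory SQLite databases stand in for the remote sites, and only one naive strategy is shown, in which the threshold from the meta-database is sent to each site as a query parameter so that only anomalous rows are returned.

```python
import sqlite3


def make_meta():
    """Hypothetical meta-database: one row per machine with its site and limit."""
    meta = sqlite3.connect(":memory:")
    meta.execute(
        "CREATE TABLE machine (id INTEGER PRIMARY KEY, site TEXT, max_ok REAL)"
    )
    meta.executemany(
        "INSERT INTO machine VALUES (?, ?, ?)",
        [(1, "site_a", 50.0), (2, "site_a", 70.0), (3, "site_b", 60.0)],
    )
    return meta


def make_log_db(rows):
    """Hypothetical log database at one site: (machine, timestamp, value) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log (machine INTEGER, ts INTEGER, value REAL)")
    db.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return db


def find_anomalies(meta, sites):
    """Join meta-data with log data by sending each limit to the owning site."""
    machines = meta.execute(
        "SELECT id, site, max_ok FROM machine ORDER BY id"
    ).fetchall()
    anomalies = []
    for machine_id, site, max_ok in machines:
        log_db = sites.get(site)
        if log_db is None:
            continue  # the site has left the collection; skip it
        # The numerical comparison runs inside the log database,
        # so only rows above the limit are returned.
        anomalies.extend(
            log_db.execute(
                "SELECT machine, ts, value FROM log "
                "WHERE machine = ? AND value > ? ORDER BY ts",
                (machine_id, max_ok),
            ).fetchall()
        )
    return anomalies
```

Calling `find_anomalies(make_meta(), {"site_a": ..., "site_b": ...})` with one connection per site returns the readings that exceed each machine's limit. Sites missing from the dictionary are skipped, which loosely mimics a collection whose membership changes over time.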

    List of papers
    1. Querying Combined Cloud-Based and Relational Databases
    2011 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    An increasing amount of data is stored in cloud repositories, which provide high availability, accessibility, and scalability. However, for security reasons enterprises often need to store the core proprietary data in their own relational databases, while common data to be widely available can be stored in a cloud data repository. For example, the subsidiaries of a global enterprise are located in different geographic places where each subsidiary is likely to maintain its own local database. In such a scenario, data integration among the local databases and the cloud-based data is inevitable. We have developed a system called BigIntegrator to enable general queries that combine data in cloud-based data stores with relational databases. We present the design and working principle of the system. A scenario of querying data from both kinds of data sources is used as illustration. The system is general and extensible to integrate data from different kinds of data sources. A particular challenge being addressed is the limited query capabilities of cloud data stores. BigIntegrator utilizes knowledge of those limitations to produce efficient query execution.

    Keywords
    cloud data repository; relational database; data integration; Bigtable
    National Category
    Computer and Information Sciences
    Identifiers
    urn:nbn:se:uu:diva-275026 (URN); 10.1109/CSC.2011.6138543 (DOI)
    Conference
    2011 International Conference on Cloud and Service Computing (CSC)
    Available from: 2016-01-28 Created: 2016-01-28 Last updated: 2018-01-10
    2. Scalable Numerical SPARQL Queries over Relational Databases
    2014 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    We present an approach for scalable processing of SPARQL queries to RDF views of numerical data stored in relational databases (RDBs). Such queries include numerical expressions, inequalities, comparisons, etc. inside FILTERs. We call such FILTERs numerical expressions, and the queries numerical SPARQL queries. For scalable execution of numerical SPARQL queries over RDBs, numerical operators should be pushed into SQL rather than executed as post-processing filters outside the RDB; otherwise query execution is slowed down, since a lot of data is transported from the RDB server and, furthermore, indexes on the server are not utilized. The NUMTranslator algorithm converts numerical expressions in numerical SPARQL queries into corresponding SQL expressions. Using a benchmark that analyzes numerical logs stored in an RDB, we show that NUMTranslator substantially improves the scalability of SPARQL queries. We also compare the performance of our approach with that of other systems processing SPARQL queries to RDF views of RDBs and show that NUMTranslator substantially improves the scalability of numerical queries compared to the other systems' approaches.
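The difference between post-processing a numerical filter and pushing it into SQL can be made concrete with a minimal sketch. It does not reproduce the NUMTranslator algorithm: the table `measures`, the sample data, and both functions are hypothetical, and an in-memory SQLite database stands in for the RDB server. The filter is imagined as something like `FILTER(?value * 1.5 > 120)`, and the number of rows fetched serves as a rough proxy for the data transported from the server.

```python
import sqlite3


def build_db():
    """Create a hypothetical log table with 1000 deterministic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measures (machine INTEGER, ts INTEGER, value REAL)")
    rows = [
        (m, t, float((m * 37 + t * 11) % 100))
        for m in range(5)
        for t in range(200)
    ]
    conn.executemany("INSERT INTO measures VALUES (?, ?, ?)", rows)
    return conn


def post_filter(conn, threshold):
    """Fetch every row, then evaluate the numerical filter outside the database."""
    fetched = conn.execute("SELECT machine, ts, value FROM measures").fetchall()
    hits = [row for row in fetched if row[2] * 1.5 > threshold]
    return sorted(hits), len(fetched)


def pushed_down(conn, threshold):
    """Evaluate the same numerical filter inside SQL, so only hits are fetched."""
    fetched = conn.execute(
        "SELECT machine, ts, value FROM measures WHERE value * 1.5 > ?",
        (threshold,),
    ).fetchall()
    return sorted(fetched), len(fetched)
```

Both functions return the same hits, but `post_filter` fetches the whole table while `pushed_down` fetches only the matching rows. With a real server the gap would show up as network transfer, and the database would also be free to use its indexes where the rewritten predicate allows it.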

    National Category
    Computer and Information Sciences
    Identifiers
    urn:nbn:se:uu:diva-275027 (URN)
    Conference
    4th International workshop on linked web data management (LWDM 2014) in conjunction with the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, 2014
    Available from: 2016-01-28 Created: 2016-01-28 Last updated: 2018-01-10
    3. Scalable queries over log database collections
    2015 (English). In: Data Science, Springer, 2015, p. 173-185. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Springer, 2015
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743 ; 9147
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-274784 (URN); 10.1007/978-3-319-20424-6_17 (DOI); 000364104600017 (); 978-3-319-20423-9 (ISBN)
    Conference
    BICOD 2015, July 6–8, Edinburgh, UK
    Available from: 2015-06-11 Created: 2016-01-26 Last updated: 2018-01-10. Bibliographically approved
    4. Utilizing a NoSQL Data Store for Scalable Log Analysis
    2015 (English). Conference paper, Published paper (Refereed)
    National Category
    Computer and Information Sciences
    Identifiers
    urn:nbn:se:uu:diva-275028 (URN); 10.1145/2790755.2790772 (DOI); 978-1-4503-3414-3 (ISBN)
    Conference
    19th International Database Engineering & Applications Symposium (IDEAS 2015), Yokohama, Japan, July 13-15, 2015
    Available from: 2016-01-28 Created: 2016-01-28 Last updated: 2018-01-10
  • 3.
    Zhu, Minpeng
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Mahmood, Khalid
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Risch, Tore
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Scalable queries over log database collections. 2015. In: Data Science, Springer, 2015, p. 173-185. Conference paper (Refereed)
  • 4. Zhu, Minpeng; Risch, Tore
    Querying Combined Cloud-Based and Relational Databases. 2011. Conference paper (Refereed)
  • 5. Zhu, Minpeng; Stefanova, Silvia; Truong, Thanh; Risch, Tore
    Scalable Numerical SPARQL Queries over Relational Databases. 2014. Conference paper (Refereed)