uu.seUppsala universitets publikasjoner
Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Optimisation of Ad-hoc analysis of an OLAP cube using SparkSQL
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap.
2017 (engelsk)Independent thesis Advanced level (professional degree), 20 poäng / 30 hpOppgave
Abstract [en]

An Online Analytical Processing (OLAP) cube is a way to represent a multidimensional database. The multidimensional database often uses a star schema and populates it with the data from a relational database. The purpose of using an OLAP cube is usually to find valuable insights in the data like trends or unexpected data and is therefore often used within Business intelligence (BI). Mondrian is a tool that handles OLAP cubes that uses the query language MultiDimensional eXpressions (MDX) and translates it to SQL queries. Apache Kylin is an engine that can be used with Apache Hadoop to create and query OLAP cubes with an SQL interface. This thesis investigates whether the engine Apache Spark running on a Hadoop cluster is suitable for analysing OLAP cubes and what performance that can be expected. The Star Schema Benchmark (SSB) has been used to provide Ad-Hoc queries and to create a large database containing over 1.2 billion rows. This database was created in a cluster in the Omicron office consisting of five worker nodes and one master node. Queries were then sent to the database using Mondrian integrated into the BI platform Pentaho. Amazon Web Services (AWS) has also been used to create clusters with 3, 6 and 15 slaves to see how the performance scales. Creating a cube in Apache Kylin on the Omicron cluster was also tried, but was not possible due to the cluster running out of memory. The results show that it took between 8.2 to 11.9 minutes to run the MDX queries on the Omicron cluster. On both the Omicron cluster and the AWS cluster, the SQL queries ran faster than the MDX queries. The AWS cluster ran the queries faster than the Omicron cluster, even though fewer nodes were used. It was also noted that the AWS cluster did not scale linearly, neither for the MDX nor the SQL queries.

sted, utgiver, år, opplag, sider
2017. , s. 55
Serie
UPTEC X ; 17 007
HSV kategori
Identifikatorer
URN: urn:nbn:se:uu:diva-329938OAI: oai:DiVA.org:uu-329938DiVA, id: diva2:1143790
Utdanningsprogram
Molecular Biotechnology Engineering Programme
Veileder
Examiner
Tilgjengelig fra: 2017-09-28 Laget: 2017-09-22 Sist oppdatert: 2018-01-13bibliografisk kontrollert

Open Access i DiVA

fulltext(1935 kB)550 nedlastinger
Filinformasjon
Fil FULLTEXT01.pdfFilstørrelse 1935 kBChecksum SHA-512
af0d20e45c4982186133bebf70e67767925cb1cb325a9bb1614d486e63c4a4894b2f1e9f6275df1190b04675fff2a674e1117d9e959906c3297d89bbe94621dd
Type fulltextMimetype application/pdf

Av organisasjonen

Søk utenfor DiVA

GoogleGoogle Scholar
Totalt: 550 nedlastinger
Antall nedlastinger er summen av alle nedlastinger av alle fulltekster. Det kan for eksempel være tidligere versjoner som er ikke lenger tilgjengelige

urn-nbn

Altmetric

urn-nbn
Totalt: 739 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf