Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Optimisation of Ad-hoc analysis of an OLAP cube using SparkSQL
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap.
2017 (Engelska)Självständigt arbete på avancerad nivå (yrkesexamen), 20 poäng / 30 hpStudentuppsats (Examensarbete)
Abstract [en]

An Online Analytical Processing (OLAP) cube is a way to represent a multidimensional database. The multidimensional database often uses a star schema and populates it with the data from a relational database. The purpose of using an OLAP cube is usually to find valuable insights in the data like trends or unexpected data and is therefore often used within Business intelligence (BI). Mondrian is a tool that handles OLAP cubes that uses the query language MultiDimensional eXpressions (MDX) and translates it to SQL queries. Apache Kylin is an engine that can be used with Apache Hadoop to create and query OLAP cubes with an SQL interface. This thesis investigates whether the engine Apache Spark running on a Hadoop cluster is suitable for analysing OLAP cubes and what performance that can be expected. The Star Schema Benchmark (SSB) has been used to provide Ad-Hoc queries and to create a large database containing over 1.2 billion rows. This database was created in a cluster in the Omicron office consisting of five worker nodes and one master node. Queries were then sent to the database using Mondrian integrated into the BI platform Pentaho. Amazon Web Services (AWS) has also been used to create clusters with 3, 6 and 15 slaves to see how the performance scales. Creating a cube in Apache Kylin on the Omicron cluster was also tried, but was not possible due to the cluster running out of memory. The results show that it took between 8.2 to 11.9 minutes to run the MDX queries on the Omicron cluster. On both the Omicron cluster and the AWS cluster, the SQL queries ran faster than the MDX queries. The AWS cluster ran the queries faster than the Omicron cluster, even though fewer nodes were used. It was also noted that the AWS cluster did not scale linearly, neither for the MDX nor the SQL queries.

Ort, förlag, år, upplaga, sidor
2017. , 55 s.
Serie
UPTEC X, 17 007
Nationell ämneskategori
Systemvetenskap, informationssystem och informatik
Identifikatorer
URN: urn:nbn:se:uu:diva-329938OAI: oai:DiVA.org:uu-329938DiVA: diva2:1143790
Utbildningsprogram
Civilingenjörsprogrammet i molekylär bioteknik
Handledare
Examinatorer
Tillgänglig från: 2017-09-28 Skapad: 2017-09-22 Senast uppdaterad: 2017-09-28Bibliografiskt granskad

Open Access i DiVA

fulltext(1935 kB)25 nedladdningar
Filinformation
Filnamn FULLTEXT01.pdfFilstorlek 1935 kBChecksumma SHA-512
af0d20e45c4982186133bebf70e67767925cb1cb325a9bb1614d486e63c4a4894b2f1e9f6275df1190b04675fff2a674e1117d9e959906c3297d89bbe94621dd
Typ fulltextMimetyp application/pdf

Av organisationen
Avdelningen för beräkningsvetenskap
Systemvetenskap, informationssystem och informatik

Sök vidare utanför DiVA

GoogleGoogle Scholar
Totalt: 25 nedladdningar
Antalet nedladdningar är summan av nedladdningar för alla fulltexter. Det kan inkludera t.ex tidigare versioner som nu inte längre är tillgängliga.

Totalt: 94 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf