Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
SQL on Hops
KTH, School of Information and Communication Technology (ICT).
2017 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

In today’s world data is extremely valuable. Companies and researchers store every sort of data, from users activities to medical records. However, data is useless if one cannot extract meaning and insight from it. In 2004 Dean and Ghemawat introduced the MapReduce framework. This sparked the development of open source frameworks for big data storage (HDFS) and processing (Hadoop). Hops and Apache Hive build on top of this heritage. The former proposes a new distributed file system which achieves higher scalability and throughput by storing metadata in a database called MySQL-Cluster. The latter is an open source data warehousing solution built on top of the Hadoop ecosystems, which allows users to query big data stored on HDFS using a SQL-like query language.Apache Hive is a widely used and mature project, however it lacks of consistency between the data stored on the file system and the metadata describing it, stored on a relational database. This means that if users delete Hive’s data from the file system, Hive does not delete the related metadata. This causes two issues: (1) users do not get an error if the data is missing from the filesystem (2) if users forget to delete the metadata, it will become orphaned in the database. In this thesis we exploit the fact that both HopsFS’ metadata and Hive’s metadata is stored in a relational database, to provide a mechanisms to automatically delete Hive’s metadata if the data is delete from the file system.The second objective of this thesis is to integrate Apache Hive into the Hops ecosystem and in particular in the HopsWorks platform. HopsWorks is a multitenant, UI based service which allows users to store and process big data projects. In this thesis we develop a custom authenticator for Hive to allow HopsWorks users to authenticate with Hive and to integrate with its security model.

Place, publisher, year, edition, pages
2017. , p. 50
Series
TRITA-ICT-EX ; 2017:146
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-215692OAI: oai:DiVA.org:kth-215692DiVA, id: diva2:1149002
Subject / course
Computer Science
Educational program
Master of Science -Communication Systems
Supervisors
Examiners
Available from: 2017-10-13 Created: 2017-10-13 Last updated: 2018-01-13Bibliographically approved

Open Access in DiVA

fulltext(895 kB)66 downloads
File information
File name FULLTEXT01.pdfFile size 895 kBChecksum SHA-512
df5c369e8f31a5bc5716c8bb60c0fa4125970cce5b673a0e3241745209207919ef5ff1ac0d5c8ad8f9adfd02e8c55f47628aef7b04ff43d6c302a097824843e7
Type fulltextMimetype application/pdf

By organisation
School of Information and Communication Technology (ICT)
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 66 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 159 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf