Analyzing the impact of data compression in Hive
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Executing expensive queries over many large tables can be prohibitively time-consuming in conventional relational databases. Hadoop and its data warehouse, Hive, are a powerful alternative for large-scale data processing. Conventionally, data is stored in Hive without compression. Storing the data with compression is valuable, provided that the overhead of compression does not negatively impact query processing time. Through experiments with imports, transformations, and exports of Hive data in various file formats and with different compression techniques, this paper describes how this can be achieved.
Place, publisher, year, edition, pages
2014. 36 p.
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-269235
OAI: oai:DiVA.org:uu-269235
DiVA: diva2:882559
Bachelor Programme in Computer Science
Stefanova, Silvia
Gällmo, Olle