Compaction Strategies in Apache Cassandra: Analysis of Default Cassandra stress model
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Context. The present trend in a wide variety of applications, ranging from the web and social networking to telecommunications, is to gather and process very large and fast-growing amounts of information, leading to a common set of problems known collectively as “Big Data”. Over the last decade, the ability to run large-scale analytics over large numbers of data sets has proved to be a competitive advantage in a wide range of industries, such as retail, telecom and defense. In response to this trend, the research community and the IT industry have proposed a number of platforms to facilitate large-scale data analytics. Such platforms include a new class of databases, often referred to as NoSQL data stores. Apache Cassandra is one such NoSQL data store. This research focuses on analyzing the performance of different compaction strategies in different use cases for the default Cassandra stress model.
Objectives. The performance of the compaction strategies is observed in various scenarios based on three use cases: write heavy (90/10), read heavy (10/90) and balanced (50/50), using the default Cassandra stress model. The aim is to provide the events and specifications that suggest when to switch from one compaction strategy to another.
Methods. A single-node Cassandra deployment is set up on a web server, and its read and write performance with different compaction strategies is studied under read-heavy, write-heavy and balanced workloads. Performance metrics are collected and analyzed.
Results. Performance metrics of the different compaction strategies are evaluated and analyzed.
Conclusions. Based on a detailed analysis and comparison, we conclude that Leveled Compaction Strategy performs better for a read-heavy (10/90) workload with the default Cassandra stress model than the size tiered and date tiered compaction strategies.
For the balanced (50/50) workload, Date Tiered Compaction Strategy performs better than the other compaction strategies compared.
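The experiment matrix described in the abstract (three workload mixes against three compaction strategies) can be sketched as a small Python script that builds cassandra-stress command lines. This is only an illustration under stated assumptions: the helper names, the operation count and the exact options are hypothetical and not taken from the thesis, and a mixed run normally requires that data has already been written by a prior write run.

```python
# Illustrative sketch only: helper names and the default operation count
# are assumptions, not the commands used in the thesis.

# Write/read ratios for the three use cases named in the abstract.
WORKLOADS = {
    "write_heavy": (9, 1),  # 90% writes, 10% reads
    "read_heavy": (1, 9),   # 10% writes, 90% reads
    "balanced": (1, 1),     # 50% writes, 50% reads
}

# Cassandra class names of the three compaction strategies compared.
STRATEGIES = [
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "DateTieredCompactionStrategy",
]


def stress_command(workload, strategy, ops=1_000_000):
    """Build the argument list for one cassandra-stress mixed run."""
    write, read = WORKLOADS[workload]
    return [
        "cassandra-stress",
        "mixed",
        f"ratio(write={write},read={read})",
        f"n={ops}",
        "-schema",
        f"compaction(strategy={strategy})",
    ]


def all_runs():
    """Return one (workload, strategy, command) tuple per combination."""
    return [
        (workload, strategy, stress_command(workload, strategy))
        for workload in WORKLOADS
        for strategy in STRATEGIES
    ]


if __name__ == "__main__":
    for workload, strategy, command in all_runs():
        print(workload, strategy, " ".join(command))
```

Keeping the commands as argument lists avoids shell quoting issues with the parentheses if they are later passed to `subprocess.run`.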
Place, publisher, year, edition, pages
2016. 33 p.
Big data platforms, Cassandra, NoSQL database.
Identifiers
URN: urn:nbn:se:bth-12850
OAI: oai:DiVA.org:bth-12850
DiVA: diva2:946772
Subject / course
ET2580 Master's Thesis (120 credits) in Electrical Engineering with emphasis on Telecommunication Systems
ETATE Master of Science Programme in Electrical Engineering with emphasis on Telecommunication Systems
2016-05-31, J1640, Valhallavägen, 371 41 Karlskrona, 13:00 (English)
Casalicchio, Emiliano, Senior Lecturer in Computer Science
Tutschku, Kurt, Professor. Dr.