Compactions in Apache Cassandra: Performance Analysis of Compaction Strategies in Apache Cassandra
Blekinge Institute of Technology, Faculty of Computing, Department of Communication Systems.
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Context: The global communication system is growing tremendously, producing large volumes of data of many kinds. Telecom operators, who generate large amounts of data, need to manage that data efficiently. As database management technology advances, NoSQL databases have grown remarkably in the 21st century. Apache Cassandra is an advanced NoSQL database system that is popular for handling semi-structured and unstructured Big Data. Cassandra manages its stored data efficiently using different compaction strategies, which merge SSTables and discard obsolete data. This research analyzes the performance of different compaction strategies in different use cases with the default cassandra-stress model. The analysis can guide the choice of compaction strategy in Cassandra for a write-heavy workload.
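As a rough illustration of what a single compaction does, the sketch below merges several key-sorted SSTables into one and keeps only the newest version of each key (last write wins by timestamp). It is a simplified model, not Cassandra's implementation: the `(key, timestamp, value)` tuple layout and the `compact` function are hypothetical, reconciliation happens per key rather than per cell, and tombstones (deletions, modelled as `None`) are simply carried over.

```python
"""Simplified model of compaction: merge immutable, key-sorted SSTables
into one, keeping only the newest version of each key.

Hypothetical layout: an SSTable is a list of (key, timestamp, value)
tuples sorted by key; a value of None stands for a tombstone (deletion).
"""
import heapq
from typing import Iterable, List, Optional, Tuple

Cell = Tuple[str, int, Optional[str]]  # (key, write timestamp, value or None)


def compact(sstables: Iterable[List[Cell]]) -> List[Cell]:
    """Merge key-sorted SSTables; the highest timestamp wins for each key."""
    merged: List[Cell] = []
    # heapq.merge lazily merges the already-sorted inputs by key.
    for key, ts, value in heapq.merge(*sstables, key=lambda cell: cell[0]):
        if merged and merged[-1][0] == key:
            # Key already seen: keep whichever version is newer.
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    # Tombstones are kept here; Cassandra only purges them after
    # gc_grace_seconds has elapsed and it is safe to do so.
    return merged


old = [("a", 1, "x"), ("b", 1, "y"), ("c", 1, "z")]
new = [("a", 5, "x2"), ("c", 4, None)]  # update "a", delete "c"
print(compact([old, new]))  # [('a', 5, 'x2'), ('b', 1, 'y'), ('c', 4, None)]
```

Because the output is a single sorted file with one version per key, later reads touch fewer SSTables; compaction strategies differ mainly in which SSTables they choose to merge and when.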

Objectives: In this study, we identify appropriate performance metrics for evaluating compaction strategies. We provide a detailed analysis of the Size Tiered Compaction Strategy (STCS), the Date Tiered Compaction Strategy (DTCS), and the Leveled Compaction Strategy (LCS) for a write-heavy (90/10) workload, using the default cassandra-stress tool.
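To make the size-tiered idea concrete, here is a simplified sketch of how the Size Tiered Compaction Strategy groups SSTables of similar size into buckets and compacts a bucket once it holds enough files. It assumes Cassandra's documented defaults (`bucket_low` = 0.5, `bucket_high` = 1.5, `min_threshold` = 4, `max_threshold` = 32) and deliberately ignores `min_sstable_size` and the read-hotness ranking Cassandra uses to choose between buckets; the function names are hypothetical.

```python
from typing import List


def stcs_buckets(sizes_mb: List[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into buckets of similar size (simplified STCS)."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # Join the first bucket whose average size is close enough.
            if avg * bucket_low < size < avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No similar bucket found: start a new size tier.
            buckets.append([size])
    return buckets


def stcs_candidates(sizes_mb: List[float], min_threshold: int = 4,
                    max_threshold: int = 32) -> List[List[float]]:
    """Buckets with at least min_threshold SSTables, capped at max_threshold."""
    # Cassandra trims by read hotness; this sketch just keeps the first entries.
    return [b[:max_threshold] for b in stcs_buckets(sizes_mb)
            if len(b) >= min_threshold]


sizes = [10, 11, 12, 9, 100, 105, 400]  # hypothetical SSTable sizes in MB
print(stcs_buckets(sizes))     # [[9, 10, 11, 12], [100, 105], [400]]
print(stcs_candidates(sizes))  # [[9, 10, 11, 12]]
```

In this example the four small SSTables would be compacted into one larger SSTable, while the 100 MB and 400 MB files wait until their tiers fill up.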

Methods: A detailed literature review was conducted to study NoSQL databases and how the different compaction strategies in Apache Cassandra work. The performance metrics were selected based on this literature review and on the opinions of the supervisors and Ericsson's Apache Cassandra team. Two tools were developed to collect the selected metrics: the first, written in Jython, collects Cassandra metrics, and the second, written in Python, collects operating system metrics. The graphs were generated in Microsoft Excel from the values collected by the scripts.
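The thesis tools themselves are not reproduced here. As a hypothetical sketch of the second kind of tool, the Python script below samples a few operating system metrics at a fixed interval and writes them to a CSV file that Microsoft Excel can open directly. It uses only the standard library, so it records load averages (`os.getloadavg`, Unix only) and disk usage (`shutil.disk_usage`); the metric choice, column names, and the `data_dir` default are assumptions, and a real study would likely also track CPU, memory, and disk I/O.

```python
import csv
import os
import shutil
import tempfile
import time

FIELDS = ["timestamp", "load_1m", "load_5m", "load_15m", "disk_used_pct"]


def sample_os_metrics(csv_path: str, samples: int, interval_s: float,
                      data_dir: str = "/") -> int:
    """Write `samples` rows of OS metrics to csv_path, one every interval_s seconds.

    data_dir is a hypothetical default: point it at the disk holding
    Cassandra's data directory to watch how compaction affects disk usage.
    Returns the number of data rows written.
    """
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(FIELDS)
        for i in range(samples):
            load1, load5, load15 = os.getloadavg()  # Unix only
            usage = shutil.disk_usage(data_dir)
            used_pct = 100.0 * usage.used / usage.total
            writer.writerow([time.strftime("%Y-%m-%d %H:%M:%S"),
                             f"{load1:.2f}", f"{load5:.2f}", f"{load15:.2f}",
                             f"{used_pct:.1f}"])
            if i < samples - 1:  # no need to sleep after the last sample
                time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "os_metrics.csv")
    sample_os_metrics(out_path, samples=5, interval_s=1.0)
    print("wrote", out_path)
```

Writing plain CSV keeps the collector independent of the charting step, so the same file can be graphed in Excel or any other tool.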

Results: The Date Tiered and Size Tiered Compaction Strategies behaved broadly similarly during the stress tests. The Leveled Compaction Strategy produced notably different results that affected system performance, compared with the Date Tiered and Size Tiered Compaction Strategies. The Date Tiered Compaction Strategy does not perform well with the default cassandra-stress model. The Size Tiered Compaction Strategy can be preferred for the default cassandra-stress model, but is not advisable for big data.

Conclusions: Based on a detailed analysis and logical comparison of the metrics, we conclude that the Leveled Compaction Strategy performs better than the Size Tiered and Date Tiered Compaction Strategies for a write-heavy (90/10) workload with the default cassandra-stress model.
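For context on how the Leveled Compaction Strategy lays out data, the arithmetic below estimates how many levels a data set occupies. It assumes Cassandra's documented defaults of a 160 MB target SSTable size (`sstable_size_in_mb`) and a fanout of 10, so level n (for n of 1 or more) holds about 160 × 10^n MB; level 0, which receives freshly flushed SSTables, is ignored. The function names are hypothetical, and the numbers are estimates rather than results from the thesis.

```python
def lcs_level_capacity_mb(level: int, sstable_size_mb: int = 160,
                          fanout: int = 10) -> int:
    """Approximate maximum size of LCS level `level` (level >= 1), in MB."""
    if level < 1:
        raise ValueError("level 0 is sized differently; use level >= 1")
    return sstable_size_mb * fanout ** level


def lcs_levels_needed(data_mb: float, sstable_size_mb: int = 160,
                      fanout: int = 10) -> int:
    """Smallest number of levels (L1..Ln) whose combined capacity holds data_mb."""
    level, capacity = 0, 0
    while capacity < data_mb:
        level += 1
        capacity += lcs_level_capacity_mb(level, sstable_size_mb, fanout)
    return level


for lvl in (1, 2, 3):
    print(f"L{lvl}: ~{lcs_level_capacity_mb(lvl)} MB")  # 1600, 16000, 160000
print("levels for 100 GB:", lcs_levels_needed(100 * 1024))  # 3
```

Because SSTables within each level from L1 upward do not overlap in key range, a read consults at most one SSTable per level (plus any in L0); the trade-off is the extra rewriting needed as data moves down the levels.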

Place, publisher, year, edition, pages
2016, 30 p.
Keyword [en]
Apache Cassandra, Compaction Strategies, Default Cassandra Stress model, Performance, NoSQL Database
URN: urn:nbn:se:bth-12885
OAI: diva2:948190
External cooperation
Ericsson, Karlskrona
Subject / course
ET2580 Master's Thesis (120 credits) in Electrical Engineering with emphasis on Telecommunication Systems
Educational program
ETATE Master of Science Programme in Electrical Engineering with emphasis on Telecommunication Systems
Presentation: 2016-05-31, 11:15, J1640, Valhallavägen, 37141 Karlskrona (English)
Available from: 2016-08-08
Created: 2016-07-08
Last updated: 2016-08-08
Bibliographically approved

Open Access in DiVA: full text FULLTEXT02.pdf (1181 kB, application/pdf)
