Impact of Cassandra Compaction on Dockerized Cassandra’s performance: Using Size Tiered Compaction Strategy
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Context. Cassandra is a NoSQL database that handles large amounts of data simultaneously and provides high availability for the data it stores. Compaction in Cassandra is the process of removing stale data and making data more readily available to the user. This thesis focusses on analyzing the impact of Cassandra compaction on Cassandra’s performance when running inside a Docker container.
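To make "removing stale data" concrete, the following is a toy sketch of what a compaction pass does conceptually: several on-disk tables are merged, only the newest version of each key survives, and deleted entries are dropped. The function name `compact`, the `TOMBSTONE` marker and the table layout are hypothetical and chosen for illustration. This is not Cassandra's implementation; real Cassandra, for example, keeps deletion markers for a grace period before purging them.

```python
# Toy model of compaction: merge several SSTable-like maps into one,
# keeping only the newest version of each key and dropping deletions.
# Illustrative simplification only, not Cassandra's actual algorithm.

TOMBSTONE = None  # hypothetical marker meaning "this key was deleted"


def compact(sstables):
    """Merge tables of {key: (timestamp, value)} using last-write-wins."""
    newest = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Keep the entry with the highest timestamp for each key.
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # Drop keys whose newest version is a deletion marker.
    return {k: v for k, v in newest.items() if v[1] is not TOMBSTONE}


old = {"a": (1, "x"), "b": (1, "y")}
new = {"a": (2, "z"), "b": (3, TOMBSTONE)}
merged = compact([old, new])
```

Here `merged` holds a single entry for key `a`: the stale version of `a` and the deleted key `b` no longer take up space or need to be checked on reads.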

Objectives. In this thesis, we investigate the impact of Cassandra compaction on database performance when Cassandra is used within a Docker-based container platform. We further fine-tune Cassandra’s compaction settings to arrive at a sub-optimal scenario which maximizes its performance while operating within a Docker container.

Methods. A literature review is performed to list the compaction-related metrics and compaction-related parameters which have an effect on Cassandra’s performance. Further, experiments are conducted using different sets of mixed workloads to estimate the impact of compaction on database performance when used within a Docker container. Once these experiments are conducted, we modify the compaction settings while operating under a write-heavy workload and assess database performance in each of these scenarios to identify a sub-optimal value of each parameter for maximum database performance. Finally, we use these sub-optimal parameters together in an experiment and assess the database performance.

Results. The Cassandra and operating system related parameters and metrics which affect Cassandra compaction are listed, and their effect on Cassandra’s performance has been tested experimentally. Based on these experiments, a few sub-optimal values are proposed for the listed parameters.

Conclusions. It can be concluded that, for better performance of Dockerized Cassandra, the proposed values for each of the parameters in the results (i.e. 5120 for Memtable_heap_size_in_mb, 24 for concurrent_compactors, 16 for compaction_throughput_mb_per_sec, 6 for Memtable_flush_writers and 0.14 for Memtable_cleanup_threshold) can each be chosen separately, but they should not all be applied together, as confirmed by the experiment performed. The metrics and parameters affecting Cassandra performance are also listed in this thesis.
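The conclusion that the values should be applied one at a time can be sketched as follows. The values are those reported in the abstract; the key spellings are an assumption, normalized to what are believed to be the corresponding cassandra.yaml setting names (in particular `memtable_heap_space_in_mb` for the abstract's "Memtable_heap_size_in_mb"). The helper names are hypothetical.

```python
# Proposed values as reported in the abstract. Key spellings are an
# assumption: they follow the believed cassandra.yaml setting names.
PROPOSED = {
    "memtable_heap_space_in_mb": 5120,
    "concurrent_compactors": 24,
    "compaction_throughput_mb_per_sec": 16,
    "memtable_flush_writers": 6,
    "memtable_cleanup_threshold": 0.14,
}


def single_parameter_overrides(proposed):
    """Return one override dict per parameter, each changing only that one."""
    return [{name: value} for name, value in proposed.items()]


def render_yaml(override):
    """Render a flat override dict as 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in override.items())


variants = single_parameter_overrides(PROPOSED)
for variant in variants:
    # Each variant is meant to be applied on its own, on top of the
    # otherwise unchanged configuration, not merged with the others.
    print(render_yaml(variant))
```

Each printed line is a separate candidate configuration change. Merging all five into one file corresponds to the combined scenario that the abstract advises against.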

Place, publisher, year, edition, pages
2016. 54 p.
Keyword [en]
Docker, Cassandra, Cassandra compaction, NoSQL database
National Category
Computer Science
URN: urn:nbn:se:bth-13273
OAI: diva2:1040758
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
Educational program
DVAXA Master of Science Programme in Computer Science
Available from: 2016-10-31. Created: 2016-10-28. Last updated: 2016-10-31. Bibliographically approved.

Author
Mohanty, Biswajeet