Impact of Cassandra Compaction on Dockerized Cassandra’s performance: Using Size Tiered Compaction Strategy
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Context. Cassandra is a NoSQL database which handles large amounts of data simultaneously and provides high availability for the data it stores. Compaction in Cassandra is the process of removing stale data and making data more readily available to the user. This thesis focuses on analyzing the impact of Cassandra compaction on Cassandra’s performance when running inside a Docker container.
Objectives. In this thesis, we investigate the impact of Cassandra compaction on database performance when Cassandra is used within a Docker-based container platform. We further fine-tune Cassandra’s compaction settings to arrive at a sub-optimal scenario which maximizes its performance while operating within a Docker container.
Methods. A literature review is performed to list the compaction-related metrics and compaction-related parameters which have an effect on Cassandra’s performance. Further, experiments are conducted using different sets of mixed workloads to estimate the impact of compaction on database performance when Cassandra is used within a Docker container. Once these experiments are conducted, we modify the compaction settings while operating under a write-heavy workload and assess database performance in each of these scenarios to identify a sub-optimal value of each parameter for maximum database performance. Finally, we use these sub-optimal parameters to perform an experiment and assess the database performance.
Results. The Cassandra and operating system related parameters and metrics which affect Cassandra compaction are listed, and their effect on Cassandra’s performance has been tested experimentally. Based on these experiments, a few sub-optimal values are proposed for the listed parameters.
Conclusions. It can be concluded that, for better performance of Dockerized Cassandra, the value proposed in the results for each parameter (i.e. 5120 for Memtable_heap_size_in_mb, 24 for concurrent_compactors, 16 for compaction_throughput_mb_per_sec, 6 for Memtable_flush_writers and 0.14 for Memtable_cleanup_threshold) can be applied on its own, but the proposed values should not all be applied together (as confirmed by the experiment performed). The metrics and parameters affecting Cassandra performance are also listed in this thesis.
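To illustrate the distinction drawn in the conclusions between applying each proposed value on its own and applying them all together, the following is a minimal sketch that generates one single-parameter override per proposed value. It is not the tooling used in the thesis. The helper names (`single_parameter_overrides`, `render_yaml_fragment`) are hypothetical, and the keys are written in lowercase as an assumption about how they would appear in a `cassandra.yaml`-style file; the exact spelling in a given Cassandra release may differ (for instance, the heap setting may be named `memtable_heap_space_in_mb`).

```python
# Proposed values taken from the conclusions. The lowercase key spelling is an
# assumption; check the cassandra.yaml of the release in use before applying.
PROPOSED = {
    "memtable_heap_size_in_mb": 5120,
    "concurrent_compactors": 24,
    "compaction_throughput_mb_per_sec": 16,
    "memtable_flush_writers": 6,
    "memtable_cleanup_threshold": 0.14,
}


def single_parameter_overrides(proposed):
    """Return one override dict per parameter, each changing only that parameter."""
    return [{name: value} for name, value in proposed.items()]


def render_yaml_fragment(override):
    """Render an override as 'key: value' lines in a cassandra.yaml-like style."""
    return "\n".join(f"{key}: {value}" for key, value in override.items())


if __name__ == "__main__":
    # Each fragment is meant for a separate run; they are deliberately not merged.
    for override in single_parameter_overrides(PROPOSED):
        print(render_yaml_fragment(override))
        print("---")
```

Each printed fragment corresponds to one configuration to be tried in isolation; merging all five into a single configuration is the combination that the conclusions advise against.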
Place, publisher, year, edition, pages
2016. 54 p.
Docker, Cassandra, Cassandra compaction, NoSQL database
Identifiers
URN: urn:nbn:se:bth-13273
OAI: oai:DiVA.org:bth-13273
DiVA: diva2:1040758
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
DVAXA Master of Science Programme in Computer Science
Boldt, Martin, Senior Lecturer