Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Big data is an evolving term for any large volume of structured and unstructured data that
has the potential to be mined for information. Storing data at this scale requires cloud
storage systems, which are designed to keep the data accessible and available to users over
a network. Storing big data also requires new platforms. Popular big data platforms include
MongoDB, Cassandra and Hadoop. In this thesis we used the Cassandra database system because
it is distributed and open source. Cassandra's architecture is a masterless ring design that
is easy to set up and maintain. Apache Cassandra is a highly scalable distributed database
designed to handle big data management, with linear scalability and seamless deployment
across multiple data centers. It is a NoSQL database system that allows schema-free tables,
so a data item can have a variable set of columns, unlike in relational databases. Cassandra
provides high scalability with no single point of failure.
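
The masterless ring design described above can be illustrated with a minimal sketch. It assumes one token per node and uses MD5 as a stand-in hash; Cassandra's own partitioner and replication strategies differ, and the node names and the `Ring` class are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in, not Cassandra's partitioner)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: every node is a peer, there is no master node."""

    def __init__(self, nodes):
        # Assumption: one token per node, derived from the node name.
        self._entries = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, replication_factor: int = 1):
        """Return the nodes holding `key`, walking clockwise from its token."""
        start = bisect.bisect_left(self._tokens, token(key))
        count = min(replication_factor, len(self._entries))
        size = len(self._entries)
        return [self._entries[(start + k) % size][1] for k in range(count)]

    def owner(self, key: str) -> str:
        """Return the first node responsible for `key`."""
        return self.replicas(key, 1)[0]


ring = Ring(["node1", "node2", "node3"])  # hypothetical three-node cluster
print(ring.owner("user:42"), ring.replicas("user:42", replication_factor=2))
```

Because any node can compute the owner of a key from the ring alone, no coordinator node is needed, which is why such a design has no single point of failure.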
Over the past few years, container-based virtualization has evolved rapidly. This thesis
focuses on one such technology, Linux Containers (LXC). LXC is an operating-system-level
virtualization method for running multiple isolated Linux systems on a single control host.
It is not a virtual machine; rather, it provides a virtual environment with its own CPU,
memory, network and other resource spaces, together with a resource control mechanism. In
this thesis, the performance of the Apache Cassandra database is compared between bare
metal and Linux Containers (LXC).
A three-node Cassandra cluster was created on both bare metal and Linux Containers, with one
node acting as the seed. The Cassandra stress utility was used to generate load on the
cluster. The goal of this thesis is to evaluate the performance of the Cassandra cluster on
bare metal and in Linux Containers.
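
As a rough illustration of how such load tests might be launched, the sketch below builds `cassandra-stress` argument lists for write, read and mixed workloads. The node addresses, operation count, thread count and the helper name `stress_command` are hypothetical and are not taken from the thesis.

```python
def stress_command(operation, n_ops, nodes, threads, write_ratio=1, read_ratio=1):
    """Build an argument list for the cassandra-stress tool.

    All values passed in are illustrative; the thesis does not state them here.
    """
    if operation not in ("write", "read", "mixed"):
        raise ValueError(f"unsupported operation: {operation}")
    cmd = ["cassandra-stress", operation]
    if operation == "mixed":
        # Mixed workloads take a write/read ratio.
        cmd.append(f"ratio(write={write_ratio},read={read_ratio})")
    cmd += [
        f"n={n_ops}",                # total number of operations
        "-node", ",".join(nodes),    # nodes the tool connects to
        "-rate", f"threads={threads}",
    ]
    return cmd


# Hypothetical addresses for a three-node cluster.
cluster = ["192.168.1.1", "192.168.1.2", "192.168.1.3"]
for op in ("write", "read", "mixed"):
    print(" ".join(stress_command(op, 1_000_000, cluster, threads=50)))
```

The list form can be passed directly to `subprocess.run`, which avoids having to escape the parentheses of the ratio argument for a shell.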
Linux Containers (LXC) were deployed on all the servers. A three-node Cassandra cluster was
created on these servers and also in the Linux Containers. Port forwarding was used to
enable communication between the Cassandra instances running in LXC. The performance
metrics that characterize the Cassandra cluster were selected accordingly. The network
configuration parameters were changed to suit the behavior of Cassandra; with these
changes, Cassandra runs in the required configuration, after which the cluster performance
is analyzed. The analysis covers write, read and mixed load operations and is compared with
the performance of the Cassandra cluster on bare metal.
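
One way port forwarding from a host to a container could be set up is with iptables DNAT rules; the sketch below only generates the rule strings. It is an assumption that iptables is the mechanism, since the abstract does not name one, and the container address and the helper `dnat_rules` are hypothetical. The ports listed are Cassandra's defaults.

```python
# Cassandra default ports: inter-node storage, JMX monitoring, CQL native transport.
CASSANDRA_PORTS = {"storage": 7000, "jmx": 7199, "cql": 9042}


def dnat_rules(container_ip, ports):
    """Return iptables commands that forward host TCP ports to a container.

    Assumption: forwarding is done with iptables DNAT; the abstract does not
    say which mechanism was used.
    """
    return [
        "iptables -t nat -A PREROUTING -p tcp "
        f"--dport {port} -j DNAT --to-destination {container_ip}:{port}"
        for port in ports
    ]


# Hypothetical container address on the host's LXC bridge.
for rule in dnat_rules("10.0.3.100", sorted(CASSANDRA_PORTS.values())):
    print(rule)
```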
The results present measurements of performance metrics such as CPU utilization, disk
throughput and latency for the Cassandra cluster running on both bare metal and Linux
Containers. The performance of the cluster in the two environments is compared through
quantitative and statistical analysis.
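
A statistical comparison of this kind might summarize repeated measurements by their mean, standard deviation and a confidence interval. The sketch below uses a normal approximation (z = 1.96 for roughly 95%); the sample values are placeholders, not measurements from the thesis, and the thesis may have used a different procedure.

```python
import math
import statistics


def summarize(samples, z=1.96):
    """Return (mean, sample standard deviation, confidence interval).

    Uses a normal approximation; for few samples a t-distribution
    quantile would be more appropriate.
    """
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    half_width = z * sd / math.sqrt(len(samples))
    return mean, sd, (mean - half_width, mean + half_width)


# Placeholder latency values in milliseconds, for illustration only.
bare_metal = [1.9, 2.1, 2.0, 2.2, 1.8]
containers = [2.4, 2.6, 2.5, 2.3, 2.7]
for name, data in (("bare metal", bare_metal), ("LXC", containers)):
    mean, sd, (low, high) = summarize(data)
    print(f"{name}: mean={mean:.2f} sd={sd:.2f} CI=({low:.2f}, {high:.2f})")
```

If the two intervals do not overlap, the difference between the environments is unlikely to be due to measurement noise alone.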
The physical resources utilized by the Cassandra database on bare metal and in Linux
Containers (LXC) are similar. According to the results, CPU utilization is higher for
Cassandra in Linux Containers. Disk throughput is also higher in Linux Containers, except in
the write operation at 66% load. Bare metal has lower latency than Linux Containers in all
scenarios.
2016. 47 p.
ET2580 Master's Thesis (120 credits) in Electrical Engineering with emphasis on Telecommunication Systems
ETATX Master of Science Programme in Electrical Engineering with emphasis on Telecommunication Systems
2016-09-27, J3208 Claude Shannon, Blekinge Institute Of Technology, Karlskrona, 11:45 (English)