A Study of NoSQL and NewSQL databases for data aggregation on Big Data
KTH, School of Information and Communication Technology (ICT).
2013 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Sensor data analysis at Scania deals with large amounts of data collected from vehicles. Each time a Scania vehicle enters a workshop, a large number of variables are collected and stored in an RDBMS at high speed. Sensor data is numeric and is stored in a data warehouse. Ad hoc analyses are performed on this data using Business Intelligence (BI) tools like SAS. The challenges of using a traditional database are studied to identify areas for improvement. The volume of sensor data is large and growing rapidly, so it can be categorized as Big Data for Scania. This problem is studied to define the ideal properties of a high-performance, scalable database solution. Distributed database products are surveyed to find candidates suited to the problem. A desirable solution is a distributed computational cluster in which most computations are done locally on the storage nodes, to fully utilize each machine's memory and CPU and to minimize network load. There is a plethora of distributed database products categorized under NoSQL and NewSQL. A large variety of NoSQL products manage organizations' data in a distributed fashion. NoSQL products typically offer advantages such as improved scalability, and have disadvantages such as a lack of BI tool support and weaker consistency. NewSQL databases are an emerging category of distributed databases: relational data stores designed to meet the demand for high performance and scalability. In this thesis, an exploratory study was performed to find suitable products in these two categories. Based on a comparative study, one product from each category was selected for practical implementation, and production data was imported into both solutions. Performance for a common use case (median computation) was measured and compared. Based on these comparisons, recommendations were provided for a suitable distributed product for sensor data analysis.

Place, publisher, year, edition, pages
2013. 56 p.
TRITA-ICT-EX, 2013:256
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-143345
OAI: diva2:706302
Subject / course
Information and Software Systems
Educational program
Master of Science - Software Engineering of Distributed Systems
Available from: 2014-03-19. Created: 2014-03-19. Last updated: 2014-03-19. Bibliographically approved.

Open Access in DiVA

fulltext (1418 kB), 2818 downloads
File information
File name: FULLTEXT01.pdf
File size: 1418 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
