Preventing data loss using rollback-recovery: A proof-of-concept study at Bolagsverket
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
This thesis investigates two alternative approaches, referred to as automatic and semi-automatic replay, which can be used to prevent data loss due to a certain set of unforeseen events at Bolagsverket, the Swedish Companies Registration Office. The approaches make it possible to recover the correct data from a database that belongs to a stateless distributed system and that contains erroneous or inaccurate information due to past faults. Both approaches utilize log-based rollback-recovery techniques but make different assumptions regarding the deterministic behaviour of Bolagsverket’s systems. Under both approaches, the stateless distributed system logs all received messages during failure-free operation. During recovery, automatic replay recovers the data by enabling the system to re-process the logged messages. In contrast, semi-automatic replay recovers data by utilizing the logged messages to enable officials at Bolagsverket to manually redo lost work in a controlled manner. Proof-of-concept implementations of the two replay approaches are developed on a simplified model that resembles one of Bolagsverket’s electronic services, yet is general to any stateless system that communicates asynchronously using JMS messages and synchronously using XML sent over HTTP. The theoretical and performance evaluations were conducted with the aim of producing results general to any system with similar characteristics to those of the model. The results suggest that the failure-free overhead at Bolagsverket is approximately 100 milliseconds per logged message, and that around 3 gigabytes of data must be stored in order to recover one average day’s operation. Further, automatic replay successfully manages to recover one average day’s operation in around 70 minutes. Semi-automatic replay is calculated to require, at a maximum, one workday to recover the same amount of data.
It is assessed that automatic replay is a suitable solution for Bolagsverket if it is proven that their systems are fully deterministic. In other cases, it is assessed that semi-automatic replay can be utilized. It is, however, recommended that further evaluations be conducted before the approaches are implemented in a production environment.
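The log-then-replay idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `MessageLog`, `register_company`, `receive`, `automatic_replay`, and `semi_automatic_worklist` are hypothetical, the database is modelled as an in-memory dictionary, and the JMS and XML-over-HTTP transports are left out. The sketch assumes the message handler is deterministic, which is the condition the abstract attaches to automatic replay.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Database = Dict[str, str]                  # stand-in for the real database
Handler = Callable[[Database, dict], None]


@dataclass(frozen=True)
class LoggedMessage:
    seq: int       # position in receive order
    payload: dict  # message content as it was received


@dataclass
class MessageLog:
    entries: List[LoggedMessage] = field(default_factory=list)

    def append(self, payload: dict) -> LoggedMessage:
        # Copy the payload so later mutation cannot alter the log.
        msg = LoggedMessage(seq=len(self.entries), payload=dict(payload))
        self.entries.append(msg)
        return msg


def register_company(db: Database, payload: dict) -> None:
    # Hypothetical deterministic handler: same input, same database effect.
    db[payload["org_id"]] = payload["name"]


def receive(log: MessageLog, db: Database, handler: Handler, payload: dict) -> None:
    # Failure-free operation: log the message first, then process it.
    log.append(payload)
    handler(db, payload)


def automatic_replay(log: MessageLog, handler: Handler, from_seq: int = 0,
                     checkpoint: Optional[Database] = None) -> Database:
    # Recovery: start from a known-good state and re-process logged messages.
    db = dict(checkpoint or {})
    for msg in log.entries[from_seq:]:
        handler(db, msg.payload)
    return db


def semi_automatic_worklist(log: MessageLog, from_seq: int = 0) -> List[str]:
    # Recovery without assuming determinism: list the work to redo manually.
    return [f"#{m.seq}: redo {m.payload}" for m in log.entries[from_seq:]]
```

For example, after two calls to `receive`, a corrupted database can be discarded and rebuilt with `automatic_replay(log, register_company)`, while `semi_automatic_worklist(log)` returns one entry per logged message for an official to work through by hand.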
Place, publisher, year, edition, pages
2013. 85 p.
Fault tolerance, Rollback-recovery, Data loss, Database, Bolagsverket
Computer and Information Science Software Engineering Computer Engineering
Identifiers
URN: urn:nbn:se:miun:diva-20901
OAI: oai:DiVA.org:miun-20901
DiVA: diva2:682324
Subject / course
Computer Engineering DT1
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Shen, Wei, Ph.D
Zhang, Tingting, Professor