Data Analysis of Minimally-Structured Heterogeneous Logs: An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes.
Independent thesis Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits
Student thesis
Continuous integration and continuous delivery are now widely used to speed up software development and to deliver high-quality products to customers quickly. In modern software development, the testing stage is of great significance: it ensures that the delivered software meets all requirements and is of high quality in terms of maintainability, sustainability, scalability, etc. The key task of software testing is to find the bugs exposed by every test and resolve them.
The developers and test engineers at Ericsson, who work on a large-scale software architecture, rely mainly on the logs generated during testing to debug the software. These logs contain important information about system behavior and software status. However, the volume of the data is very large and its variety is complex and unpredictable, so manually locating and resolving bugs in such a vast amount of log data is time-consuming and laborious.
The objective of this thesis project is to explore a way to conduct log analysis efficiently and effectively by applying relevant machine learning algorithms, in order to help people quickly detect a test failure and its possible causes. In this project, a method for preprocessing and clustering the original logs is designed and implemented in order to obtain useful data that can be fed to machine learning algorithms. A comparative log analysis, based on two machine learning algorithms, Recurrent Neural Network and Naive Bayes, is conducted to locate system failures and anomalies. Finally, relevant experimental results are provided and analyzed.
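As a rough illustration of the pipeline outlined above, the following minimal sketch masks variable tokens to obtain log templates, groups lines by template, and classifies a test run with a small multinomial Naive Bayes model. It is not the thesis's implementation: the masking rules, the names `to_template`, `cluster_by_template` and `TemplateNaiveBayes`, and the pass/fail labels are all assumptions made for illustration, and the Recurrent Neural Network part is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed masking rules (hypothetical; the abstract does not specify them).
# Hex values are masked before plain numbers so "0x1f" becomes one token.
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def to_template(line):
    """Replace variable-looking tokens with placeholders and normalize spaces."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return " ".join(line.split())


def cluster_by_template(lines):
    """Group raw log lines that share the same extracted template."""
    clusters = defaultdict(list)
    for line in lines:
        clusters[to_template(line)].append(line)
    return dict(clusters)


class TemplateNaiveBayes:
    """Multinomial Naive Bayes over the templates seen in a test run."""

    def fit(self, runs, labels):
        # runs: list of template lists; labels: one label per run.
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for templates, label in zip(runs, labels):
            self.token_counts[label].update(templates)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, templates):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            # Laplace smoothing, with one extra slot for unseen templates.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n / total)
            for t in templates:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A run would be converted with `[to_template(line) for line in run_lines]` before being passed to `fit` or `predict`. Templates that occur mostly in failed runs then pull the prediction toward the failure class, which is one simple way to point at where a failure may have occurred.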
Place, publisher, year, edition, pages
2016. 88 p.
Data analysis, Log analysis, RNN, Naive Bayes
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-191334
OAI: oai:DiVA.org:kth-191334
DiVA: diva2:956109
Master of Science - Embedded Systems
2016-08-26, 12:24 (English)