Using Data Mining for Static Code Analysis of C
2012 (English) Conference paper (Refereed), Published
Static analysis of source code is one way to find bugs and problems in large software projects. Many approaches to static analysis have been proposed. We propose a novel way of performing static analysis: instead of methods based on semantic/logic analysis, we apply machine learning directly to the problem. This has several benefits. Learning by example makes adaptation by the programmer trivial (a problem with many other approaches), and learning systems can also generalise and find problematic source code constructs that are not exactly as the programmer initially envisioned. Due to the general interest in code quality and the availability of large open source code bases as test and development data, we believe this problem should be of interest to the larger data mining community. In this work we extend our previous approach, investigate a new way of doing feature selection, and test the suitability of many different learning algorithms on a selection of problems we adapted from large publicly available open source projects. Many algorithms were much more successful than our previous proof-of-concept and deliver practical levels of performance. This is clearly an interesting and minable problem.
Place, publisher, year, edition, pages
Nanjing, China: Springer, 2012.
Keywords: software engineering, static analysis, application
Identifiers: URN: urn:nbn:se:bth-7112; Local ID: oai:bth.se:forskinfo472F0453B8DC948CC1257ACB003733B7; OAI: oai:DiVA.org:bth-7112; DiVA: diva2:834693
8th International Conference on Advanced Data Mining and Applications (ADMA 2012)