Automatic Source Code Classification: Classifying Source Code for a Case-Based Reasoning System
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information and Communication systems.
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This work investigated the possibility of classifying Java source code into cases for a case-based reasoning system. Case-based reasoning is a problem-solving method in Artificial Intelligence that uses knowledge of previously solved problems to solve new problems. A case in case-based reasoning consists of two parts: the problem part and the solution part. The problem part describes a problem that needs to be solved, and the solution part describes how this problem was solved. In this work, the problem part describes a Java source file using words that characterize its content, and the solution part is a classification of the source file together with the source code. To classify Java source code, a classification system was developed. It consists of four analyzers: a type filter, a documentation analyzer, a syntactic analyzer and a semantic analyzer. The type filter determines whether a Java source file contains a class or an interface. The documentation analyzer determines the level of documentation in a source file in order to judge how useful the file is. The syntactic analyzer extracts statistics from the source code to be used for similarity, and the semantic analyzer extracts semantics from the source code. The finished classification system takes the form of a kd-tree in which the leaf nodes contain the classified source files, i.e. the cases. Furthermore, a vocabulary was developed to contain the domain knowledge about the Java language. When tested, the resulting kd-tree was found to be imbalanced, as the majority of the analyzed source files were placed in the left-most leaf nodes. The conclusion was that using documentation as part of the classification made the tree imbalanced, because source code is not documented to such an extent that it would be useful for this purpose, and that another way of classifying therefore has to be found.
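
As an illustration of the structure described in the abstract, the following is a minimal sketch of a case with a problem part and a solution part, stored in a kd-tree. It is not the implementation from the thesis. The names SourceCase and CaseKdTree, the numeric feature vector (for example documentation level and method count) and the rule that ties go to the left are all assumptions made for this example. Unlike the tree in the thesis, this textbook variant stores a case in every node rather than only in the leaves.

```java
import java.util.List;

/** Hypothetical case: problem part (descriptive words) and solution part (label plus source). */
final class SourceCase {
    final List<String> problemWords;   // problem part: words describing the file
    final String classification;       // solution part: assigned class label
    final String sourceCode;           // solution part: the source itself

    SourceCase(List<String> problemWords, String classification, String sourceCode) {
        this.problemWords = problemWords;
        this.classification = classification;
        this.sourceCode = sourceCode;
    }
}

/** Textbook kd-tree keyed by an assumed numeric feature vector. */
final class CaseKdTree {
    private static final class Node {
        final double[] point;
        final SourceCase payload;
        Node left, right;
        Node(double[] point, SourceCase payload) { this.point = point; this.payload = payload; }
    }

    private final int dims;
    private Node root;

    CaseKdTree(int dims) { this.dims = dims; }

    void insert(double[] features, SourceCase c) {
        if (features.length != dims) throw new IllegalArgumentException("wrong dimension");
        root = insert(root, features.clone(), c, 0);
    }

    private Node insert(Node n, double[] p, SourceCase c, int depth) {
        if (n == null) return new Node(p, c);
        int axis = depth % dims;                      // cycle through the feature axes
        if (p[axis] <= n.point[axis]) n.left = insert(n.left, p, c, depth + 1);   // ties go left (assumption)
        else n.right = insert(n.right, p, c, depth + 1);
        return n;
    }

    /** Returns the stored case closest to the query (squared Euclidean distance), or null if empty. */
    SourceCase nearest(double[] query) {
        Node best = nearest(root, query, 0, null);
        return best == null ? null : best.payload;
    }

    private Node nearest(Node n, double[] q, int depth, Node best) {
        if (n == null) return best;
        if (best == null || dist2(q, n.point) < dist2(q, best.point)) best = n;
        int axis = depth % dims;
        double diff = q[axis] - n.point[axis];
        Node near = diff <= 0 ? n.left : n.right;
        Node far = diff <= 0 ? n.right : n.left;
        best = nearest(near, q, depth + 1, best);
        // Visit the far side only if the splitting plane is closer than the best match so far.
        if (diff * diff < dist2(q, best.point)) best = nearest(far, q, depth + 1, best);
        return best;
    }

    /** Number of nodes on the longest root-to-leaf path; a rough indicator of imbalance. */
    int height() { return height(root); }

    private static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    private static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }
}
```

In this sketch, files that share the same value on a splitting axis, such as a documentation level of zero, all descend to the left, so the tree degenerates into a left-leaning chain. This shows one way a sparse feature can unbalance a kd-tree; whether it matches the exact mechanism observed in the thesis is not confirmed by the abstract.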

Place, publisher, year, edition, pages
2015. 70 p.
Keyword [en]
Artificial Intelligence, Case-Based Reasoning, CBR, Vocabulary, Classification, Similarity measure, Distance measure, Java, C++
National Category
Computer Systems
URN: urn:nbn:se:miun:diva-25519
OAI: diva2:841529
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Available from: 2015-07-14. Created: 2015-07-13. Last updated: 2015-08-18. Bibliographically approved.


Author
Nordström, Markus
