Prediction of Code Lifetime
Linköping University, Department of Computer and Information Science, Statistics. Linköping University, Faculty of Science & Engineering.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

There are several previous studies in which machine learning algorithms are used to predict how fault-prone a piece of code is. This thesis takes a slightly different approach by attempting to predict how long a piece of code will remain unmodified after being written (its “lifetime”). This is based on the hypothesis that frequently modified code is more likely to contain weaknesses, which may make lifetime predictions useful for code evaluation purposes. In this thesis, the predictions are made with machine learning algorithms that are trained on open-source code examples from GitHub. Two different machine learning algorithms are used: the multilayer perceptron and the support vector machine. A piece of code is described by three groups of features: code contents, code properties obtained from static code analysis, and metadata from the version control system Git. In a series of experiments it is shown that the support vector machine is the best-performing algorithm and that all three feature groups are useful for predicting lifetime. Both the multilayer perceptron and the support vector machine outperform a baseline prediction that always outputs the mean lifetime of the training set. This indicates that lifetime can, to some extent, be predicted based on information extracted from the code. However, lifetime prediction performance is shown to be highly dataset-dependent, with large error magnitudes.
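
To make the setup in the abstract concrete, the following is a minimal sketch, not the thesis author's code: it compares a support vector machine regressor and a multilayer perceptron against a mean-value baseline using scikit-learn. The feature matrix and lifetime values are synthetic stand-ins; the thesis instead builds its features from code contents, static code analysis, and Git metadata.

# Minimal sketch (assumptions: scikit-learn regressors, synthetic stand-in data).
# Compares an SVM and a multilayer perceptron against a baseline that always
# predicts the mean lifetime of the training set, as described in the abstract.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical features (e.g. lines changed, complexity, prior commit count),
# standing in for the three feature groups used in the thesis.
X = rng.normal(size=(500, 3))
# Hypothetical lifetime in days, loosely dependent on the features plus noise.
y = np.clip(30 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=8, size=500), 0, None)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "mean baseline": DummyRegressor(strategy="mean"),
    "support vector machine": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "multilayer perceptron": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.1f} days")

On synthetic data like this, both learned models typically beat the mean baseline in mean absolute error, which mirrors the comparison the abstract describes; the actual error magnitudes reported in the thesis are dataset-dependent.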

Place, publisher, year, edition, pages
2017, p. 53
Keyword [en]
machine learning, support vector machines, neural networks, code analysis, version control, open-source
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:liu:diva-135060
ISRN: LIU-IDA/LITH-EX-A--17/004--SE
OAI: oai:DiVA.org:liu-135060
DiVA, id: diva2:1079225
External cooperation
Combitech AB
Subject / course
Computer Engineering
Presentation
2017-02-15, Donald Knuth, Linköping University building B, Linköping, 10:15 (Swedish)
Supervisors
Examiners
Available from: 2017-03-14. Created: 2017-03-07. Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

fulltext (2252 kB), 40 downloads
File information
File name: FULLTEXT01.pdf, File size: 2252 kB, Checksum: SHA-512
d57879ba826f253b9d53f0845223dfac8500e288ddafb4fbeba5e7bdc1b15e61694d837a9d3229f3df5e075b94a64b59bd202634ef3a9b5fe31fd2f4b49b5a16
Type: fulltext, Mimetype: application/pdf

Search in DiVA

By author/editor
Nordfors, Per
By organisation
Statistics
Faculty of Science & Engineering
Other Computer and Information Science

Total: 40 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
