Digitala Vetenskapliga Arkivet

Machine Learning for Software Bug Categorization
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2019 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
Abstract [en]

The pursuit of flawless software is often an exhausting task for software developers. Code defects range from soft issues to hard issues with unforgiving consequences. DICE has its own system that automatically collects these defects and groups them into buckets. However, this system sometimes groups unrelated issues together and misses apparent duplicates. This time-consuming flaw creates excessive work for software developers and wastes company resources. It also degrades the quality of the system's defect-tracking datasets, which turns into a never-ending vicious circle. In this thesis, we investigate measuring the similarity between reports in order to reduce incorrectly grouped issues and duplicate reports. Prototype models for bug categorization and bucketing have been built using convolutional neural networks. For each report, the prototype provides developers with candidate related issues, together with a likelihood metric indicating whether the issues are related. The similarity is measured in the representation phase of the neural networks, which we call the latent space. We also apply Kullback–Leibler divergence in this space to obtain better similarity metrics. The results provide important findings and insights for future improvement. In addition, we discuss methods and strategies for detecting outliers using Mahalanobis distance in order to prevent incorrectly grouped reports.
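The abstract does not state how latent representations are turned into distributions or how outlier thresholds are chosen, so the following is only a minimal NumPy sketch of the two measures it names. It assumes each report's latent vector has already been normalised into a discrete probability distribution (for example by a softmax) for the Kullback–Leibler comparison, and that a bucket is represented by the raw latent vectors of its reports for the Mahalanobis check. The function names, the symmetric averaging of KL, and the threshold of 3.0 are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    # Smooth with eps so log(p / q) is defined, then renormalise.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def symmetric_kl(p, q):
    # KL is asymmetric; averaging both directions gives a symmetric score
    # (an assumption made here, not taken from the thesis).
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def rank_candidates(query, bucket_reps, k=3):
    """Indices of the k bucket representatives closest to the query (hypothetical helper)."""
    scores = [symmetric_kl(query, rep) for rep in bucket_reps]
    return [int(i) for i in np.argsort(scores)[:k]]


def mahalanobis(x, samples):
    """Mahalanobis distance of x from the empirical distribution of samples (one row per report)."""
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    d = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(d @ cov_inv @ d))


def is_outlier(x, samples, threshold=3.0):
    # Hypothetical fixed threshold; a real system would tune it on held-out data.
    return mahalanobis(x, samples) > threshold
```

For example, `rank_candidates([0.7, 0.2, 0.1], [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]], k=1)` selects the second bucket, and a new report whose latent vector lies far from a bucket's mean would be flagged by `is_outlier` rather than silently added to that bucket.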

Place, publisher, year, edition, pages
2019, 63 p.
Series
UPTEC IT, ISSN 1401-5749 ; 19018
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-395253
OAI: oai:DiVA.org:uu-395253
DiVA, id: diva2:1361472
Educational program
Master of Science Programme in Information Technology Engineering
Supervisors
Examiners
Available from: 2019-10-16. Created: 2019-10-16. Last updated: 2021-02-18. Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (7238 kB, application/pdf)
Checksum (SHA-512): 75a7c52138c15bcdb77255fea319324ba4cb71dd112bf20f49ca52067c4f997ca73c6ebd3294acaa85f293bb84a081412155face39740e786bd3f0a07c5f8ab0

