Automatic Bug Report Assignment Using Multilevel Recurrent Neural Networks
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In any software system or project, a continuous inflow of bug reports is an integral part of upkeep and development. These bug reports, which can be very numerous in a large system, are typically handled by several layers of human experts who assign each report to the corresponding developers. With advances in machine learning techniques for document classification, this task could be done automatically with accuracy high enough to vastly reduce the amount of human expert time required.

In this thesis, we study automatic bug report assignment in the context of the telecom industry. In particular, we study current state-of-the-art document representation and classification methods applied to bug reports, with an emphasis on word embeddings and multilevel recurrent neural networks (RNNs). The model we emphasize is a two-level RNN that incorporates document structure in its design: the first level processes a sequence of words to represent a sentence, and the second level processes the sequence of these sentence representations to construct the document representation.

A bug report differs from a general text document in that it often contains boilerplate, software source code, error codes or machine-generated output that can only be understood by the system's developers or maintainers and that does not conform to the conventions of ordinary English text. This unusual vocabulary, with many unrelated symbols, could degrade classifier accuracy. Therefore, in addition to document classification, we develop a boilerplate removal system, based on a stacked generalization ensemble classifier with shallow text features, to separate templates, human-generated text and machine-generated text.

We conducted our automatic bug report assignment experiments on a sub-collection of eight years of bug reports from our industrial partner. Our experiments show that: (1) the multilevel RNN model performs better than the standard RNN model; (2) bug report assignment is currently best handled by the stacked generalization ensemble method; (3) when the boilerplate removal system is used to extract only the human-generated text from the bug report documents, various classifiers perform relatively well with only 1/10th of the data in comparison to handcrafted preprocessing rules.
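
The two-level structure described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the recurrent cell, the dimensions or the training procedure, so the sketch assumes a plain tanh recurrent cell with random, untrained weights, and every name in it (make_rnn, run_rnn, encode_document, the toy vocabulary) is hypothetical. It shows only how a word-level RNN produces one vector per sentence and how a sentence-level RNN turns those vectors into a single document vector.

```python
import math
import random


def make_rnn(input_size, hidden_size, rng):
    """Randomly initialised parameters for a plain tanh RNN (untrained, illustration only)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_in": matrix(hidden_size, input_size),
        "W_h": matrix(hidden_size, hidden_size),
        "b": [0.0] * hidden_size,
    }


def run_rnn(params, sequence):
    """Feed a sequence of vectors through the RNN and return the final hidden state."""
    hidden = [0.0] * len(params["b"])
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in_row, x))
                + sum(w * hi for w, hi in zip(w_h_row, hidden))
                + bias
            )
            for w_in_row, w_h_row, bias in zip(params["W_in"], params["W_h"], params["b"])
        ]
    return hidden


def encode_document(document, embeddings, word_rnn, sentence_rnn):
    """Level 1: words -> sentence vector. Level 2: sentence vectors -> document vector."""
    sentence_vectors = [
        run_rnn(word_rnn, [embeddings[word] for word in sentence])
        for sentence in document
    ]
    return run_rnn(sentence_rnn, sentence_vectors)


# Toy setup: sizes and vocabulary are assumptions chosen for readability.
rng = random.Random(0)
EMBED, SENT, DOC = 4, 3, 2
vocab = ["node", "crashed", "after", "restart", "see", "log"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(EMBED)] for w in vocab}
word_rnn = make_rnn(EMBED, SENT, rng)
sentence_rnn = make_rnn(SENT, DOC, rng)

# A document is a list of sentences; a sentence is a list of words.
document = [["node", "crashed", "after", "restart"], ["see", "log"]]
doc_vector = encode_document(document, embeddings, word_rnn, sentence_rnn)
```

In a real classifier the document vector would be passed to an output layer and all weights trained end to end; gated cells such as LSTM or GRU are a common substitute for the plain tanh cell assumed here.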

Place, publisher, year, edition, pages
2018. p. 63
Keywords [en]
Text Classification, Machine Learning, Bug Reports, Bug Assignments, Document Representation, Word Embeddings, Artificial Neural Networks, Recurrent Neural Networks
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-347324
OAI: oai:DiVA.org:uu-347324
DiVA, id: diva2:1194050
External cooperation
Ericsson AB
Subject / course
Language Technology
Educational program
Master Programme in Language Technology
Supervisors
Examiners
Available from: 2018-03-31 Created: 2018-03-28 Last updated: 2018-03-31 Bibliographically approved
