Interactive Topic Modeling for Source Code Analysis
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2017 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

Making sense of large data sets is becoming a central task in computer science. Topic models, capable of uncovering the semantic themes that pervade large collections of documents, have seen a surge in popularity in recent years, with applications in a variety of domains. In this thesis, topic models are applied to source code repositories, specifically for the purpose of concept location - offering an overview of which features are contained within a system, the relationships between such features, and their locality within the system. Topic models are high-level statistical tools; their raw output is given in terms of probability distributions, suited neither for simple interpretation nor deep analysis. Interpreting an inferred model in an intuitive manner requires significant post-processing and tools suited for such purposes. Additionally, topic models rarely produce perfectly sensible and coherent topics without some level of supervision - some measure of human interaction is thus typically required for refining the output. Our objective is to simplify the process of topic modeling as it pertains to source code analysis, by addressing the aforementioned issues. First, by implementing existing methods of semi-supervised topic modeling, offering users tools for iteratively refining an inferred model. Second, by tightly integrating topic modeling with high-level visual representations of inferred models, capable of capturing the relationship between terms, documents and features related to a source code repository. We have implemented a fully working prototype of such a system. Through a survey, we have put the tool in the hands of users, thereby demonstrating that the system offers several perceived benefits from a user perspective - in terms of easily comprehending large-scale repositories and in terms of facilitating the process of topic modeling.

Place, publisher, year, edition, pages
2017. , p. 68
Series
IT ; 17062
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-334376
OAI: oai:DiVA.org:uu-334376
DiVA, id: diva2:1159462
Educational program
Master Programme in Computer Science
Supervisors
Examiners
Available from: 2017-11-23 Created: 2017-11-22 Last updated: 2017-11-23 Bibliographically approved

Open Access in DiVA

fulltext (1891 kB), 34 downloads
File information
File name: FULLTEXT01.pdf
File size: 1891 kB
Checksum (SHA-512): ef955a1c822f27f4f8872e342b65b3f3ca4da069a12e7834df795f326e6082069f27ce5bacb39a0368121c3b1411bcccdaa166e3e159846cbb3cc384db657d3b
Type: fulltext
Mimetype: application/pdf
