Mining Git Repositories: An introduction to repository mining
Linnaeus University, Faculty of Technology, Department of Computer Science.
2013 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

When performing an analysis of the evolution of software quality and software metrics, there is a need to get access to as many versions of the source code as possible. There is a lack of research on how data or source code can be extracted from the source control management system Git. This thesis explores different possibilities to resolve this problem.

Lately, there has been a boom in usage of the version control system Git. GitHub alone hosts about 6,100,000 projects. Some well-known projects and organizations that use Git are Linux, WordPress, and Facebook. Even with these figures and clients, there are very few tools able to perform data extraction from Git repositories. A pre-study showed that there is a lack of standardization on how to share mining results, and the methods used to obtain them.

There are several tools available for older version control systems, such as Concurrent Versions System (CVS), but few for Git. The examined repository mining applications for Git are either poorly documented or were built to be very specific to the project for which they were designed.

This thesis compiles a list of general issues encountered when using repository mining as a tool for data gathering. A selection of existing repository mining tools was evaluated against a set of prerequisite criteria. The end result of this evaluation is the creation of a new repository mining tool called Doris. This tool also includes a small code metrics analysis library to show how it can be extended.

Place, publisher, year, edition, pages
2013. 28 p.
Keyword [en]
repository mining, msr, git, quality analysis, version control system, vcs, source control management, scm, data mining, data extraction
National Category
Computer Science
URN: urn:nbn:se:lnu:diva-27742
OAI: diva2:638844
Subject / course
Computer Science
2013-06-03, 18:35 (English)
Available from: 2013-08-12 Created: 2013-08-02 Last updated: 2013-08-12 Bibliographically approved

Open Access in DiVA

Mining Git Repositories (859 kB), 1944 downloads
File information
File name: FULLTEXT01.pdf
File size: 859 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Carlsson, Emil
By organisation
Department of Computer Science
Computer Science
