Feature Selection and Case Selection Methods Based on Mutual Information in Software Cost Estimation
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2014 (English). Independent thesis, Advanced level (degree of Master, Two Years), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Software cost estimation is one of the most crucial processes in software development management because it underpins many management activities, such as project planning, resource allocation and risk assessment. Accurate software cost estimation not only helps with investment and bidding decisions but also enables a project to be completed within its limited cost and time. This master thesis focuses on feature selection and case selection methods, with the goal of improving the accuracy of the software cost estimation model.

Case-based reasoning in software cost estimation is an active area of research. It predicts the cost of a new software project by constructing an estimation model from historical software projects. In order to construct the estimation model, case-based reasoning needs to pick out relatively independent candidate features that are relevant to the estimated feature. However, many of the sequential search feature selection methods in current use cannot measure the redundancy of candidate features precisely. Besides, when the local distances of candidate features are combined into the global distance between two software projects during case selection, the differing impact of each candidate feature is not accounted for.
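The case-based reasoning scheme described above can be sketched as follows. This is an illustrative outline only, assuming normalized numeric features and an unweighted Euclidean combination of local distances; the function names and the k-nearest-cases averaging are assumptions, not the thesis's exact implementation.

```python
from math import sqrt

def global_distance(a, b, weights=None):
    """Combine per-feature (local) distances into one global distance.
    With no weights every feature contributes equally -- the situation
    the thesis identifies as problematic."""
    if weights is None:
        weights = [1.0] * len(a)
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def estimate_effort(new_project, history, k=3, weights=None):
    """Predict effort as the mean effort of the k most similar
    historical projects; cases are (feature_vector, effort) pairs."""
    ranked = sorted(history,
                    key=lambda case: global_distance(new_project, case[0], weights))
    return sum(effort for _, effort in ranked[:k]) / k
```

Passing a `weights` vector into `global_distance` is the hook through which per-feature weights, such as those the thesis later derives from symmetric uncertainty, would enter the computation.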

To solve these two problems, this thesis explores solutions with support from NSFC. A feature selection algorithm based on hierarchical clustering is proposed. It gathers similar candidate features into the same cluster and selects the feature most similar to the estimated feature as the representative of that cluster. These representative features form candidate feature subsets. Evaluation metrics are applied to these subsets, and the one that produces the best performance is marked as the final result of feature selection. The experiments show that the proposed algorithm improves PRED(0.25) by 12.6% and 3.75% over other sequential search feature selection methods on the ISBSG and Desharnais datasets, respectively. Meanwhile, this thesis defines candidate feature weights using symmetric uncertainty, which originates from information theory. The feature weight is capable of reflecting the impact of each feature on the estimated feature. The experiments demonstrate that applying feature weights improves the PRED(0.25) value of the estimation model by 8.9% compared with the model without feature weights.
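The two quantities named above have standard definitions that can be sketched directly: symmetric uncertainty SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), and PRED(q), the fraction of projects estimated within a relative error of q. This is a minimal sketch assuming already-discretized feature values; the discretization and the exact weighting scheme used in the thesis are not reproduced here.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H of a discrete sample, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I(X; Y)
    return 2 * mutual_info / (hx + hy) if hx + hy else 0.0

def pred(actuals, estimates, q=0.25):
    """PRED(q): fraction of projects whose relative estimation error
    is at most q; PRED(0.25) is the accuracy metric quoted above."""
    hits = sum(1 for a, e in zip(actuals, estimates) if abs(a - e) / a <= q)
    return hits / len(actuals)
```

Because SU is normalized to [0, 1] and symmetric in its arguments, it is a natural basis for a per-feature weight: a feature sharing more information with the estimated feature receives a larger weight.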

Finally, this thesis discusses and analyzes the drawbacks of the proposed approaches and points out directions for improvement.

Place, publisher, year, edition, pages
IT, 14 048
National Category
Engineering and Technology
URN: urn:nbn:se:uu:diva-231607
OAI: diva2:745091
Educational program
Master Programme in Computer Science
Available from: 2014-09-09. Created: 2014-09-09. Last updated: 2014-09-09. Bibliographically approved.

Open Access in DiVA

fulltext (1606 kB)
File name: FULLTEXT01.pdf
File size: 1606 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

