Performance of Three Classification Techniques in Classifying Credit Applications Into Good Loans and Bad Loans: A Comparison
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Statistics.
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
##### Abstract [en]

The use of statistical classification techniques in classifying loan applications into good loans and bad loans gained importance with the exponential increase in the demand for credit. It is paramount to use a classification technique with a high predictive capacity to ensure the profitability of the business venture.

In this study we aim to compare the predictive capability of three classification techniques: 1) logistic regression, 2) CART, and 3) random forests. We apply these techniques to the German credit data using an 80:20 learning:test split, and compare the performance of the models fitted using the three classification techniques. The probability of default $p_i$ for each observation in the test set is calculated using the models fitted on the training dataset. Each test set sample $x_i$ is then classified as a good loan or a bad loan based on a threshold $\alpha$, such that $x_i \in$ bad loan class if $p_i > \alpha$. We chose several $\alpha$ thresholds in order to compare the performance of each of the three classification techniques on five model suitability statistics: accuracy, precision, negative predictive value, recall, and specificity.
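The thresholding rule and the five statistics named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the thesis's code: it treats "bad loan" as the positive class, uses a strict inequality for the threshold, and the function names (`classify`, `suitability_stats`) and the toy probabilities are hypothetical.

```python
from typing import Dict, List, Optional


def classify(probs: List[float], alpha: float) -> List[int]:
    """Label an observation 1 (bad loan) when its default probability exceeds alpha."""
    # Assumption: strict inequality; 0 denotes a good loan.
    return [1 if p > alpha else 0 for p in probs]


def safe_div(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def suitability_stats(y_true: List[int], y_pred: List[int]) -> Dict[str, Optional[float]]:
    """Compute the five statistics, assuming bad loan (1) is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bad loans flagged as bad
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good loans kept as good
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good loans flagged as bad
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bad loans missed
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "negative_predictive_value": safe_div(tn, tn + fn),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
    }


# Hypothetical toy test set: predicted default probabilities and true labels.
probs = [0.9, 0.2, 0.6, 0.4, 0.1]
y_true = [1, 0, 0, 1, 0]

for alpha in (0.3, 0.5):
    print(alpha, suitability_stats(y_true, classify(probs, alpha)))
```

Raising $\alpha$ flags fewer applications as bad loans, which is why the ranking of classifiers can differ between low and high thresholds.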

None of the classifiers turned out to be the best on all five cross-validation statistics. However, logistic regression performs best at low probability-of-default thresholds. For higher thresholds, on the other hand, CART performs best on the accuracy, precision, and specificity measures, while random forest performs best on the negative predictive value and recall measures.

2015.
##### National Category
Probability Theory and Statistics
##### Identifiers
OAI: oai:DiVA.org:uu-256089
DiVA, id: diva2:824593
Statistics
##### Educational program
Master Programme in Statistics
##### Examiners
Available from: 2015-06-24 Created: 2015-06-22 Last updated: 2015-06-24 Bibliographically approved

#### Open Access in DiVA

##### File information
File name FULLTEXT01.pdf; File size 5434 kB; Checksum SHA-512
Type fulltext; Mimetype application/pdf

Total: 1236 hits
