Performance of Three Classification Techniques in Classifying Credit Applications Into Good Loans and Bad Loans: A Comparison
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
The use of statistical classification techniques to classify loan applications into good loans and bad loans has gained importance with the exponential increase in the demand for credit. It is paramount to use a classification technique with high predictive capacity to ensure the profitability of the lending business.
In this study we aim to compare the predictive capability of three classification techniques: 1) logistic regression, 2) CART, and 3) random forests. We apply these techniques to the German credit data using an 80:20 training:test split, and compare the performance of the models fitted with the three techniques. The probability of default $p_i$ for each observation in the test set is calculated using the models fitted on the training set. Each test-set sample $x_i$ is then classified as a good loan or a bad loan based on a threshold $\tau$, such that $x_i$ is assigned to the bad-loan class if $p_i > \tau$. We chose several thresholds in order to compare the three classification techniques on five performance measures: accuracy, precision, negative predictive value, recall, and specificity.
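The thresholding rule and the five performance measures can be sketched as follows. This is a minimal, illustrative Python sketch, not the thesis's own code. It assumes that "bad loan" is the positive class (label 1), that the default probabilities $p_i$ have already been produced by a fitted model, and that the helper names (`classify`, `threshold_metrics`) and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def classify(probs: Sequence[float], threshold: float) -> List[int]:
    """Assign 1 (bad loan) when the predicted probability of default
    strictly exceeds the threshold, otherwise 0 (good loan)."""
    return [1 if p > threshold else 0 for p in probs]


def safe_ratio(num: int, den: int) -> float:
    # Return NaN when the denominator is zero, e.g. when a high
    # threshold flags no loans as bad and precision is undefined.
    return num / den if den else float("nan")


def threshold_metrics(
    y_true: Sequence[int], probs: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Confusion-matrix measures at one threshold, with bad loan = positive class (assumption)."""
    y_pred = classify(probs, threshold)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bad loans correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good loans correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good loans wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bad loans missed
    return {
        "accuracy": safe_ratio(tp + tn, len(pairs)),
        "precision": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),  # negative predictive value
        "recall": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical toy labels (1 = bad loan) and predicted default probabilities;
    # these are NOT results from the German credit data.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    probs = [0.80, 0.30, 0.55, 0.60, 0.10, 0.20, 0.45, 0.05]
    for tau in (0.3, 0.5, 0.7):
        print(tau, threshold_metrics(y_true, probs, tau))
```

Running the same function over a grid of thresholds, once per fitted model, yields the kind of per-threshold comparison described above. At high thresholds few or no loans may be flagged as bad, so precision can be undefined; the sketch reports that case as NaN rather than silently returning zero.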
None of the classifiers turned out to be best on all five performance measures. However, logistic regression performs best at low probability-of-default thresholds. At higher thresholds, CART performs best in accuracy, precision, and specificity, while random forests perform best in negative predictive value and recall.
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:uu:diva-256089
OAI: oai:DiVA.org:uu-256089
DiVA: diva2:824593
Subject / course
Master Programme in Statistics
Andersson, Patrik, Senior Lecturer
Ahmad, Rauf, Senior Lecturer