Loan Default Prediction using Supervised Machine Learning Algorithms
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2019 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Fallissemangprediktion med hjälp av övervakade maskininlärningsalgoritmer (Swedish)
Abstract [en]

It is essential for a bank to estimate the credit risk it carries and the magnitude of its exposure in case of non-performing customers. This kind of risk has been estimated with statistical methods for decades, and recent developments in machine learning have prompted interest in whether machine learning techniques can quantify the risk better. The aim of this thesis is to examine which method from a chosen set of machine learning techniques performs best in default prediction with regard to the chosen model evaluation parameters. The investigated techniques were Logistic Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificial Neural Network and Support Vector Machine. An oversampling technique called SMOTE was applied to treat the class imbalance in the response variable. The results showed that XGBoost without SMOTE obtained the best result with respect to the chosen model evaluation metric.
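
The abstract does not say how SMOTE was implemented in the thesis. As a rough illustration of the general idea only, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters and the toy data are hypothetical and are not taken from the thesis.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not the thesis code).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: four defaulting customers, two features each
defaults = [(0.9, 0.1), (1.0, 0.2), (1.1, 0.15), (0.95, 0.3)]
new_points = smote(defaults, n_synthetic=6, k=2)
```

In a comparison like the one the abstract describes, oversampling of this kind would normally be applied to the training data only, so that the evaluation data keep the original class distribution.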

Abstract [sv]

Det är nödvändigt för en bank att ha en bra uppskattning på hur stor risk den bär med avseende på kunders fallissemang. Olika statistiska metoder har använts för att estimera denna risk, men med den nuvarande utvecklingen inom maskininlärningsområdet har det väckt ett intresse att utforska om maskininlärningsmetoder kan förbättra kvaliteten på riskuppskattningen. Syftet med denna avhandling är att undersöka vilken metod av de implementerade maskininlärningsmetoderna som presterar bäst för modellering av fallissemangprediktion med avseende på valda modellvalideringsparametrar. De implementerade metoderna var Logistisk Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificiella neurala nätverk och Stödvektormaskin. En översamplingsteknik, SMOTE, användes för att behandla obalansen i klassfördelningen för svarsvariabeln. Resultatet blev följande: XGBoost utan implementering av SMOTE visade bäst resultat med avseende på den valda metriken.

Place, publisher, year, edition, pages
2019.
Series
TRITA-SCI-GRU ; 2019:073
Keywords [en]
Machine Learning, Deep Learning, Credit Risk, Default Prediction, Logistic Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificial Neural Network, Support Vector Machine, SMOTE
Keywords [sv]
Maskininlärning, Djupinlärning, Kreditrisk, Fallissemangprediktion, Logistisk Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificiella neurala nätverk, Stödvektormaskin, SMOTE
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-252312
OAI: oai:DiVA.org:kth-252312
DiVA, id: diva2:1319711
External cooperation
Nordea
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Available from: 2019-06-04
Created: 2019-06-03
Last updated: 2019-06-04
Bibliographically approved

Open Access in DiVA

fulltext (1921 kB), 164 downloads
File information
File name: FULLTEXT02.pdf
File size: 1921 kB
Checksum SHA-512: d8404468a00165edb21a39014791d9ca6b92f887bf0dd669f0a1e8b374d05e1305d6e517ab671b58f38d5a4b4302836ce427c9fb0990822c2ac2a1bfa824ad2e
Type: fulltext
Mimetype: application/pdf

Total: 164 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 956 hits