Machine Learning in Banking: Exploring the feasibility of using consumer level bank transaction data for credit risk evaluation
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The financial industry is changing rapidly as a result of the increasing digitization of financial and economic resources and services. With a continuous increase in online payments and a decrease in the use of physical currency, a new data source of fine-grained payment activity describing consumer behaviour has emerged. In the banking industry, this information source has not yet been utilized to its full extent. Converting this data into meaningful information could improve credit risk modelling and loan application screening.

This work explores the feasibility of using transaction data for credit risk assessment by evaluating the correlation between financial behaviour derived from account statistics and credit risk classification, using the XGBoost machine learning algorithm. The XGBoost models were trained on a real-world data set from a large Swedish bank consisting of 40 million raw transactions made by a random sample of 30,000 individuals, all of whom had been granted private consumer-level loans of between 20,000 and 350,000 SEK without security.

The results show that there is a correlation between financial behaviour and credit risk classification. Payment frequency and general account balance statistics were identified as the primary drivers of risk classification decisions. Intra-monthly features gave the best performance both for models trained on 5 risk classes and for models trained on 2 risk classes (lowest and highest), reaching macro-average ROC-AUC scores of 0.8 and 0.82 and macro-average F1-scores of 0.39 and 0.79, respectively. Further investigation has been deemed necessary to determine whether the correlation found implies causation.
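The abstract reports macro-averaged ROC-AUC and F1 side by side, and the two can differ considerably for the same model. The following is a minimal sketch of how both metrics can be computed for a multi-class risk classifier. It is not the thesis's code: the three-class toy labels and probabilities are invented for illustration, the function names are hypothetical, and the one-vs-rest averaging for ROC-AUC is an assumption about how the macro average was formed.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a
    random negative, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Unweighted mean of one-vs-rest ROC-AUC over all classes."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [t == c for t in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Invented toy data: 3 risk classes, 6 customers (illustration only).
classes = [0, 1, 2]
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.3, 0.5],
]
# Hard predictions are the most probable class per row.
y_pred = [max(classes, key=lambda k: row[k]) for row in proba]

f1 = macro_f1(y_true, y_pred, classes)
auc = macro_ovr_auc(y_true, proba, classes)
print(f"macro F1 = {f1:.3f}, macro one-vs-rest ROC-AUC = {auc:.3f}")
```

ROC-AUC scores the ranking of predicted probabilities, while F1 scores only the hard class decisions, so a model can rank customers well and still confuse neighbouring classes. This is one plausible reading of the gap between the two metrics in the 5-class setting, not a conclusion stated in the abstract.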

Abstract [sv]

The financial industry is changing rapidly as a result of the increasing digitization of financial and economic resources and services. With a continuous increase in online payments and reduced use of physical currency, a new data source of detailed payment activity describing consumer behaviour has emerged. In the banking sector, this information source has not yet been utilized to its full extent. Converting this data into meaningful information has the potential to improve credit risk modelling and loan application processes.

This work investigates the possibility of using transaction data for credit risk assessment by evaluating the relationship between financial behaviour derived from account statistics and credit risk classification, using the machine learning algorithm XGBoost. The XGBoost models were trained on a data set from a large Swedish bank consisting of 40 million raw transactions made by a random sample of 30,000 individuals, all of whom had been granted private consumer loans of between 20,000 and 350,000 SEK without security.

The results show that there is a correlation between financial behaviour and credit risk classification. Payment frequency and general balance statistics were identified as the primary drivers of risk classification decisions. Intra-monthly features gave the best results for models trained on 5 risk groups and on 2 risk groups (lowest and highest), which reached macro-average ROC-AUC scores of 0.8 and 0.82 and macro-average F1-scores of 0.39 and 0.79, respectively. Further work has been deemed necessary to determine whether the correlation found implies a causal relationship.

Place, publisher, year, edition, pages
2019. p. 82
Series
TRITA-EECS-EX ; 2019:480
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-255017
OAI: oai:DiVA.org:kth-255017
DiVA, id: diva2:1337265
Examiners
Available from: 2019-07-12. Created: 2019-07-12. Last updated: 2019-07-12. Bibliographically approved.

Open Access in DiVA

fulltext (6420 kB), 49 downloads
File information
File name: FULLTEXT01.pdf
File size: 6420 kB
Checksum (SHA-512): fc262d436fa75018617e76a9e397ac68449c49ae45a239cfe9bdc83095a8f6f8540e2e9fddce27cd5eb602bd8b3a035a06539f35ccca99e55b3cb3168db67110
Type: fulltext
Mimetype: application/pdf
