Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Fraud detection in online payments using Spark ML
KTH, School of Information and Communication Technology (ICT).
2017 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Frauds in online payments cause billions of dollars in losses every year. To reduce them, traditional fraud detection systems can be enhanced with the latest advances in machine learning, which usually require distributed computing frameworks to handle the big size of the available data.

Previous academic work has failed to address fraud detection in real-world environments. To fill this gap, this thesis focuses on building a fraud detection classifier on Spark ML using real-world payment data.

Class imbalance and non-stationarity reduced the performance of our models, so experiments to tackle those problems were performed. Our best results were achieved by applying undersampling and oversampling on the training data to reduce the class imbalance. Updating the model regularly to use the latest data also helped diminishing the negative effects of non-stationarity.

A final machine learning model that leverages all our findings has been deployed at Qliro, an important online payments provider in the Nordics. This model periodically sends suspicious purchase orders for review to fraud investigators, enabling them to catch frauds that were missed before.

Abstract [sv]

Bedrägerier vid online-betalningar medför stora förluster, så företag bygger bedrägeribekämpningssystem för att förhindra dem.

I denna avhandling studerar vi hur maskininlärning kan tillämpas för att förbättra dessa system.

Tidigare studier har misslyckats med att hantera bedrägeribekämpning med verklig data, ett problem som kräver distribuerade beräkningsramverk för att hantera den stora datamängden.

För att lösa det har vi använt betalningsdata från industrin för att bygga en klassificator för bedrägeridetektering via Spark ML. Obalanserade klasser och icke-stationäritet minskade träffsäkerheten hos våra modeller, så experiment för att hantera dessa problem har utförts.

Våra bästa resultat erhålls genom att kombinera undersampling och oversampling på träningsdata. Att använda bara den senaste datan och kombinera flera modeller som ej har tränats med samma data förbättrar också träffsäkerheten.

En slutgiltig modell har implementerats hos Qliro, en stor leverantör av online betalningar i Norden, vilket har förbättrat deras bedrägeribekämpningssystem och hjälper utredare att upptäcka bedrägerier som tidigare missades.

Place, publisher, year, edition, pages
2017. , p. 119
Series
TRITA-ICT-EX ; 2017:153
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-219916OAI: oai:DiVA.org:kth-219916DiVA, id: diva2:1165925
Subject / course
Computer Science
Educational program
Master of Science - Computer Science
Supervisors
Examiners
Available from: 2017-12-14 Created: 2017-12-14 Last updated: 2018-01-13Bibliographically approved

Open Access in DiVA

fulltext(1101 kB)538 downloads
File information
File name FULLTEXT01.pdfFile size 1101 kBChecksum SHA-512
3e6a32e36b5092f154bce75cc8e6e322f3b4cdc43eeecae6243307a462ffb19c156049954e3c4b08abc2dea03f7aca550faac2d7adfefdbca0d65b67f5bd7d58
Type fulltextMimetype application/pdf

By organisation
School of Information and Communication Technology (ICT)
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 538 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 1034 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf