Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Customer segmentation of retail chain customers using cluster analysis
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2019 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Kundsegmentering av detaljhandelskunder med klusteranalys (Swedish)
Abstract [en]

In this thesis, cluster analysis was applied to data comprising of customer spending habits at a retail chain in order to perform customer segmentation. The method used was a two-step cluster procedure in which the first step consisted of feature engineering, a square root transformation of the data in order to handle big spenders in the data set and finally principal component analysis in order to reduce the dimensionality of the data set. This was done to reduce the effects of high dimensionality. The second step consisted of applying clustering algorithms to the transformed data. The methods used were K-means clustering, Gaussian mixture models in the MCLUST family, t-distributed mixture models in the tEIGEN family and non-negative matrix factorization (NMF). For the NMF clustering a slightly different data pre-processing step was taken, specifically no PCA was performed. Clustering partitions were compared on the basis of the Silhouette index, Davies-Bouldin index and subject matter knowledge, which revealed that K-means clustering with K = 3 produces the most reasonable clusters. This algorithm was able to separate the customer into different segments depending on how many purchases they made overall and in these clusters some minor differences in spending habits are also evident. In other words there is some support for the claim that the customer segments have some variation in their spending habits.

Abstract [sv]

I denna uppsats har klusteranalys tillämpats på data bestående av kunders konsumtionsvanor hos en detaljhandelskedja för att utföra kundsegmentering. Metoden som använts bestod av en två-stegs klusterprocedur där det första steget bestod av att skapa variabler, tillämpa en kvadratrotstransformation av datan för att hantera kunder som spenderar långt mer än genomsnittet och slutligen principalkomponentanalys för att reducera datans dimension. Detta gjordes för att mildra effekterna av att använda en högdimensionell datamängd. Det andra steget bestod av att tillämpa klusteralgoritmer på den transformerade datan. Metoderna som användes var K-means klustring, gaussiska blandningsmodeller i MCLUST-familjen, t-fördelade blandningsmodeller från tEIGEN-familjen och icke-negativ matrisfaktorisering (NMF). För klustring med NMF användes förbehandling av datan, mer specifikt genomfördes ingen PCA. Klusterpartitioner jämfördes baserat på silhuettvärden, Davies-Bouldin-indexet och ämneskunskap, som avslöjade att K-means klustring med K=3 producerar de rimligaste resultaten. Denna algoritm lyckades separera kunderna i olika segment beroende på hur många köp de gjort överlag och i dessa segment finns vissa skillnader i konsumtionsvanor. Med andra ord finns visst stöd för påståendet att kundsegmenten har en del variation i sina konsumtionsvanor.

Place, publisher, year, edition, pages
2019.
Series
TRITA-SCI-GRU ; 2019:092
Keywords [en]
Cluster analysis, customer segmentation, tEIGEN, MCLUST, K-means, NMF, Silhouette, Davies-Bouldin, big spenders, statistics, applied mathematics, unsupervised learning
Keywords [sv]
Klusteranalys, kundsegmentering, tEIGEN, MCLUST, K-means, NMF, Silhouette, Davies-Bouldin, storkonsumenter, statistik, tillämpad matematik
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-252559OAI: oai:DiVA.org:kth-252559DiVA, id: diva2:1319851
External cooperation
Advectas AB
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Supervisors
Examiners
Available from: 2019-06-04 Created: 2019-06-03 Last updated: 2019-06-04Bibliographically approved

Open Access in DiVA

fulltext(4075 kB)105 downloads
File information
File name FULLTEXT02.pdfFile size 4075 kBChecksum SHA-512
93d4d1e4bcd3f0a2ff723f18b47c19674851d933879ff3bed9e344c0f8a5ca53a15cb9453a6500e9b3bfa3af92f61c1abac6d6d7c334256dccfc864dc6cf2c07
Type fulltextMimetype application/pdf

By organisation
Mathematical Statistics
Probability Theory and Statistics

Search outside of DiVA

GoogleGoogle Scholar
Total: 105 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 355 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf