Data Fusion for Consumer Behaviour
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis analyses different methods of data fusion by fitting a chosen number of statistical models to empirical consumer data and evaluating them against a selection of performance measures. The main purpose of the models is to predict business-related consumer variables. Conventional methods such as decision trees, linear models and K-nearest neighbours are considered, as well as single-layered neural networks and the naive Bayesian classifier. Furthermore, ensemble methods for both classification and regression are investigated by minimizing the cross-entropy and RMSE of predicted outcomes using the iterative non-linear BFGS optimization algorithm. Time consumption of the models and methods for feature selection are also discussed. Data on consumer drinking habits, transaction and purchase history, and sociodemographic background is provided by Nepa. Evaluation of the performance measures indicates that the naive Bayesian classifier predicts consumer drinking habits most accurately, whereas the random forest, although the most time-consuming, is preferred when classifying the Consumer Satisfaction Index (CSI). Regression of CSI yields similar performance across all models. Moreover, the ensemble methods increased prediction accuracy slightly, at the cost of increased time consumption.
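The regression ensemble mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the ensemble is parameterised, so the following are assumptions made only for illustration: the ensemble is a weighted average of base-model predictions, the weights are kept positive and summing to one through a softmax, and the data are synthetic. The names `fit_ensemble_weights`, `ensemble_rmse` and `softmax` are hypothetical; only `scipy.optimize.minimize` with `method="BFGS"` is an existing library call.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(theta):
    """Map unconstrained parameters to positive weights that sum to one."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()


def rmse(pred, y):
    """Root mean squared error between predictions and targets."""
    return np.sqrt(np.mean((pred - y) ** 2))


def ensemble_rmse(theta, preds, y):
    """RMSE of the weighted blend; preds has one column per base model."""
    return rmse(preds @ softmax(theta), y)


def fit_ensemble_weights(preds, y):
    """Find blend weights by minimising RMSE with BFGS (assumed setup)."""
    theta0 = np.zeros(preds.shape[1])  # start from equal weights
    result = minimize(ensemble_rmse, theta0, args=(preds, y), method="BFGS")
    return softmax(result.x)


# Synthetic stand-in for base-model predictions (not the thesis data):
# three "models" that predict y with increasing amounts of noise.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = np.column_stack(
    [y + rng.normal(scale=s, size=200) for s in (0.3, 0.6, 1.0)]
)

weights = fit_ensemble_weights(preds, y)
print("weights:", weights)
print("blend RMSE:", rmse(preds @ weights, y))
print("equal-weight RMSE:", rmse(preds.mean(axis=1), y))
```

For classification, the same structure would apply with the RMSE objective replaced by the cross-entropy of blended class probabilities. In practice the weights would be fitted on held-out predictions rather than on the training data.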

Abstract [sv] (translated from Swedish)

This thesis examines different methods of data fusion by fitting a number of statistical models to empirical consumer data and evaluating the models' performance with respect to a number of statistical measures. The purpose of the models is to predict business-related consumer variables. Conventional methods such as decision trees, linear models and the nearest-neighbours method are proposed in this report, along with single-layered neural networks and the naive Bayesian classifier. Ensemble methods for both classification and regression are also examined by minimizing the cross-entropy and RMSE of predicted outcomes with the iterative non-linear optimization algorithm BFGS. The time consumption of the models and methods for selecting predictors are also discussed in the report. Data on consumers' alcohol habits, transaction and purchase history, and sociodemographic background was provided by Nepa. Evaluation of the performance measures shows that the naive Bayesian classifier gives the most accurate predictions of consumers' drinking habits, while the random forest, although the most time-consuming, is preferred when classifying the Nöjd Kund Index (NKI, the customer satisfaction index). Regression of NKI resulted in a similar level of performance for all models. The ensemble methods gave a slight increase in prediction accuracy together with increased time consumption.

Place, publisher, year, edition, pages
2017.
Series
TRITA-MAT-E ; 2017:35
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-209247
OAI: oai:DiVA.org:kth-209247
DiVA, id: diva2:1111499
External cooperation
NEPA
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Available from: 2017-06-19. Created: 2017-06-19. Last updated: 2017-06-19. Bibliographically approved.

Open Access in DiVA

fulltext (1251 kB), 83 downloads
File information
File name: FULLTEXT01.pdf
File size: 1251 kB
Checksum (SHA-512): 01aa5bca02a7ec5451cd664e011e2afd3b59906169b5d806849880f02319a9a301497181f500b27c0d6f1bade2fed5e22e4575c768072a47679106e8ced746ea
Type: fulltext
Mimetype: application/pdf
