Supervised Learning Techniques: A comparison of the Random Forest and the Support Vector Machine
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Statistics.
2016 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]

This thesis examines the performance of the support vector machine and the random forest in the context of binary classification. The two techniques are compared, and the better-performing one is used to construct a final parsimonious model. The data set consists of 33 observations with 89 biomarkers as features and no known dependent variable. The dependent variable is therefore generated through k-means clustering, with the number of clusters fixed in advance at two. The algorithms are trained using five-fold cross-validation repeated twenty times. The training process shows that the best-performing versions of the models are a linear support vector machine and a random forest with six randomly selected features at each split. Comparing these tuned algorithms on the test set shows that the random forest outperforms the linear-kernel support vector machine: the former classifies all observations in the test set correctly, whilst the latter classifies all but one correctly. Hence, a parsimonious random forest model using the top five features is constructed, and it performs as well on the test set as the original random forest model using all features.
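
The two data-handling steps named in the abstract, generating a binary label with two-cluster k-means and resampling with five-fold cross-validation repeated twenty times, can be sketched as follows. This is only an illustrative sketch using the Python standard library; the abstract does not state which software the thesis used, and the function names kmeans_two_clusters and repeated_kfold, the seeds, and the iteration cap are assumptions made for this example. Model fitting (support vector machine, random forest) is left out.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_clusters(rows, n_iter=100, seed=0):
    """Lloyd's algorithm with k fixed at 2; returns one 0/1 label per row.

    Hypothetical helper: the labels play the role of the generated
    dependent variable described in the abstract.
    """
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, 2)]  # two distinct start rows
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min((0, 1), key=lambda c: sq_dist(row, centers[c])) for row in rows
        ]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in (0, 1):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def repeated_kfold(n, k=5, repeats=20, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times.

    Each repeat reshuffles the indices, so the folds differ between repeats.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]
```

With k=5 and repeats=20 the generator yields 100 train/test splits; a candidate model setting would be scored on each split and the scores averaged before comparing settings.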

Place, publisher, year, edition, pages
2016. 57 p.
Keyword [en]
machine learning, biomarkers, cross-validation, receiver operating characteristic, k-means clustering, feature selection, binary classification
National Category
Probability Theory and Statistics
URN: urn:nbn:se:uu:diva-274768
OAI: diva2:897594
External cooperation
Pharma Consulting Group
Subject / course
Available from: 2016-02-10
Created: 2016-01-26
Last updated: 2016-02-10
Bibliographically approved

Open Access in DiVA

fulltext (1487 kB), 67 downloads
File information
File name FULLTEXT01.pdf
File size 1487 kB
Checksum SHA-512
Type fulltext
Mimetype application/pdf

Total: 67 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 169 hits