Machine learning techniques for binary classification of microarray data with correlation-based gene selection
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Statistics.
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Microarray analysis has made it possible to predict clinical outcomes or diagnose patients with the help of biological data such as biomarkers or gene expressions. Data from microarrays are, however, characterized by high dimensionality and sparsity, so traditional statistical methods are difficult to use and machine learning algorithms are therefore applied for classification and prediction. In this thesis, five different machine learning algorithms were applied to four different microarray datasets from cancer studies and evaluated in terms of cross-validation performance and classification accuracy. A correlation-based gene selection method was also applied in order to reduce the number of genes, with the aim of improving the accuracy of the algorithms. The findings of the thesis imply that the algorithms elastic net and nearest shrunken centroid perform best on datasets with no gene selection, while support vector machine and random forest perform well on the reduced datasets with gene selection. However, no machine learning algorithm can be said to consistently outperform any of the others, and the nature of the dataset seems to be a more important influence on the performance of the algorithm. The correlation-based gene selection method did, however, improve the prediction accuracy of all the models by removing irrelevant genes.
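
The workflow summarized in the abstract (filter genes by correlation, fit a classifier, estimate accuracy by cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the selection rule simply ranks genes by absolute Pearson correlation with the class label, which may differ from the thesis's correlation-based method; the classifier is a plain nearest-centroid rule without shrinkage, standing in for the five algorithms studied; and the data are synthetic, produced by the hypothetical helper `make_toy_data`.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_genes(X, y, k):
    """Indices of the k genes most correlated (in absolute value) with the label.

    Assumption: a simple univariate filter, not necessarily the thesis's method.
    """
    n_genes = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, X_test, genes):
    """Assign each test sample to the class with the closest mean profile (no shrinkage)."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in genes]
    preds = []
    for x in X_test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(genes, mu))
                for c, mu in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k_genes, n_folds=5):
    """Cross-validated accuracy; genes are re-selected inside each training fold
    so that the held-out samples never influence the selection."""
    idx = list(range(len(X)))
    correct = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        train = [i for i in idx if i not in test]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, k_genes)
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in test], genes)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


def make_toy_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for a microarray dataset: few samples, many genes,
    and only the first n_informative genes carry class signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += 3.0 * label
        X.append(row)
    return X, y


X, y = make_toy_data()
acc_all = cv_accuracy(X, y, k_genes=len(X[0]))  # no gene selection
acc_sel = cv_accuracy(X, y, k_genes=10)         # with gene selection
print("accuracy, all genes:     ", acc_all)
print("accuracy, selected genes:", acc_sel)
```

Selecting genes inside each training fold, rather than once on the full dataset, is a deliberate choice in this sketch: with far more genes than samples, selecting on all samples first would tend to inflate the cross-validated accuracy.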

Place, publisher, year, edition, pages
2016. 29 p.
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:uu:diva-302402
OAI: oai:DiVA.org:uu-302402
DiVA: diva2:957430
Subject / course
Statistics
Educational program
Master Programme in Statistics
Supervisors
Examiners
Available from: 2016-09-02. Created: 2016-09-02. Last updated: 2016-09-02. Bibliographically approved.

Open Access in DiVA

fulltext (368 kB), 21 downloads
File information
File name: FULLTEXT01.pdf
File size: 368 kB
Checksum (SHA-512): c66876a6b35dc0e7e6d9d23e1921a659da8728099e8feeec69b33076ac7a13c22541c85ee48fb151e43941f14a961c8e460b5f8a11bae675706a161c3ba6c3e8
Type: fulltext
Mimetype: application/pdf

