Classification of microarrays: synergistic effects between normalization, gene selection and machine learning
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
2011 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 12, 390- p. Article in journal (Refereed), Published
Abstract [en]

Background: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is the result of a series of analysis steps, of which the most important are data normalization, gene selection and machine learning.

Results: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes, and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods.

Conclusion: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the t-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
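The double cross-validation mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the data are synthetic, the problem is restricted to two classes, a nearest-centroid classifier stands in for the Support Vector Machines, and all function names (`make_toy_data`, `t_scores`, `fit_predict`, `folds`, `cv_error`, `double_cv_error`) are hypothetical. The point it tries to show is that both t-test gene selection and the choice of how many genes to keep happen inside the training folds, so the outer test samples never influence the model.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_genes=50, n_informative=5, shift=2.0, seed=1):
    """Synthetic two-class 'expression' matrix; only the first genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                      for j in range(n_genes)])
            y.append(label)
    return X, y


def t_scores(X, y):
    """Absolute Welch t-statistic of every gene (column) between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0)
    return scores


def fit_predict(X_train, y_train, X_test, n_genes):
    """Pick the top n_genes on the training data only, then classify by nearest centroid."""
    scores = t_scores(X_train, y_train)
    genes = sorted(range(len(scores)), key=lambda j: -scores[j])[:n_genes]
    centroids = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X_train, y_train) if label == c]
        centroids[c] = [statistics.mean([row[j] for row in rows]) for j in genes]
    preds = []
    for row in X_test:
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(genes, centroids[c]))
                for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def folds(idx, y, k):
    """Stratified folds: order samples by class, then deal them out round-robin."""
    ordered = sorted(idx, key=lambda i: y[i])
    return [ordered[f::k] for f in range(k)]


def count_errors(X, y, train, test, n_genes):
    """Number of misclassified test samples for a model fitted on the training samples."""
    preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                        [X[i] for i in test], n_genes)
    return sum(p != y[i] for p, i in zip(preds, test))


def cv_error(X, y, idx, k, n_genes):
    """Plain k-fold error rate on the samples in idx, for a fixed number of genes."""
    errors = 0
    for test in folds(idx, y, k):
        train = [i for i in idx if i not in test]
        errors += count_errors(X, y, train, test, n_genes)
    return errors / len(idx)


def double_cv_error(X, y, gene_counts, k_outer=5, k_inner=3, seed=0):
    """Outer loop estimates error; inner loop tunes the gene count on training data only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for test in folds(idx, y, k_outer):
        train = [i for i in idx if i not in test]
        best = min(gene_counts, key=lambda n: cv_error(X, y, train, k_inner, n))
        errors += count_errors(X, y, train, test, best)
    return errors / len(idx)


X, y = make_toy_data()
print("double CV error rate:", double_cv_error(X, y, gene_counts=[5, 20, 50]))
```

The sketch runs the procedure once; repeating it with different values of `seed` and averaging the results would be closer to the repeated estimation the abstract describes.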

National Category
Natural Sciences
URN: urn:nbn:se:uu:diva-166095
DOI: 10.1186/1471-2105-12-390
ISI: 000297641600001
OAI: diva2:475719
Available from: 2012-01-11. Created: 2012-01-10. Last updated: 2012-01-11. Bibliographically approved.

By author/editor
Freyhult, Eva