2017 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 16, p. 2464-2470. Article in journal (Refereed) Published
Abstract [en]
Motivation: Knowledge of the correct protein subcellular localization is necessary for understanding the function of a protein. Unfortunately, large-scale experimental studies are limited in their accuracy. Therefore, the development of prediction methods has been limited by the amount of accurate experimental data. Recently, however, large-scale experimental studies have provided new data that can be used to evaluate the accuracy of subcellular predictions in human cells. Using this data, we examined the performance of state-of-the-art methods and developed SubCons, an ensemble method that combines four predictors using a Random Forest classifier.
Results: SubCons outperforms earlier methods in a dataset of proteins where two independent methods confirm the subcellular localization. Given nine subcellular localizations, SubCons achieves an F1-Score of 0.79 compared to 0.70 for the second-best method. Furthermore, at a false positive rate (FPR) of 1%, the true positive rate (TPR) is over 58% for SubCons compared to less than 50% for the best individual predictor.
National Category
Biological Sciences Environmental Biotechnology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-147084 (URN)
10.1093/bioinformatics/btx219 (DOI)
000407139800005 ()
28407043 (PubMedID)
2017-10-16 2017-10-16 2022-02-28 Bibliographically approved