Genome-wide discovery of miRNAs using ensembles of machine learning algorithms and logistic regression
2015 (English) In: International Journal of Data Mining and Bioinformatics, ISSN 1748-5681, Vol. 13, no 4, 338-359 p. Article in journal (Refereed) Published
In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression, were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and the choice of input sequence length on prediction performance. Both the accuracy of the secondary structure predictions and the accuracy of the miRNA predictions are evaluated. In the benchmark of hairpin classification methods, the regression model achieved the highest classification accuracy. Of the structure prediction methods evaluated, ContextFold achieved the highest agreement between predicted and experimentally determined structures. However, both the choice of secondary structure prediction method and the input sequence length had limited impact on hairpin classification performance.
Place, publisher, year, edition, pages
InderScience Publishers, 2015. Vol. 13, no 4, 338-359 p.
Bioinformatics and Systems Biology
Research subject
Natural sciences
Identifiers
URN: urn:nbn:se:his:diva-11759
DOI: 10.1504/IJDMB.2015.072755
ISI: 000366135400002
PubMedID: 26547983
ScopusID: 2-s2.0-84946741012
OAI: oai:DiVA.org:his-11759
DiVA: diva2:882854