A Mixture-of-Experts Approach for Gene Regulatory Network Inference
Independent thesis Advanced level (degree of Master (Two Years))Student thesis
Context. Gene regulatory network (GRN) inference is an important and challenging problem in bioinformatics. A variety of machine learning algorithms have been applied to increase the GRN inference accuracy. Ensemble learning methods are shown to yield a higher inference accuracy than individual algorithms. Objectives. We propose an ensemble GRN inference method, which is based on the principle of Mixture-of-Experts ensemble learning. The proposed method can quantitatively measure the accuracy of individual GRN inference algorithms at the network motifs level. Based on the accuracy of the individual algorithms at predicting different types of network motifs, weights are assigned to the individual algorithms so as to take advantages of their strengths and weaknesses. In this way, we can improve the accuracy of the ensemble prediction. Methods. The research methodology is controlled experiment. The independent variable is method. It has eight groups: five individual algorithms, the generic average ranking method used in the DREAM5 challenge, the proposed ensemble method including four types of network motifs and five types of network motifs. The dependent variable is GRN inference accuracy, measured by the area under the precision-recall curve (AUPR). The experiment has training and testing phases. In the training phase, we analyze the accuracy of five individual algorithms at the network motifs level to decide their weights. In the testing phase, the weights are used to combine predictions from the five individual algorithms to generate ensemble predictions. We compare the accuracy of the eight method groups on Escherichia coli microarray dataset using AUPR. Results. In the training phase, we obtain the AUPR values of the five individual algorithms at predicting each type of the network motifs. In the testing phase, we collect the AUPR values of the eight methods on predicting the GRN of the Escherichia coli microarray dataset. Each method group has a sample size of ten (ten AUPR values). Conclusions. Statistical tests on the experiment results show that the proposed method yields a significantly higher accuracy than the generic average ranking method. In addition, a new type of network motif is found in GRN, the inclusion of which can increase the accuracy of the proposed method significantly.
Genes are DNA molecules that control the biological traits and biochemical processes that comprise life. They interact with each other to realize the precise regulation of life activities. Biologists aim to understand the regulatory network among the genes, with the help of high-throughput techonologies, such as microarrays, RNA-seq, etc. These technologies produce large amount of gene expression data which contain useful information. Therefore, effective data mining is necessary to discover the information to promote biological research. Gene regulatory network (GRN) inference is to infer the gene interactions from gene expression data, such as microarray datasets. The inference results can be used to guide the direction of further experiments to discover or validate gene interactions. A variety of machine learning (data mining) methods have been proposed to solve this problem. In recent years, experiments have shown that ensemble learning methods achieve higher accuracy than the individual learning methods. Because the ensemble learning methods can take advantages of the strength of different individual methods and it is robust to different network structures. In this thesis, we propose an ensemble GRN inference method, which is based on the principle of the Mixture-of-Experts ensemble learning. By quantitatively measure the accuracy of individual methods at the network motifs level, the proposed method is able to take advantage of the complementarity among the individual methods. The proposed method yields a significantly higher accuracy than the generic average ranking method, which is the most accurate method out of 35 GRN inference methods in the DREAM5 challenge.
Place, publisher, year, edition, pages
2014. , 53 p.
GRN inference, Ensemble learning, Mixture-of-Experts, network motif analysis
Computer Science Information Systems
IdentifiersURN: urn:nbn:se:bth-3072Local ID: oai:bth.se:arkivexA20B31AC3BCAFBC1C1257D0400456230OAI: oai:DiVA.org:bth-3072DiVA: diva2:830370
Lavesson, Dr. Niklas