Horseshoe RuleFit: Learning Rule Ensembles via Bayesian Regularization
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
This work proposes Hs-RuleFit, a learning method for regression and classification that combines rule ensemble learning based on the RuleFit algorithm with Bayesian regularization through the horseshoe prior. To this end, the theoretical properties and potential problems of this combination are studied. The second step is the implementation, which uses recent sampling schemes to make Hs-RuleFit computationally feasible. Additionally, changes to the RuleFit algorithm are proposed, such as decision rule post-processing and the use of decision rules generated via Random Forest.
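As a rough illustration of what a rule ensemble is, the sketch below treats each decision rule as a conjunction of threshold conditions that yields a 0/1 feature, and predicts with a linear model over those features. This is a minimal sketch under stated assumptions, not the thesis's implementation: the names `rule_fires`, `rule_features` and `predict`, the feature names `age` and `income`, and all thresholds and coefficients are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
# A condition is (feature name, operator, threshold); a rule is a list of
# conditions that must all hold. This representation is an assumption.
Condition = Tuple[str, str, float]
Rule = List[Condition]


def rule_fires(rule: Rule, row: Row) -> int:
    """Return 1 if every condition of the rule holds for the row, else 0."""
    for feature, op, threshold in rule:
        value = row[feature]
        if op == "<=":
            holds = value <= threshold
        elif op == ">":
            holds = value > threshold
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not holds:
            return 0
    return 1


def rule_features(rules: List[Rule], rows: List[Row]) -> List[List[int]]:
    """Turn each row into a vector of binary rule indicators."""
    return [[rule_fires(rule, row) for rule in rules] for row in rows]


def predict(intercept: float, coefs: List[float], rules: List[Rule], row: Row) -> float:
    """Linear model over the binary rule features."""
    return intercept + sum(c * rule_fires(r, row) for c, r in zip(coefs, rules))


# Hypothetical rules, e.g. as they might be read off paths in decision trees.
rules: List[Rule] = [
    [("age", "<=", 30.0)],
    [("age", ">", 30.0), ("income", ">", 50.0)],
]
rows: List[Row] = [
    {"age": 25.0, "income": 40.0},
    {"age": 45.0, "income": 80.0},
    {"age": 45.0, "income": 20.0},
]
features = rule_features(rules, rows)
```

In this representation, regularization acts on the coefficients in `coefs`: rules whose coefficients are shrunk to (near) zero drop out of the ensemble, which is what keeps the final rule set compact.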
Hs-RuleFit addresses the problem of finding highly accurate yet interpretable models. The method proves capable of finding compact sets of informative decision rules that give good insight into the data. Through the careful choice of prior distributions, the horseshoe prior proves superior to the Lasso in this context. In an empirical evaluation on 16 real data sets, Hs-RuleFit shows excellent performance in regression and outperforms the popular methods Random Forest, BART and RuleFit in terms of prediction error. The interpretability is demonstrated on selected data sets. This makes Hs-RuleFit a good choice for scientific domains in which interpretability is desired.
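For readers unfamiliar with the horseshoe prior, the following sketch draws from its standard hierarchy, in which each coefficient has a local scale with a half-Cauchy distribution, and computes the implied shrinkage weight. It is a minimal sketch assuming unit noise variance and a fixed global scale `tau = 1`; the function names are hypothetical, and it does not reproduce the sampling schemes or prior choices used in the thesis.

```python
import math
import random


def half_cauchy(rng: random.Random) -> float:
    """Draw from a standard half-Cauchy C+(0, 1) by inverting its CDF,
    F(x) = (2 / pi) * atan(x)."""
    return math.tan(0.5 * math.pi * rng.random())


def horseshoe_draw(rng: random.Random, tau: float = 1.0):
    """Draw (beta, lam) with lam ~ C+(0, 1) and beta ~ N(0, (lam * tau)^2)."""
    lam = half_cauchy(rng)
    beta = rng.gauss(0.0, lam * tau)
    return beta, lam


def shrinkage_weight(lam: float, tau: float = 1.0) -> float:
    """Shrinkage weight kappa = 1 / (1 + tau^2 * lam^2), assuming unit noise
    variance. kappa near 1 means the coefficient is shrunk towards zero;
    kappa near 0 means it is left almost untouched."""
    return 1.0 / (1.0 + (tau * lam) ** 2)


rng = random.Random(0)
kappas = [shrinkage_weight(horseshoe_draw(rng)[1]) for _ in range(20000)]

# Share of draws with almost no shrinkage and with almost total shrinkage.
share_unshrunk = sum(k < 0.1 for k in kappas) / len(kappas)
share_shrunk = sum(k > 0.9 for k in kappas) / len(kappas)
```

Under these assumptions the shrinkage weight follows a Beta(1/2, 1/2) distribution, whose mass piles up near 0 and 1. That U shape, which gives the prior its name, is the usual explanation for why it can shrink noise coefficients hard while leaving strong signals largely alone.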
Problems are found in classification, regarding the use of the horseshoe prior and rule ensemble learning in general. A simulation study is performed to isolate the problems, and potential solutions are discussed.
Arguments are presented that the horseshoe prior could be a good choice in other areas of machine learning, such as artificial neural networks and support vector machines.
Place, publisher, year, edition, pages
2016. 54 p.
Bayesian Statistics, Regularization, Ensemble Learning, Decision Rules, Horseshoe prior, Machine Learning, Knowledge Discovery
Probability Theory and Statistics, Computer Science, Bioinformatics (Computational Biology), Other Computer and Information Science
Identifiers
URN: urn:nbn:se:liu:diva-130249
ISRN: LIU-IDA/STAT-A--16/009--SE
OAI: oai:DiVA.org:liu-130249
DiVA: diva2:950073
Subject / course
Villani, Mattias, Professor
Sysoev, Oleg, Doctor