A Boosted-Window Ensemble
Independent thesis Advanced level (degree of Master (One Year))Student thesis
Context. The problem of obtaining predictions from stream data involves training on the labeled instances and suggesting the class values for the unseen stream instances. The nature of the data-stream environments makes this task complicated. The large number of instances, the possibility of changes in the data distribution, presence of noise and drifting concepts are just some of the factors that add complexity to the problem. Various supervised-learning algorithms have been designed by putting together efficient data-sampling, ensemble-learning, and incremental-learning methods. The performance of the algorithm is dependent on the chosen methods. This leaves an opportunity to design new supervised-learning algorithms by using different combinations of constructing methods. Objectives. This thesis work proposes a fast and accurate supervised-learning algorithm for performing predictions on the data-streams. This algorithm is called as Boosted-Window Ensemble (BWE), which is invented using the mixture-of-experts technique. BWE uses Sliding Window, Online Boosting and incremental-learning for data-sampling, ensemble-learning, and maintaining a consistent state with the current stream data, respectively. In this regard, a sliding window method is introduced. This method uses partial-updates for sliding the window on the data-stream and is called Partially-Updating Sliding Window (PUSW). The investigation is carried out to compare two variants of sliding window and three different ensemble-learning methods for choosing the superior methods. Methods. The thesis uses experimentation approach for evaluating the Boosted-Window Ensemble (BWE). CPU-time and the Prediction accuracy are used as performance indicators, where CPU-time is the execution time in seconds. The benchmark algorithms include: Accuracy-Updated Ensemble1 (AUE1), Accuracy-Updated Ensemble2 (AUE2), and Accuracy-Weighted Ensemble (AWE). The experiments use nine synthetic and five real-world datasets for generating performance estimates. The Asymptotic Friedman test and the Wilcoxon Signed-Rank test are used for hypothesis testing. The Wilcoxon-Nemenyi-McDonald-Thompson test is used for performing post-hoc analysis. Results. The hypothesis testing suggests that: 1) both for the synthetic and real-wrold datasets, the Boosted Window Ensemble (BWE) has significantly lower CPU-time values than two benchmark algorithms (Accuracy-updated Ensemble1 (AUE1) and Accuracy-weighted Ensemble (AWE). 2) BWE returns similar prediction accuracy as AUE1 and AWE for synthetic datasets. 3) BWE returns similar prediction accuracy as the three benchmark algorithms for the real-world datasets. Conclusions. Experimental results demonstrate that the proposed algorithm can be as accurate as the state-of-the-art benchmark algorithms, while obtaining predictions from the stream data. The results further show that the use of Partially-Updating Sliding Window has resulted in lower CPU-time for BWE as compared with the chunk-based sliding window method used in AUE1, AUE2, and AWE.
Place, publisher, year, edition, pages
2014. , 39 p.
Stream Mining, Supervised-learning by classification, Online learning algorithms, Ensemble Methods, Boosting
IdentifiersURN: urn:nbn:se:bth-5658Local ID: oai:bth.se:arkivexEB18525181F2B041C1257D82004DBDD4OAI: oai:DiVA.org:bth-5658DiVA: diva2:833048
Lavesson, Dr. Niklas