The Problem with Ranking Ensembles Based on Training or Validation Performance
2008 (English). In: Proceedings of the International Joint Conference on Neural Networks, IEEE Press, 2008. Conference paper (Refereed).
The main purpose of this study was to determine whether it is possible to use results on training or validation data to estimate ensemble performance on novel data. With the specific setup evaluated, i.e., using ensembles built from a pool of independently trained neural networks and targeting diversity only implicitly, the answer is a resounding no. Experimentation, using 13 UCI datasets, shows that there is in general nothing to gain in performance on novel data by choosing an ensemble based on any of the training measures evaluated here. This is despite the fact that the measures evaluated include all the most frequently used, i.e., ensemble training and validation accuracy, base classifier training and validation accuracy, ensemble training and validation AUC, and two diversity measures. The main reason is that all ensembles tend to have quite similar performance, unless the accuracy of the base classifiers is deliberately lowered. The key consequence is, of course, that a data miner can do no better than picking an ensemble at random. In addition, the results indicate that it is futile to look for an algorithm aimed at optimizing ensemble performance by selecting a subset of the available base classifiers.
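The selection procedure the abstract argues against can be sketched as follows. This is a minimal, self-contained simulation in pure Python, not the paper's experimental setup: the pool size, ensemble size, number of candidates, and the simulated base-classifier accuracy (all base classifiers equally accurate, as the paper observes) are illustrative assumptions.

```python
import random

random.seed(0)

POOL_SIZE = 15       # assumed pool of independently trained base classifiers
ENSEMBLE_SIZE = 7    # assumed size of each candidate ensemble
N_EXAMPLES = 200
BASE_ACCURACY = 0.75 # illustrative: all base classifiers similarly accurate

def simulate_predictions(n_examples, accuracy=BASE_ACCURACY):
    """Simulate binary labels and one noisy prediction vector per pool member."""
    truth = [random.randint(0, 1) for _ in range(n_examples)]
    preds = [[y if random.random() < accuracy else 1 - y for y in truth]
             for _ in range(POOL_SIZE)]
    return truth, preds

val_truth, val_preds = simulate_predictions(N_EXAMPLES)
test_truth, test_preds = simulate_predictions(N_EXAMPLES)

def ensemble_accuracy(members, preds, truth):
    """Majority-vote accuracy of the ensemble `members` on the given labels."""
    correct = 0
    for i, y in enumerate(truth):
        votes = sum(preds[m][i] for m in members)
        predicted_one = votes * 2 > len(members)
        correct += predicted_one == (y == 1)
    return correct / len(truth)

# Candidate ensembles: random subsets of the pool (diversity only implicit).
candidates = [random.sample(range(POOL_SIZE), ENSEMBLE_SIZE) for _ in range(50)]

# The strategy under study: rank candidates by validation accuracy...
best_by_val = max(candidates,
                  key=lambda m: ensemble_accuracy(m, val_preds, val_truth))
# ...versus the baseline the paper recommends: pick one at random.
random_pick = random.choice(candidates)

print("selected by validation:",
      ensemble_accuracy(best_by_val, test_preds, test_truth))
print("picked at random:     ",
      ensemble_accuracy(random_pick, test_preds, test_truth))
```

Because every simulated base classifier has the same underlying accuracy, the validation-ranked ensemble has no systematic edge on the test set, which mirrors the paper's finding that all ensembles built this way perform very similarly.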
Place, publisher, year, edition, pages
IEEE Press, 2008.
Keywords: ensembles, diversity, Computer Science
Subject category: Computer and Information Science
Identifiers
URN: urn:nbn:se:hb:diva-5946
DOI: 10.1109/IJCNN.2008.4634255
Local ID: 2320/3973
ISBN: 978-1-4244-1821-3
OAI: oai:DiVA.org:hb-5946
DiVA: diva2:886629
Conference: IJCNN 2008, Hong Kong, June 1–6, 2008
This work was supported by the Information Fusion Research Program (University of Skövde, Sweden) in partnership with the Swedish Knowledge Foundation under grant 2003/0104.