On Effectively Creating Ensembles of Classifiers: Studies on Creation Strategies, Diversity and Predicting with Confidence
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Högskolan i Borås. ORCID iD: 0000-0003-0274-9026
2015 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

An ensemble is a composite model that combines the predictions of several other models. Ensembles are known to be more accurate than single models, and diversity has been identified as an important factor in explaining their success. In the context of classification, however, diversity is not well defined, and several heuristic diversity measures have been proposed. The focus of this thesis is how to create effective ensembles in the context of classification. Even though several effective ensemble algorithms have been proposed, open questions remain regarding the role diversity plays when creating an effective ensemble. The questions addressed include: what to optimize when trying to find a sub-ensemble, using a subset of the models in the original ensemble, that is more effective than the original ensemble; how effective the search for such a sub-ensemble is; and how the neural networks used in an ensemble should be trained for the ensemble to be effective. The contributions include several studies evaluating different ways to optimize which sub-ensemble would be most effective, including a novel approach using combinations of performance and diversity measures. The initial studies eventually led to an investigation of the underlying assumption motivating the search for more effective sub-ensembles. The evaluation concluded that even though several more effective sub-ensembles exist, none of the evaluated optimization measures could identify which sub-ensembles would be the most effective. The most effective ways to train neural networks for use in ensembles were also investigated.
The conclusions are that effective ensembles can be obtained by training neural networks in a number of different ways, and that either high average individual accuracy or high diversity can produce an effective ensemble. Several recent findings in the literature regarding diversity and effective ensembles are also discussed and related to the results of the included studies. When creating confidence-based predictors using conformal prediction, several open questions remain regarding how data should be utilized effectively when using ensembles. The questions addressed include: how data can be utilized effectively to achieve more efficient confidence-based predictions using ensembles; and how problems with class imbalance affect the confidence-based predictions when using conformal prediction. The contributions include two studies: the first shows that using out-of-bag estimates for calibration in bagging ensembles results in more effective conformal predictors; the second shows that a conformal predictor conditioned on the class labels, avoiding a strong bias towards the majority class, is more effective on problems with class imbalance. The research method is mainly inspired by the design science paradigm, manifested in the development and evaluation of artifacts.
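The central mechanism of an ensemble, combining the predictions of several base models into one, can be illustrated with a minimal majority-vote sketch (the labels and data below are invented for illustration, not taken from the thesis):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several base models by majority vote."""
    votes = Counter(predictions)
    # most_common breaks ties by the first-encountered label.
    return votes.most_common(1)[0][0]

# Three hypothetical base classifiers predicting the label of one instance:
base_predictions = ["spam", "ham", "spam"]
print(majority_vote(base_predictions))  # spam
```

The individual classifiers can each be wrong on different instances; as long as their errors are not strongly correlated, the vote tends to be more accurate than any single member, which is why diversity matters.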

Abstract [sv]

An ensemble is a composite model that combines the predictions of several different models. It is well known that ensembles are more accurate than single models. Diversity has been identified as an important factor in explaining why ensembles are so successful. Until recently, diversity had not been unambiguously defined for classification, which has led to many heuristic diversity measures being proposed. This thesis focuses on how classification ensembles can be created effectively. The research method is mainly inspired by the design science paradigm, which is well suited to the development and evaluation of IT artifacts. Many successful ensemble algorithms already exist, but there are still open questions about the role diversity plays when creating effective ensemble models. The diversity-related questions addressed in the thesis include: What should be optimized when searching for a subset of the available models in an attempt to create an ensemble that is better than the ensemble consisting of all models; How well does the strategy of searching for such sub-ensembles work; How should neural networks be trained to work as well as possible in an ensemble? The contributions include several studies evaluating different ways of finding sub-ensembles that are better than using the whole ensemble, including a novel approach that exploits a combination of diversity and performance measures. The results of the first studies led to an investigation of the underlying assumption motivating the search for sub-ensembles. The conclusion was that even though several sub-ensembles were better than the whole ensemble, there was no way to identify, from the available data, which the better sub-ensembles were.
It was further investigated how neural networks should be trained to cooperate as well as possible in an ensemble. The conclusion from that investigation is that effective ensembles can be created both from models that are accurate on average and from models that differ from one another (that is, are diverse). Insights presented in the literature in recent years are discussed and related to the results of the included studies. When creating confidence-based models with the conformal prediction framework, several questions about how data is best utilized with ensembles need to be addressed. The questions related to confidence-based prediction include: How can data best be utilized to achieve more efficient confidence-based predictions with ensembles; How does imbalanced data affect the confidence-based predictions when using conformal prediction? The contributions include two studies: the first shows that the most effective way to use data with a bagging ensemble is to use so-called out-of-bag estimates; the second shows that imbalanced data needs to be handled with a class-conditional confidence-based model to avoid a strong tendency to favor the majority class.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2015. 82 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 15-009
Keyword [en]
Machine Learning, Predictive Modeling, Ensembles, Conformal Prediction
National Category
Computer Science
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-116683
ISBN: 978-91-7649-179-9 (print)
OAI: oai:DiVA.org:su-116683
DiVA: diva2:807555
Public defence
2015-06-11, L30, NOD-huset, Borgarfjordsgatan 12, Kista, 10:00 (English)
Opponent
Supervisors
Projects
Dataanalys för detektion av läkemedelseffekter (DADEL)
Funder
Swedish Foundation for Strategic Research
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 8: In press.

Available from: 2015-05-20 Created: 2015-04-23 Last updated: 2015-05-21. Bibliographically approved.
List of papers
1. Empirically investigating the importance of diversity
2007 (English). In: 10th International Conference on Information Fusion, 2007, pp. 1-8. Conference paper, Published paper (Refereed).
Abstract [en]

Most predictive modeling in information fusion is performed using ensembles. When designing ensembles, the prevailing opinion is that base classifier diversity is vital for how well the ensemble will generalize to new observations. Unfortunately, the key term diversity is not uniquely defined, leading to several diversity measures and many methods for diversity creation. In addition, no specific diversity measure has been shown to have a high correlation with generalization accuracy. The purpose of this paper is to empirically evaluate ten different diversity measures, using neural network ensembles and 8 publicly available data sets. The main result is that all diversity measures evaluated show low or very low correlation with test set accuracy. In addition, it is obvious that the most diverse ensembles often obtain very poor accuracy. Based on these results, it appears to be quite a challenge to explicitly utilize diversity when optimizing ensembles.
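To make the notion of a heuristic diversity measure concrete, one of the simplest candidates is pairwise disagreement, the fraction of instances on which two base classifiers predict different labels. A minimal sketch with invented predictions (the paper itself evaluates ten such measures):

```python
def disagreement(preds_a, preds_b):
    """Fraction of instances on which two base classifiers disagree."""
    assert len(preds_a) == len(preds_b)
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)

# Invented predictions from two base classifiers on five instances:
a = [0, 1, 1, 0, 1]
b = [0, 1, 0, 0, 0]
print(disagreement(a, b))  # 0.4
```

A value of 0 means the classifiers are identical and contribute nothing beyond a single model; the paper's point is that a high value does not reliably translate into higher ensemble test accuracy either.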

National Category
Computer Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-116679 (URN), 10.1109/ICIF.2007.4408197 (DOI), 978-0-662-45804-3 (ISBN)
Conference
10th International Conference on Information Fusion, Quebec, Canada, 9-12 July 2007
Available from: 2015-04-23 Created: 2015-04-23 Last updated: 2015-05-06. Bibliographically approved.
2. On the Use of Accuracy and Diversity Measures for Evaluating and Selecting Ensembles of Classifiers
2008 (English). In: 2008 Seventh International Conference on Machine Learning and Applications, pp. 127-132. Conference paper, Published paper (Refereed).
Abstract [en]

The test set accuracy for ensembles of classifiers selected based on single measures of accuracy and diversity as well as combinations of such measures is investigated. It is found that by combining measures, a higher test set accuracy may be obtained than by using any single accuracy or diversity measure. It is further investigated whether a multi-criteria search for an ensemble that maximizes both accuracy and diversity leads to more accurate ensembles than by optimizing a single criterion. The results indicate that it might be more beneficial to search for ensembles that are both accurate and diverse. Furthermore, the results show that diversity measures could compete with accuracy measures as selection criterion.

National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-33902 (URN), 10.1109/ICMLA.2008.102 (DOI), 978-0-7695-3495-4 (ISBN)
Conference
Seventh International Conference on Machine Learning and Applications (ICMLA), San Diego, CA, 11-13 Dec. 2008
Available from: 2009-12-30 Created: 2009-12-30 Last updated: 2015-05-06. Bibliographically approved.
3. Ensemble member selection using multi-objective optimization
2009 (English). In: IEEE Symposium on Computational Intelligence and Data Mining, 2009, pp. 245-251. Conference paper, Published paper (Refereed).
Abstract [en]

Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. Unfortunately, the problem of how to maximize ensemble accuracy is, especially for classification, far from solved. In essence, the key problem is to find a suitable criterion, typically based on training or selection set performance, highly correlated with ensemble accuracy on novel data. Several studies have, however, shown that it is difficult to come up with a single measure, such as ensemble or base classifier selection set accuracy, or some measure based on diversity, that is a good general predictor for ensemble test accuracy. This paper presents a novel technique that for each learning task searches for the most effective combination of given atomic measures, by means of a genetic algorithm. Ensembles built from either neural networks or random forests were empirically evaluated on 30 UCI datasets. The experimental results show that when using the generated combined optimization criteria to rank candidate ensembles, a higher test set accuracy for the top ranked ensemble was achieved, compared to using ensemble accuracy on selection data alone. Furthermore, when creating ensembles from a pool of neural networks, the use of the generated combined criteria was shown to generally outperform the use of estimated ensemble accuracy as the single optimization criterion.
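A rough sketch of the ranking step described above: candidate ensembles are scored by a combined criterion and the top-ranked one is selected. The weighted sum, the weight, and the candidate scores below are invented for illustration; in the paper the combination of atomic measures is found per learning task by a genetic algorithm:

```python
def combined_score(accuracy, diversity, w=0.7):
    # Hypothetical combined criterion: a fixed weighted sum stands in for
    # the GA-generated combination of atomic measures used in the paper.
    return w * accuracy + (1 - w) * diversity

# Invented (selection-set accuracy, diversity) pairs for three candidates:
candidates = {
    "ens_A": (0.85, 0.20),
    "ens_B": (0.83, 0.45),
    "ens_C": (0.86, 0.10),
}
ranked = sorted(candidates,
                key=lambda name: combined_score(*candidates[name]),
                reverse=True)
print(ranked[0])  # ens_B: the combined criterion rewards its diversity
```

Note that ranking by selection-set accuracy alone would have picked ens_C; the point of the combined criterion is that such a single measure often correlates poorly with accuracy on novel data.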

National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-33426 (URN), 10.1109/CIDM.2009.4938656 (DOI), 978-1-4244-2765-9 (ISBN)
Conference
IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Nashville, TN, March 30 2009-April 2 2009
Available from: 2009-12-23 Created: 2009-12-23 Last updated: 2015-05-06. Bibliographically approved.
4. Comparing methods for generating diverse ensembles of artificial neural networks
2010 (English). In: International Joint Conference on Neural Networks (IJCNN) 2010, pp. 1-6. Conference paper, Published paper (Refereed).
Abstract [en]

It is well-known that ensemble performance relies heavily on sufficient diversity among the base classifiers. With this in mind, the strategy used to balance diversity and base classifier accuracy must be considered a key component of any ensemble algorithm. This study evaluates the predictive performance of neural network ensembles, specifically comparing straightforward techniques to more sophisticated ones. In particular, the sophisticated methods GASEN and NegBagg are compared to more straightforward methods, where each ensemble member is trained independently of the others. In the experimentation, using 31 publicly available data sets, the straightforward methods clearly outperformed the sophisticated methods, thus questioning the use of the more complex algorithms.

National Category
Computer Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-116680 (URN), 10.1109/IJCNN.2010.5596763 (DOI), 978-1-4244-6916-1 (ISBN)
Conference
International Joint Conference on Neural Networks (IJCNN), Barcelona, 18-23 July 2010
Available from: 2015-04-23 Created: 2015-04-23 Last updated: 2015-05-06. Bibliographically approved.
5. Overproduce-and-Select: The Grim Reality
2013 (English). In: 2013 IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL), IEEE conference proceedings, 2013, pp. 52-59. Conference paper, Published paper (Refereed).
Abstract [en]

Overproduce-and-select (OPAS) is a frequently used paradigm for building ensembles. In static OPAS, a large number of base classifiers are trained, before a subset of the available models is selected to be combined into the final ensemble. In general, the selected classifiers are supposed to be accurate and diverse for the OPAS strategy to result in highly accurate ensembles, but exactly how this is enforced in the selection process is not obvious. Most often, either individual models or ensembles are evaluated, using some performance metric, on available and labeled data. Naturally, the underlying assumption is that an observed advantage for the models (or the resulting ensemble) will carry over to test data. In the experimental study, a typical static OPAS scenario, using a pool of artificial neural networks and a number of very natural and frequently used performance measures, is evaluated on 22 publicly available data sets. The discouraging result is that although a fairly large proportion of the ensembles obtained higher test set accuracies, compared to using the entire pool as the ensemble, none of the selection criteria could be used to identify these highly accurate ensembles. Despite only investigating a specific scenario, we argue that the settings used are typical for static OPAS, thus making the results general enough to question the entire paradigm.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2013
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-97229 (URN), 10.1109/CIEL.2013.6613140 (DOI), 978-1-4673-5853-8 (ISBN)
Conference
2013 IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL), 16-19 April 2013, Singapore
Available from: 2013-12-05 Created: 2013-12-05 Last updated: 2015-05-06. Bibliographically approved.
6. Producing implicit diversity in ANN ensembles
2012 (English). In: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1-8. Conference paper, Published paper (Refereed).
Abstract [en]

Combining several ANNs into ensembles normally results in very accurate and robust predictive models. Many ANN ensemble techniques are, however, quite complicated and often explicitly optimize some diversity metric. Unfortunately, the lack of solid validation of the explicit algorithms, at least for classification, makes the use of diversity measures as part of an optimization function questionable. The merits of implicit methods, most notably bagging, are on the other hand experimentally established and well-known. This paper evaluates a number of straightforward techniques for introducing implicit diversity in ANN ensembles, including a novel technique producing diversity by using ANNs with different and slightly randomized link structures. The experimental results, comparing altogether 54 setups and two different ensemble sizes on 30 UCI data sets, show that all methods succeeded in producing implicit diversity, but that the effect on ensemble accuracy varied. Still, most setups evaluated did result in more accurate ensembles, compared to the baseline setup, especially for the larger ensemble size. As a matter of fact, several setups even obtained significantly higher ensemble accuracy than bagging. The analysis also identified that diversity was, relatively speaking, more important for the larger ensembles. Looking specifically at the methods used to increase the implicit diversity, setups using the technique that utilizes the randomized link structures generally produced the most accurate ensembles.
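Bagging, the implicit baseline referred to above, induces diversity simply by giving each network a different bootstrap replicate of the training data. A minimal sketch of that resampling step (this illustrates bagging only, not the paper's randomized link-structure technique):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement. Each base model trained on
    such a replicate sees a different view of the data, so diversity
    arises implicitly rather than being optimized for explicitly."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)  # fixed seed for reproducibility
data = list(range(10))
replicates = [bootstrap_sample(data, rng) for _ in range(3)]
# The replicates differ from each other, so models trained on them will too.
```

Each replicate is the same size as the original set but omits roughly a third of the instances while repeating others, which is exactly what makes the out-of-bag estimates used later in the thesis possible.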

National Category
Computer Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-116681 (URN), 10.1109/IJCNN.2012.6252713 (DOI), 978-1-4673-1488-6 (ISBN), 978-1-4673-1489-3 (ISBN)
Conference
The 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, QLD, 10-15 June 2012
Available from: 2015-04-23 Created: 2015-04-23 Last updated: 2015-05-06. Bibliographically approved.
7. Effective Utilization of Data in Inductive Conformal Prediction using Ensembles of Neural Networks
2013 (English). In: The 2013 International Joint Conference on Neural Networks (IJCNN): Proceedings, IEEE conference proceedings, 2013, pp. 1-8. Conference paper, Published paper (Refereed).
Abstract [en]

Conformal prediction is a new framework producing region predictions with a guaranteed error rate. Inductive conformal prediction (ICP) was designed to significantly reduce the computational cost associated with the original transductive online approach. The drawback of inductive conformal prediction is that it is not possible to use all data for training, since it sets aside some data as a separate calibration set. Recently, cross-conformal prediction (CCP) and bootstrap conformal prediction (BCP) were proposed to overcome that drawback of inductive conformal prediction. Unfortunately, CCP and BCP both need to build several models for the calibration, making them less attractive. In this study, focusing on bagged neural network ensembles as conformal predictors, ICP, CCP and BCP are compared to the very straightforward and cost-effective method of using the out-of-bag estimates for the necessary calibration. Experiments on 34 publicly available data sets conclusively show that the use of out-of-bag estimates produced the most efficient conformal predictors, making it the obvious preferred choice for ensembles in the conformal prediction framework.
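A pure-Python sketch of the calibration and prediction step, assuming the nonconformity scores have already been computed (the numbers below are invented, standing in for out-of-bag estimates from a bagged ensemble):

```python
def p_value(test_score, calibration_scores):
    """Conformal p-value: fraction of calibration scores at least as
    nonconforming as the test score, counting the test score itself."""
    n = len(calibration_scores)
    at_least = sum(s >= test_score for s in calibration_scores)
    return (at_least + 1) / (n + 1)

def prediction_region(scores_per_label, calibration_scores, epsilon):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, score in scores_per_label.items()
            if p_value(score, calibration_scores) > epsilon}

# Invented out-of-bag nonconformity scores used for calibration:
oob_scores = [0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5]
# Nonconformity of one test instance under each tentative label:
test_scores = {"pos": 0.25, "neg": 1.4}
print(prediction_region(test_scores, oob_scores, epsilon=0.2))  # {'pos'}
```

The appeal of the out-of-bag approach is visible in the sketch: the calibration scores come for free from instances each ensemble member did not train on, so no data has to be set aside in a separate calibration set and no extra models need to be built.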

Place, publisher, year, edition, pages
IEEE conference proceedings, 2013
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-100717 (URN), 10.1109/IJCNN.2013.6706817 (DOI), 978-1-4673-6128-6 (ISBN)
Conference
The 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, Texas, USA, 4-9 August 2013
Available from: 2014-02-12 Created: 2014-02-12 Last updated: 2015-05-06. Bibliographically approved.
8. Bias Reduction through Conditional Conformal Prediction
2015 (English). In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, pp. 1355-1375. Article in journal (Refereed). Published.
Abstract [en]

Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.
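The difference between CP and LCCP comes down to which calibration scores a tentative label is compared against: in LCCP, only scores from the label's own class are used, so the error bound holds per class. A sketch with invented scores (note the deliberately small minority calibration set mimicking class imbalance):

```python
def p_value(score, calibration_scores):
    """Conformal p-value of a score relative to a calibration set."""
    at_least = sum(s >= score for s in calibration_scores)
    return (at_least + 1) / (len(calibration_scores) + 1)

def lccp_region(scores_per_label, calib_by_class, epsilon):
    """Label-conditional CP: each tentative label is judged only against
    calibration scores from its own class, bounding the error per class."""
    return {label for label, score in scores_per_label.items()
            if p_value(score, calib_by_class[label]) > epsilon}

# Invented, imbalanced calibration scores: many majority, few minority.
calib = {"maj": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
         "min": [0.2, 0.6, 1.0]}
test = {"maj": 0.85, "min": 0.55}
print(lccp_region(test, calib, epsilon=0.25))
```

A standard conformal predictor would pool all calibration scores, so minority-class labels get compared mostly against majority-class scores; conditioning on the class, as above, is what removes the bias the paper measures.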

Keyword
Conformal prediction, imbalanced learning, class bias
National Category
Computer Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-116682 (URN), 10.3233/IDA-150786 (DOI), 000366058000010 (ISI)
Available from: 2015-04-23 Created: 2015-04-23 Last updated: 2017-12-04. Bibliographically approved.

Open Access in DiVA

fulltext (508 kB)