  • 1.
    Ahlberg, Ernst
    et al.
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Winiwarter, Susanne
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Boström, Henrik
    Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuve
    Högskolan i Jönköping, JTH. Forskningsmiljö Datavetenskap och informatik.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Johansson, Ulf
    Högskolan i Jönköping, JTH, Datateknik och informatik.
    Engkvist, Ola
    External Sciences, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Hammar, Oscar
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Bendtsen, Claus
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Cambridge, UK.
    Carlsson, Lars
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Using conformal prediction to prioritize compound synthesis in drug discovery. 2017. In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, 2017, p. 174-184. Conference paper (Refereed)
    Abstract [en]

    The choice of how much money and resources to spend to understand certain problems is of high interest in many areas. This work illustrates how computational models can be more tightly coupled with experiments to generate decision data at lower cost without reducing the quality of the decision. Several different strategies are explored to illustrate the trade-off between lowering costs and quality in decisions.

    AUC is used as a performance metric and the number of objects that can be learnt from is constrained. Some of the strategies described reach AUC values over 0.9 and outperform strategies that are more random. The strategies that use conformal predictor p-values show varying results, although some are top performing.

    The application studied is taken from the drug discovery process. In the early stages of this process, compounds that could potentially become marketed drugs are routinely tested in experimental assays to understand their distribution and interactions in humans.

  • 2.
    Ahlberg, Ernst
    et al.
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Winiwarter, Susanne
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Boström, Henrik
    Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuve
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Johansson, Ulf
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Engkvist, Ola
    External Sciences, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Hammar, Oscar
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Bendtsen, Claus
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Cambridge, UK.
    Carlsson, Lars
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Using conformal prediction to prioritize compound synthesis in drug discovery. 2017. In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, Machine Learning Research, 2017, p. 174-184. Conference paper (Refereed)
    Abstract [en]

    The choice of how much money and resources to spend to understand certain problems is of high interest in many areas. This work illustrates how computational models can be more tightly coupled with experiments to generate decision data at lower cost without reducing the quality of the decision. Several different strategies are explored to illustrate the trade-off between lowering costs and quality in decisions.

    AUC is used as a performance metric and the number of objects that can be learnt from is constrained. Some of the strategies described reach AUC values over 0.9 and outperform strategies that are more random. The strategies that use conformal predictor p-values show varying results, although some are top performing.

    The application studied is taken from the drug discovery process. In the early stages of this process, compounds that could potentially become marketed drugs are routinely tested in experimental assays to understand their distribution and interactions in humans.

  • 3.
    Boström, Henrik
    et al.
    Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Borås, Sweden.
    Löfström, Tuve
    Department of Information Technology, University of Borås, Borås, Sweden.
    Johansson, Ulf
    Högskolan i Jönköping, JTH, Datateknik och informatik.
    Evaluation of a variance-based nonconformity measure for regression forests. 2016. In: 5th International Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2016, Springer, 2016, Vol. 9653, p. 75-89. Conference paper (Refereed)
    Abstract [en]

    In a previous large-scale empirical evaluation of conformal regression approaches, random forests using out-of-bag instances for calibration together with a k-nearest-neighbor-based nonconformity measure were shown to obtain state-of-the-art performance with respect to efficiency, i.e., average size of prediction regions. However, the use of the nearest-neighbor procedure not only requires that all training data be retained in conjunction with the underlying model, but also incurs a significant computational overhead during both training and testing. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. Moreover, the evaluation shows that state-of-the-art performance is achieved by the variance-based measure at a computational cost that is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure.

  • 4.
    Boström, Henrik
    et al.
    Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Borås, Sweden.
    Löfström, Tuve
    Department of Information Technology, University of Borås, Borås, Sweden.
    Johansson, Ulf
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics. Jönköping University, School of Engineering, JTH. Research area Computer Science and Informatics.
    Evaluation of a variance-based nonconformity measure for regression forests. 2016. In: Conformal and Probabilistic Prediction with Applications, Springer, 2016, p. 75-89. Conference paper (Refereed)
    Abstract [en]

    In a previous large-scale empirical evaluation of conformal regression approaches, random forests using out-of-bag instances for calibration together with a k-nearest-neighbor-based nonconformity measure were shown to obtain state-of-the-art performance with respect to efficiency, i.e., average size of prediction regions. However, the use of the nearest-neighbor procedure not only requires that all training data be retained in conjunction with the underlying model, but also incurs a significant computational overhead during both training and testing. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. Moreover, the evaluation shows that state-of-the-art performance is achieved by the variance-based measure at a computational cost that is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure.

  • 5.
    Boström, Henrik
    et al.
    Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Borås, Sweden.
    Löfström, Tuwe
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Borås, Sweden.
    Johansson, Ulf
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Accelerating difficulty estimation for conformal regression forests. 2017. In: Annals of Mathematics and Artificial Intelligence, ISSN 1012-2443, E-ISSN 1573-7470, Vol. 81, no 1-2, p. 125-144. Article in journal (Refereed)
    Abstract [en]

    The conformal prediction framework allows for specifying the probability of making incorrect predictions by a user-provided confidence level. In addition to a learning algorithm, the framework requires a real-valued function, called a nonconformity measure, to be specified. The nonconformity measure does not affect the error rate, but the resulting efficiency, i.e., the size of output prediction regions, may vary substantially. A recent large-scale empirical evaluation of conformal regression approaches showed that using random forests as the learning algorithm together with a nonconformity measure based on out-of-bag errors normalized using a nearest-neighbor-based difficulty estimate resulted in state-of-the-art performance with respect to efficiency. However, the nearest-neighbor procedure incurs a significant computational cost. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. The evaluation moreover shows that the computational cost of the variance-based measure is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure. The use of out-of-bag instances for calibration does, however, result in nonconformity scores that are distributed differently from those obtained from test instances, questioning the validity of the approach. An adjustment of the variance-based measure is presented, which is shown to be valid and also to have a significant positive effect on the efficiency.
    For conformal regression forests, the variance-based nonconformity measure is hence a computationally efficient and theoretically well-founded alternative to the nearest-neighbor procedure.
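The variance-based normalization described in this abstract can be illustrated with a minimal sketch. It is not the authors' exact formulation: it assumes a separate calibration set rather than out-of-bag instances, and the function names and the smoothing constant `beta` are hypothetical choices made for illustration only.

```python
import math
from statistics import mean, pvariance


def difficulty(tree_preds, beta=0.01):
    """Difficulty estimate: variance of the per-tree predictions.

    beta is an assumed smoothing constant that keeps the estimate positive.
    """
    return pvariance(tree_preds) + beta


def calibrate(cal_tree_preds, cal_targets, confidence=0.9, beta=0.01):
    """Return the normalized nonconformity score at the requested confidence."""
    scores = sorted(
        abs(y - mean(preds)) / difficulty(preds, beta)
        for preds, y in zip(cal_tree_preds, cal_targets)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * confidence)
    # Too few calibration instances for this confidence: interval is unbounded.
    return scores[rank - 1] if rank <= n else math.inf


def predict_interval(tree_preds, alpha, beta=0.01):
    """Forest mean plus/minus alpha scaled by the instance's difficulty."""
    centre = mean(tree_preds)
    half_width = alpha * difficulty(tree_preds, beta)
    return centre - half_width, centre + half_width
```

Instances on which the trees disagree receive wider intervals, while the only extra cost over a plain forest prediction is one variance computation per instance.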

  • 6.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan, Sweden.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Conformal Prediction Using Decision Trees. 2013. Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a relatively new framework in which the predictive models output sets of predictions with a bound on the error rate, i.e., in a classification context, the probability of excluding the correct class label is lower than a predefined significance level. An investigation of the use of decision trees within the conformal prediction framework is presented, with the overall purpose of determining the effect of different algorithmic choices, including split criterion, pruning scheme and the way the probability estimates are calculated. Since the error rate is bounded by the framework, the most important property of conformal predictors is efficiency, which concerns minimizing the number of elements in the output prediction sets. Results from one of the largest empirical investigations to date within the conformal prediction framework are presented, showing that in order to optimize efficiency, the decision trees should be induced using no pruning and with smoothed probability estimates. The choice of split criterion to use for the actual induction of the trees did not turn out to have any major impact on the efficiency. Finally, the experimentation also showed that when using decision trees, standard inductive conformal prediction was as efficient as the recently suggested method cross-conformal prediction. This is an encouraging result since cross-conformal prediction uses several decision trees, thus sacrificing the interpretability of a single decision tree.
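As a rough illustration of inductive conformal classification with smoothed leaf estimates, the sketch below assumes Laplace smoothing and the nonconformity score 1 minus the estimated probability of the candidate label. Both are assumptions made for the example; the abstract does not specify either, and all function names are hypothetical.

```python
def laplace_estimates(leaf_counts):
    """Laplace-smoothed class probabilities from the class counts in a leaf."""
    n_classes = len(leaf_counts)
    total = sum(leaf_counts.values())
    return {label: (count + 1) / (total + n_classes)
            for label, count in leaf_counts.items()}


def p_value(cal_scores, score):
    """Fraction of calibration scores at least as nonconforming as `score`."""
    at_least = sum(1 for s in cal_scores if s >= score)
    return (at_least + 1) / (len(cal_scores) + 1)


def prediction_set(probs, cal_scores, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in probs.items()
            if p_value(cal_scores, 1.0 - p) > significance}
```

A lower significance level keeps more labels in the set, which is why efficiency (small sets) rather than error rate is the quantity being compared between tree configurations.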

  • 7.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Conformal Prediction Using Decision Trees. 2013. Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a relatively new framework in which the predictive models output sets of predictions with a bound on the error rate, i.e., in a classification context, the probability of excluding the correct class label is lower than a predefined significance level. An investigation of the use of decision trees within the conformal prediction framework is presented, with the overall purpose of determining the effect of different algorithmic choices, including split criterion, pruning scheme and the way the probability estimates are calculated. Since the error rate is bounded by the framework, the most important property of conformal predictors is efficiency, which concerns minimizing the number of elements in the output prediction sets. Results from one of the largest empirical investigations to date within the conformal prediction framework are presented, showing that in order to optimize efficiency, the decision trees should be induced using no pruning and with smoothed probability estimates. The choice of split criterion to use for the actual induction of the trees did not turn out to have any major impact on the efficiency. Finally, the experimentation also showed that when using decision trees, standard inductive conformal prediction was as efficient as the recently suggested method cross-conformal prediction. This is an encouraging result since cross-conformal prediction uses several decision trees, thus sacrificing the interpretability of a single decision tree.

  • 8.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Linusson, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Regression conformal prediction with random forests. 2014. In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 97, no 1-2, p. 155-176. Article in journal (Refereed)
    Abstract [en]

    Regression conformal prediction produces prediction intervals that are valid, i.e., the probability of excluding the correct target value is bounded by a predefined confidence level. The most important criterion when comparing conformal regressors is efficiency; the prediction intervals should be as tight (informative) as possible. In this study, the use of random forests as the underlying model for regression conformal prediction is investigated and compared to existing state-of-the-art techniques, which are based on neural networks and k-nearest neighbors. In addition to their robust predictive performance, random forests allow for determining the size of the prediction intervals by using out-of-bag estimates instead of requiring a separate calibration set. An extensive empirical investigation, using 33 publicly available data sets, was undertaken to compare the use of random forests to existing state-of-the-art conformal predictors. The results show that the suggested approach, on almost all confidence levels and using both standard and normalized nonconformity functions, produced significantly more efficient conformal predictors than the existing alternatives.

  • 9.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    König, Rikard
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Linusson, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Rule Extraction with Guaranteed Fidelity. 2014. Conference paper (Refereed)
    Abstract [en]

    This paper extends the conformal prediction framework to rule extraction, making it possible to extract interpretable models from opaque models in a setting where either the infidelity or the error rate is bounded by a predefined significance level. Experimental results on 27 publicly available data sets show that all three setups evaluated produced valid and rather efficient conformal predictors. The implication is that augmenting rule extraction with conformal prediction allows extraction of models where test set errors or test set infidelities are guaranteed to be lower than a chosen acceptable level. Clearly this is beneficial for both typical rule extraction scenarios, i.e., either when the purpose is to explain an existing opaque model, or when it is to build a predictive model that must be interpretable.

  • 10.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    König, Rikard
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Evolved Decision Trees as Conformal Predictors. 2013. Conference paper (Refereed)
    Abstract [en]

    In conformal prediction, predictive models output sets of predictions with a bound on the error rate. In classification, this means that the probability of excluding the correct class is, in the long run, lower than a predefined significance level. Since the error rate is guaranteed, the most important criterion for conformal predictors is efficiency. Efficient conformal predictors minimize the number of elements in the output prediction sets, thus producing more informative predictions. This paper presents one of the first comprehensive studies where evolutionary algorithms are used to build conformal predictors. More specifically, decision trees evolved using genetic programming are evaluated as conformal predictors. In the experiments, the evolved trees are compared to decision trees induced using standard machine learning techniques on 33 publicly available benchmark data sets, with regard to predictive performance and efficiency. The results show that the evolved trees are generally more accurate, and the corresponding conformal predictors more efficient, than their induced counterparts. One important result is that the probability estimates of decision trees when used as conformal predictors should be smoothed, here using the Laplace correction. Finally, using the more discriminating Brier score instead of accuracy as the optimization criterion produced the most efficient conformal predictions.

  • 11.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    König, Rikard
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Increasing Rule Extraction Accuracy by Post-processing GP Trees. 2008. In: Proceedings of the Congress on Evolutionary Computation, IEEE, 2008, p. 3010-3015. Conference paper (Refereed)
    Abstract [en]

    Genetic programming (GP) is a very general and efficient technique, often capable of outperforming more specialized techniques on a variety of tasks. In this paper, we suggest a straightforward novel algorithm for post-processing of GP classification trees. The algorithm iteratively, one node at a time, searches for possible modifications that would result in higher accuracy. More specifically, the algorithm for each split evaluates every possible constant value and chooses the best. With this design, the post-processing algorithm can only increase training accuracy, never decrease it. In this study, we apply the suggested algorithm to GP trees extracted from neural network ensembles. Experimentation, using 22 UCI datasets, shows that the post-processing results in higher test set accuracies on a large majority of datasets. As a matter of fact, for two setups of three evaluated, the increase in accuracy is statistically significant.
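The per-split search described in this abstract can be sketched for a single numeric split. This is a simplified illustration under assumed conventions (a split of the form `value <= threshold`, one fixed class label per branch, and hypothetical function names); the paper applies the idea node by node across a whole tree.

```python
def accuracy(values, labels, threshold, left_label, right_label):
    """Training accuracy of a single split: value <= threshold goes left."""
    hits = sum(
        (left_label if v <= threshold else right_label) == y
        for v, y in zip(values, labels)
    )
    return hits / len(labels)


def best_threshold(values, labels, threshold, left_label, right_label):
    """Try every observed value as the split constant and keep the best.

    The current threshold is the starting point and is replaced only by a
    strictly better candidate, so training accuracy can never decrease.
    """
    best_t = threshold
    best_acc = accuracy(values, labels, threshold, left_label, right_label)
    for candidate in sorted(set(values)):
        acc = accuracy(values, labels, candidate, left_label, right_label)
        if acc > best_acc:
            best_t, best_acc = candidate, acc
    return best_t, best_acc
```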

  • 12.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    König, Rikard
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    Using Imaginary Ensembles to Select GP Classifiers. 2010. In: Genetic Programming: 13th European Conference, EuroGP 2010, Istanbul, Turkey, April 7-9, 2010, Proceedings / [ed] A.I. Esparcia-Alcazar et al., Springer, 2010, p. 278-288. Conference paper (Refereed)
    Abstract [en]

    When predictive modeling requires comprehensible models, most data miners will use specialized techniques producing rule sets or decision trees. This study, however, shows that genetically evolved decision trees may very well outperform the more specialized techniques. The proposed approach evolves a number of decision trees and then uses one of several suggested selection strategies to pick one specific tree from that pool. The inherent inconsistency of evolution makes it possible to evolve each tree using all data, and still obtain somewhat different models. The main idea is to use these quite accurate and slightly diverse trees to form an imaginary ensemble, which is then used as a guide when selecting one specific tree. Simply put, the tree classifying the largest number of instances identically to the ensemble is chosen. In the experimentation, using 25 UCI data sets, two selection strategies obtained significantly higher accuracy than the standard rule inducer J48.
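The selection rule stated in this abstract, choosing the tree that agrees with the ensemble on the largest number of instances, can be sketched as follows. The sketch assumes a simple majority vote and that each tree's predictions are already available as a list; the function names are hypothetical, and the paper evaluates several selection strategies of which this is only the simplest reading.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Majority vote per instance; `predictions` holds one list per tree."""
    return [Counter(column).most_common(1)[0][0]
            for column in zip(*predictions)]


def select_tree(predictions):
    """Index of the tree that matches the ensemble vote most often."""
    votes = ensemble_vote(predictions)
    agreement = [
        sum(pred == vote for pred, vote in zip(tree, votes))
        for tree in predictions
    ]
    return max(range(len(predictions)), key=agreement.__getitem__)
```

No labels are needed for the selection step itself, so it can be run on any set of instances for which predictions are available.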

  • 13.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    König, Rikard
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    Post-processing Evolved Decision Trees. 2009. In: Foundations of Computational Intelligence / [ed] Ajith Abraham, Springer, 2009, p. 149-164. Chapter in book (Other academic)
    Abstract [en]

    Although Genetic Programming (GP) is a very general technique, it is also quite powerful. As a matter of fact, GP has often been shown to outperform more specialized techniques on a variety of tasks. In data mining, GP has successfully been applied to most major tasks, e.g., classification, regression and clustering. In this chapter, we introduce, describe and evaluate a straightforward novel algorithm for post-processing genetically evolved decision trees. The algorithm works by iteratively, one node at a time, searching for possible modifications that will result in higher accuracy. More specifically, the algorithm, for each interior test, evaluates every possible split for the current attribute and chooses the best. With this design, the post-processing algorithm can only increase training accuracy, never decrease it. In the experiments, the suggested algorithm is applied to GP decision trees, either induced directly from datasets, or extracted from neural network ensembles. The experimentation, using 22 UCI datasets, shows that the suggested post-processing technique results in higher test set accuracies on a large majority of the datasets. As a matter of fact, the increase in test accuracy is statistically significant for one of the four evaluated setups, and substantial on two out of the other three.

  • 14.
    Johansson, Ulf
    et al.
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Linusson, H.
    Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuwe
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Boström, H.
    Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Model-agnostic nonconformity functions for conformal classification. 2017. In: Proceedings of the International Joint Conference on Neural Networks, IEEE, 2017, p. 2072-2079. Conference paper (Refereed)
    Abstract [en]

    A conformal predictor outputs prediction regions; for classification, these are label sets. The key property of all conformal predictors is that they are valid, i.e., their error rate on novel data is bounded by a preset significance level. Thus, the key performance metric for evaluating conformal predictors is the size of the output prediction regions, where smaller (more informative) prediction regions are said to be more efficient. All conformal predictions rely on nonconformity functions, measuring the strangeness of an input-output pair, and the efficiency depends critically on the quality of the chosen nonconformity function. In this paper, three model-agnostic nonconformity functions, based on well-known loss functions, are evaluated with regard to how they affect efficiency. In the experimentation on 21 publicly available multi-class data sets, both single neural networks and ensembles of neural networks are used as underlying models for conformal classifiers. The results show that the choice of nonconformity function has a major impact on the efficiency, but also that different nonconformity functions should be used depending on the exact efficiency metric. For a high fraction of single-label predictions, a margin-based nonconformity function is the best option, while a nonconformity function based on the hinge loss obtained the smallest label sets on average.
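Two of the nonconformity functions named in this abstract are commonly written in terms of the underlying model's class probability estimates. The sketch below uses those common definitions as an assumption; the abstract itself gives no formulas, and the function names are hypothetical. Higher scores mean the candidate label looks stranger.

```python
def hinge(probs, label):
    """Hinge-style score: one minus the probability of the candidate label."""
    return 1.0 - probs[label]


def margin(probs, label):
    """Margin score: best competing probability minus the candidate's own."""
    best_other = max(p for other, p in probs.items() if other != label)
    return best_other - probs[label]
```

Both functions need only the probability vector, which is what makes them model-agnostic: any classifier that outputs class probabilities can be plugged in.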

  • 15.
    Johansson, Ulf
    et al.
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuwe
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Boström, Henrik
    School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Sweden.
    Interpretable regression trees using conformal prediction. 2018. In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 97, p. 394-404. Article in journal (Refereed)
    Abstract [en]

    A key property of conformal predictors is that they are valid, i.e., their error rate on novel data is bounded by a preset level of confidence. For regression, this is achieved by turning the point predictions of the underlying model into prediction intervals. Thus, the most important performance metric for evaluating conformal regressors is not the error rate, but the size of the prediction intervals, where models generating smaller (more informative) intervals are said to be more efficient. State-of-the-art conformal regressors typically utilize two separate predictive models: the underlying model providing the center point of each prediction interval, and a normalization model used to scale each prediction interval according to the estimated level of difficulty for each test instance. When using a regression tree as the underlying model, this approach may cause test instances falling into a specific leaf to receive different prediction intervals. This clearly deteriorates the interpretability of a conformal regression tree compared to a standard regression tree, since the path from the root to a leaf can no longer be translated into a rule explaining all predictions in that leaf. In fact, the model cannot even be interpreted on its own, i.e., without reference to the corresponding normalization model. Current practice effectively presents two options for constructing conformal regression trees: to employ a (global) normalization model, and thereby sacrifice interpretability; or to avoid normalization, and thereby sacrifice both efficiency and individualized predictions. In this paper, two additional approaches are considered, both employing local normalization: the first approach estimates the difficulty by the standard deviation of the target values in each leaf, while the second approach employs Mondrian conformal prediction, which results in regression trees where each rule (path from root node to leaf node) is independently valid. 
    An empirical evaluation shows that the first approach is as efficient as current state-of-the-art approaches, thus eliminating the efficiency vs. interpretability trade-off present in existing methods. Moreover, it is shown that if a validity guarantee is required for each single rule, as provided by the Mondrian approach, a penalty with respect to efficiency has to be paid, but it is only substantial at very high confidence levels.
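The local normalization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the smoothing constant `beta`, and the use of the leaf mean as the point prediction are assumptions made for illustration. Because both the point prediction and the difficulty estimate are constant within a leaf, every instance falling into the same leaf receives the same interval.

```python
import math
from statistics import mean, pstdev


def conformal_quantile(scores, epsilon):
    """Return the ceil((1 - epsilon) * (n + 1))-th smallest score (inf if n is too small)."""
    n = len(scores)
    k = math.ceil((1 - epsilon) * (n + 1))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def leaf_stats(train):
    """train: list of (leaf_id, y). Returns {leaf_id: (mean, std)} of the targets per leaf."""
    by_leaf = {}
    for leaf, y in train:
        by_leaf.setdefault(leaf, []).append(y)
    return {leaf: (mean(ys), pstdev(ys)) for leaf, ys in by_leaf.items()}


def leaf_intervals(train, calib, epsilon, beta=0.01):
    """One prediction interval per leaf, scaled by the leaf's target standard deviation.

    calib: list of (leaf_id, y) held out from training; every leaf_id is assumed
    to occur in train. beta is a hypothetical smoothing term that avoids division by zero.
    """
    stats = leaf_stats(train)
    # Normalized nonconformity score: absolute error divided by the leaf difficulty.
    scores = [abs(y - stats[leaf][0]) / (stats[leaf][1] + beta) for leaf, y in calib]
    q = conformal_quantile(scores, epsilon)
    return {
        leaf: (m - q * (s + beta), m + q * (s + beta))
        for leaf, (m, s) in stats.items()
    }
```

A Mondrian variant would instead call `conformal_quantile` separately on the calibration scores of each leaf, which needs enough calibration instances per leaf to give finite intervals.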

  • 16.
    Johansson, Ulf
    et al.
    University of Borås, Sweden.
    Löfström, Tuve
    University of Borås, Sweden.
    Producing implicit diversity in ANN ensembles. 2012. In: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, p. 1-8. Conference paper (Refereed)
    Abstract [en]

    Combining several ANNs into ensembles normally results in very accurate and robust predictive models. Many ANN ensemble techniques are, however, quite complicated and often explicitly optimize some diversity metric. Unfortunately, the lack of solid validation of the explicit algorithms, at least for classification, makes the use of diversity measures as part of an optimization function questionable. The merits of implicit methods, most notably bagging, are on the other hand experimentally established and well-known. This paper evaluates a number of straightforward techniques for introducing implicit diversity in ANN ensembles, including a novel technique producing diversity by using ANNs with different and slightly randomized link structures. The experimental results, comparing altogether 54 setups and two different ensemble sizes on 30 UCI data sets, show that all methods succeeded in producing implicit diversity, but that the effect on ensemble accuracy varied. Still, most setups evaluated did result in more accurate ensembles, compared to the baseline setup, especially for the larger ensemble size. As a matter of fact, several setups even obtained significantly higher ensemble accuracy than bagging. The analysis also identified that diversity was, relatively speaking, more important for the larger ensembles. Looking specifically at the methods used to increase the implicit diversity, setups using the technique that utilizes the randomized link structures generally produced the most accurate ensembles.

  • 17.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Producing Implicit Diversity in ANN Ensembles. 2012. Conference paper (Refereed)
    Abstract [en]

    Combining several ANNs into ensembles normally results in very accurate and robust predictive models. Many ANN ensemble techniques are, however, quite complicated and often explicitly optimize some diversity metric. Unfortunately, the lack of solid validation of the explicit algorithms, at least for classification, makes the use of diversity measures as part of an optimization function questionable. The merits of implicit methods, most notably bagging, are on the other hand experimentally established and well-known. This paper evaluates a number of straightforward techniques for introducing implicit diversity in ANN ensembles, including a novel technique producing diversity by using ANNs with different and slightly randomized link structures. The experimental results, comparing altogether 54 setups and two different ensemble sizes on 30 UCI data sets, show that all methods succeeded in producing implicit diversity, but that the effect on ensemble accuracy varied. Still, most setups evaluated did result in more accurate ensembles, compared to the baseline setup, especially for the larger ensemble size. As a matter of fact, several setups even obtained significantly higher ensemble accuracy than bagging. The analysis also identified that diversity was, relatively speaking, more important for the larger ensembles. Looking specifically at the methods used to increase the implicit diversity, setups using the technique that utilizes the randomized link structures generally produced the most accurate ensembles.

  • 18.
    Johansson, Ulf
    et al.
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Löfström, Tuve
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Boström, Henrik
    School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Sweden.
    Calibrating probability estimation trees using Venn-Abers predictors. 2019. In: SIAM International Conference on Data Mining, SDM 2019, Society for Industrial and Applied Mathematics, 2019, p. 28-36. Conference paper (Refereed)
    Abstract [en]

    Class labels output by standard decision trees are not very useful for making informed decisions, e.g., when comparing the expected utility of various alternatives. In contrast, probability estimation trees (PETs) output class probability distributions rather than single class labels. It is well known that estimating class probabilities in PETs by relative frequencies often leads to extreme probability estimates, and a number of approaches to provide more well-calibrated estimates have been proposed. In this study, a recent model-agnostic calibration approach, called Venn-Abers predictors, is, for the first time, considered in the context of decision trees. Results from a large-scale empirical investigation are presented, comparing the novel approach to previous calibration techniques with respect to several different performance metrics, targeting both predictive performance and reliability of the estimates. All approaches are considered both with and without Laplace correction. The results show that using Venn-Abers predictors for calibration is a highly competitive approach, significantly outperforming Platt scaling, Isotonic regression and no calibration, with respect to almost all performance metrics used, independently of whether Laplace correction is applied or not. The only exception is AUC, where using non-calibrated PETs together with Laplace correction actually is the best option, which can be explained by the fact that AUC is not affected by the absolute, but only relative, values of the probability estimates.
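To make the contrast between relative frequencies and Laplace correction concrete, here is a small sketch of how class probabilities could be estimated in a single leaf. The function name and the dictionary-based interface are assumptions for illustration. The correction used is the common form (k + 1) / (n + C), where k is the class count in the leaf, n the leaf size, and C the number of classes.

```python
def leaf_probabilities(class_counts, laplace=False):
    """Estimate class probabilities in one leaf from its class counts.

    class_counts: {class_label: count} covering all classes, including those with count 0.
    Without Laplace correction the leaf is assumed to be non-empty.
    """
    n = sum(class_counts.values())
    c = len(class_counts)
    if laplace:
        # Laplace correction pulls estimates away from the extremes 0 and 1.
        return {label: (k + 1) / (n + c) for label, k in class_counts.items()}
    # Plain relative frequencies: a pure leaf yields probabilities of exactly 0 and 1.
    return {label: k / n for label, k in class_counts.items()}
```

For a pure leaf with five positive instances, the relative frequency estimate for the positive class is 1, while the corrected estimate is 6/7. Within a single leaf the ordering of the classes is unchanged by the correction.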

  • 19.
    Johansson, Ulf
    et al.
    University of Borås, Sweden.
    Löfström, Tuve
    University of Borås, Sweden.
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Overproduce-and-Select: The Grim Reality. 2013. In: 2013 IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL), IEEE conference proceedings, 2013, p. 52-59. Conference paper (Refereed)
    Abstract [en]

    Overproduce-and-select (OPAS) is a frequently used paradigm for building ensembles. In static OPAS, a large number of base classifiers are trained, before a subset of the available models is selected to be combined into the final ensemble. In general, the selected classifiers are supposed to be accurate and diverse for the OPAS strategy to result in highly accurate ensembles, but exactly how this is enforced in the selection process is not obvious. Most often, either individual models or ensembles are evaluated, using some performance metric, on available and labeled data. Naturally, the underlying assumption is that an observed advantage for the models (or the resulting ensemble) will carry over to test data. In the experimental study, a typical static OPAS scenario, using a pool of artificial neural networks and a number of very natural and frequently used performance measures, is evaluated on 22 publicly available data sets. The discouraging result is that although a fairly large proportion of the ensembles obtained higher test set accuracies, compared to using the entire pool as the ensemble, none of the selection criteria could be used to identify these highly accurate ensembles. Despite only investigating a specific scenario, we argue that the settings used are typical for static OPAS, thus making the results general enough to question the entire paradigm.

  • 20.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Overproduce-and-Select: The Grim Reality. 2013. Conference paper (Refereed)
    Abstract [en]

    Overproduce-and-select (OPAS) is a frequently used paradigm for building ensembles. In static OPAS, a large number of base classifiers are trained, before a subset of the available models is selected to be combined into the final ensemble. In general, the selected classifiers are supposed to be accurate and diverse for the OPAS strategy to result in highly accurate ensembles, but exactly how this is enforced in the selection process is not obvious. Most often, either individual models or ensembles are evaluated, using some performance metric, on available and labeled data. Naturally, the underlying assumption is that an observed advantage for the models (or the resulting ensemble) will carry over to test data. In the experimental study, a typical static OPAS scenario, using a pool of artificial neural networks and a number of very natural and frequently used performance measures, is evaluated on 22 publicly available data sets. The discouraging result is that although a fairly large proportion of the ensembles obtained higher test set accuracies, compared to using the entire pool as the ensemble, none of the selection criteria could be used to identify these highly accurate ensembles. Despite only investigating a specific scenario, we argue that the settings used are typical for static OPAS, thus making the results general enough to question the entire paradigm.

  • 21.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Stockholm University, Sweden.
    Random Brains. 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper, we introduce and evaluate a novel method, called random brains, for producing neural network ensembles. The suggested method, which is heavily inspired by the random forest technique, produces diversity implicitly by using bootstrap training and randomized architectures. More specifically, for each base classifier multilayer perceptron, a number of randomly selected links between the input layer and the hidden layer are removed prior to training, thus resulting in potentially weaker but more diverse base classifiers. The experimental results on 20 UCI data sets show that random brains obtained significantly higher accuracy and AUC, compared to standard bagging of similar neural networks not utilizing randomized architectures. The analysis shows that the main reason for the increased ensemble performance is the ability to produce effective diversity, as indicated by the increase in the difficulty diversity measure.
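The two sources of implicit diversity described here, bootstrap training and randomly removed input-to-hidden links, can be sketched as follows. This is only an illustration of the setup step, not the authors' code: the function names and the `drop_fraction` parameter are hypothetical, and the actual network training is omitted. The returned mask would be multiplied element-wise with the input-to-hidden weight matrix so that removed links stay at zero.

```python
import random


def random_link_mask(n_inputs, n_hidden, drop_fraction, rng):
    """Return an n_inputs x n_hidden 0/1 mask with a fraction of links removed at random."""
    links = [(i, h) for i in range(n_inputs) for h in range(n_hidden)]
    n_drop = int(drop_fraction * len(links))
    dropped = set(rng.sample(links, n_drop))
    return [
        [0 if (i, h) in dropped else 1 for h in range(n_hidden)]
        for i in range(n_inputs)
    ]


def bootstrap_indices(n_instances, rng):
    """Sample n_instances training indices with replacement."""
    return [rng.randrange(n_instances) for _ in range(n_instances)]


def ensemble_setup(n_members, n_instances, n_inputs, n_hidden, drop_fraction, seed=0):
    """Draw one bootstrap sample and one link mask per base classifier."""
    rng = random.Random(seed)
    return [
        {
            "bootstrap": bootstrap_indices(n_instances, rng),
            "mask": random_link_mask(n_inputs, n_hidden, drop_fraction, rng),
        }
        for _ in range(n_members)
    ]
```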

  • 22.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Random Brains. 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper, we introduce and evaluate a novel method, called random brains, for producing neural network ensembles. The suggested method, which is heavily inspired by the random forest technique, produces diversity implicitly by using bootstrap training and randomized architectures. More specifically, for each base classifier multilayer perceptron, a number of randomly selected links between the input layer and the hidden layer are removed prior to training, thus resulting in potentially weaker but more diverse base classifiers. The experimental results on 20 UCI data sets show that random brains obtained significantly higher accuracy and AUC, compared to standard bagging of similar neural networks not utilizing randomized architectures. The analysis shows that the main reason for the increased ensemble performance is the ability to produce effective diversity, as indicated by the increase in the difficulty diversity measure.

  • 23.
    Johansson, Ulf
    et al.
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Löfström, Tuve
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Linusson, Henrik
    Högskolan i Borås, Department of Information Technology, Borås, Sweden.
    Boström, Henrik
    The Royal Institute of Technology (KTH), School of Electrical Engineering and Computer Science, Stockholm, Sweden.
    Efficient Venn Predictors using Random Forests. 2019. In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no 3, p. 535-550. Article in journal (Refereed)
    Abstract [en]

    Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. In addition, a probabilistic classifier must, of course, also be as accurate as possible. In this paper, Venn predictors, and their special case Venn-Abers predictors, are evaluated for probabilistic classification, using random forests as the underlying models. Venn predictors output multiple probabilities for each label, i.e., the predicted label is associated with a probability interval. Since all Venn predictors are valid in the long run, the size of the probability intervals is very important, with tighter intervals being more informative. The standard solution when calibrating a classifier is to employ an additional step, transforming the outputs from a classifier into probability estimates, using a labeled data set not employed for training of the models. For random forests, and other bagged ensembles, it is, however, possible to use the out-of-bag instances for calibration, making all training data available for both model learning and calibration. This procedure has previously been successfully applied to conformal prediction, but was here evaluated for the first time for Venn predictors. The empirical investigation, using 22 publicly available data sets, showed that all four versions of the Venn predictors were better calibrated than both the raw estimates from the random forest, and the standard techniques Platt scaling and isotonic regression. Regarding both informativeness and accuracy, the standard Venn predictor calibrated on out-of-bag instances was the best setup evaluated. Most importantly, calibrating on out-of-bag instances, instead of using a separate calibration set, resulted in tighter intervals and more accurate models on every data set, for both the Venn predictors and the Venn-Abers predictors.

  • 24.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Empirically Investigating the Importance of Diversity. 2007. Conference paper (Refereed)
  • 25.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    Evaluating Standard Techniques for Implicit Diversity. 2008. In: Advances in Knowledge Discovery and Data Mining, Springer, 2008, p. 613-622. Conference paper (Refereed)
  • 26.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    The Importance of Diversity in Neural Network Ensembles: An Empirical Investigation. 2007. Conference paper (Refereed)
    Abstract [en]

    When designing ensembles, it is almost an axiom that the base classifiers must be diverse in order for the ensemble to generalize well. Unfortunately, there is no clear definition of the key term diversity, leading to several diversity measures and many, more or less ad hoc, methods for diversity creation in ensembles. In addition, no specific diversity measure has been shown to have a high correlation with test set accuracy. The purpose of this paper is to empirically evaluate ten different diversity measures, using neural network ensembles and 11 publicly available data sets. The main result is that all diversity measures evaluated, in this study too, show low or very low correlation with test set accuracy. Having said that, two measures, double fault and difficulty, show slightly higher correlations compared to the other measures. The study furthermore shows that the correlation between accuracy measured on training or validation data and test set accuracy also is rather low. These results challenge ensemble design techniques where diversity is explicitly maximized or where ensemble accuracy on a hold-out set is used for optimization.

  • 27.
    Johansson, Ulf
    et al.
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuve
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Sundell, Håkan
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Venn predictors using lazy learners. 2018. In: Proceedings of the 2018 International Conference on Data Science, ICDATA'18 / [ed] R. Stahlbock, G. M. Weiss & M. Abou-Nasr, CSREA Press, 2018, p. 220-226. Conference paper (Refereed)
    Abstract [en]

    Probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. Venn predictors, which can be used on top of any classifier, are automatically valid multiprobability predictors, making them extremely suitable for probabilistic classification. A Venn predictor outputs multiple probabilities for each label, so the predicted label is associated with a probability interval. While all Venn predictors are valid, their accuracy and the size of the probability interval are dependent on both the underlying model and some interior design choices. Specifically, all Venn predictors use so-called Venn taxonomies for dividing the instances into a number of categories, each such taxonomy defining a different Venn predictor. A frequently used, but very basic taxonomy, is to categorize the instances based on their predicted label. In this paper, we investigate some more fine-grained taxonomies, that use not only the predicted label but also some measures related to the confidence in individual predictions. The empirical investigation, using 22 publicly available data sets and lazy learners (kNN) as the underlying models, showed that the probability estimates from the Venn predictors, as expected, were extremely well-calibrated. Most importantly, using the basic (i.e., label-based) taxonomy produced significantly more accurate and informative Venn predictors compared to the more complex alternatives. In addition, the results also showed that when using lazy learners as underlying models, a transductive approach significantly outperformed an inductive one, with regard to accuracy and informativeness. This result is in contrast to previous studies, where other underlying models were used.
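The basic label-based taxonomy can be illustrated with a short sketch of an inductive Venn predictor. It is a simplified illustration under stated assumptions, not the setup evaluated in the paper: the underlying model is taken as already trained, the calibration set is given as pairs of predicted and true labels, and the function name is hypothetical. The test instance is placed in the category of its predicted label, each possible label is tentatively assigned to it in turn, and the resulting label frequencies in that category give the lower and upper probabilities.

```python
from collections import Counter


def venn_predict(calib, test_pred, labels):
    """Inductive Venn predictor with the label-based taxonomy.

    calib: list of (predicted_label, true_label) pairs from a calibration set.
    test_pred: the underlying model's predicted label for the test instance.
    Returns {label: (lower, upper)} probability intervals.
    """
    # Category = all calibration instances with the same predicted label.
    same_category = [true for pred, true in calib if pred == test_pred]
    counts = Counter(same_category)
    n = len(same_category) + 1  # the category also contains the test instance

    # One probability distribution per tentatively assigned label.
    distributions = [
        {y: (counts[y] + (y == assumed)) / n for y in labels}
        for assumed in labels
    ]
    return {
        y: (min(d[y] for d in distributions), max(d[y] for d in distributions))
        for y in labels
    }
```

With three calibration instances in the category, two of which have true label `a`, the interval for `a` is (0.5, 0.75); the interval narrows as the category grows.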

  • 28.
    Johansson, Ulf
    et al.
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Löfström, Tuve
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Sundell, Håkan
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Linusson, Henrik
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Gidenstam, Anders
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Boström, Henrik
    School of Information and Communication Technology, Royal Institute of Technology, Sweden.
    Venn predictors for well-calibrated probability estimation trees. 2018. In: 7th Symposium on Conformal and Probabilistic Prediction and Applications: COPA 2018, 11-13 June 2018, Maastricht, The Netherlands / [ed] Alex J. Gammerman and Vladimir Vovk and Zhiyuan Luo and Evgueni N. Smirnov and Ralf L. M. Peeters, 2018, p. 3-14. Conference paper (Refereed)
    Abstract [en]

    Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. The standard solution is to employ an additional step, transforming the outputs from a classifier into probability estimates. In this paper, Venn predictors are compared to Platt scaling and isotonic regression, for the purpose of producing well-calibrated probabilistic predictions from decision trees. The empirical investigation, using 22 publicly available datasets, showed that the probability estimates from the Venn predictor were extremely well-calibrated. In fact, in a direct comparison using the accepted reliability metric, the Venn predictor estimates were the most exact on every data set.

  • 29.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Locally Induced Predictive Models. 2011. Conference paper (Refereed)
    Abstract [en]

    Most predictive modeling techniques utilize all available data to build global models. This is despite the well-known fact that for many problems, the targeted relationship varies greatly over the input space, thus suggesting that localized models may improve predictive performance. In this paper, we suggest and evaluate a technique inducing one predictive model for each test instance, using only neighboring instances. In the experimentation, several different variations of the suggested algorithm producing localized decision trees and neural network models are evaluated on 30 UCI data sets. The main result is that the suggested approach generally yields better predictive performance than global models built using all available training data. As a matter of fact, all techniques producing J48 trees obtained significantly higher accuracy and AUC, compared to the global J48 model. For RBF network models, with their inherent ability to use localized information, the suggested approach was only successful with regard to accuracy, while global RBF models had a better ranking ability, as seen by their generally higher AUCs.

  • 30.
    Johansson, Ulf
    et al.
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Löfström, Tuve
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Sundell, Håkan
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Linusson, Henrik
    Department of Information Technology, University of Borås, Sweden.
    Gidenstam, Anders
    Department of Information Technology, University of Borås, Sweden.
    Boström, Henrik
    School of Information and Communication Technology, Royal Institute of Technology, Sweden.
    Venn predictors for well-calibrated probability estimation trees. 2018. In: Conformal and Probabilistic Prediction and Applications / [ed] A. Gammerman, V. Vovk, Z. Luo, E. Smirnov, & R. Peeters, 2018, p. 3-14. Conference paper (Refereed)
    Abstract [en]

    Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. The standard solution is to employ an additional step, transforming the outputs from a classifier into probability estimates. In this paper, Venn predictors are compared to Platt scaling and isotonic regression, for the purpose of producing well-calibrated probabilistic predictions from decision trees. The empirical investigation, using 22 publicly available data sets, showed that the probability estimates from the Venn predictor were extremely well-calibrated. In fact, in a direct comparison using the accepted reliability metric, the Venn predictor estimates were the most exact on every data set.

  • 31.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Chipper: A Novel Algorithm for Concept Description. 2008. Conference paper (Refereed)
    Abstract [en]

    In this paper, several demands placed on concept description algorithms are identified and discussed. The most important criterion is the ability to produce compact rule sets that, in a natural and accurate way, describe the most important relationships in the underlying domain. An algorithm based on the identified criteria is presented and evaluated. The algorithm, named Chipper, produces decision lists, where each rule covers a maximum number of remaining instances while meeting requested accuracy requirements. In the experiments, Chipper is evaluated on nine UCI data sets. The main result is that Chipper produces compact and understandable rule sets, clearly fulfilling the overall goal of concept description. In the experiments, Chipper's accuracy is similar to standard decision tree and rule induction algorithms, while rule sets have superior comprehensibility.

  • 32.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    One Tree to Explain Them All. 2011. Conference paper (Refereed)
    Abstract [en]

    Random forest is an often used ensemble technique, renowned for its high predictive performance. Random forest models are, however, due to their sheer complexity inherently opaque, making human interpretation and analysis impossible. This paper presents a method of approximating the random forest with just one decision tree. The approach uses oracle coaching, a recently suggested technique where a weaker but transparent model is generated using combinations of regular training data and test data initially labeled by a strong classifier, called the oracle. In this study, the random forest plays the part of the oracle, while the transparent models are decision trees generated by either the standard tree inducer J48, or by evolving genetic programs. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves both accuracy and area under ROC curve, compared to using training data only. As a matter of fact, resulting single tree models are as accurate as the random forest, on the specific test instances. Most importantly, this is not achieved by inducing or evolving huge trees having perfect fidelity; a large majority of all trees are instead rather compact and clearly comprehensible. The experiments also show that the evolution outperformed J48, with regard to accuracy, but that this came at the expense of slightly larger trees.

  • 33.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Oracle Coached Decision Trees and Lists. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a novel method for obtaining increased predictive performance from transparent models in situations where production input vectors are available when building the model. First, labeled training data is used to build a powerful opaque model, called an oracle. Second, the oracle is applied to production instances, generating predicted target values, which are used as labels. Finally, these newly labeled instances are utilized, in different combinations with normal training data, when inducing a transparent model. Experimental results, on 26 UCI data sets, show that the use of oracle coaches significantly improves predictive performance, compared to standard model induction. Most importantly, both accuracy and AUC results are robust over all combinations of opaque and transparent models evaluated. This study thus implies that the straightforward procedure of using a coaching oracle, which can be used with arbitrary classifiers, yields significantly better predictive performance at a low computational cost.
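The three steps of oracle coaching described in this abstract can be sketched on one-dimensional, binary-labeled toy data. The models here are stand-ins chosen only to keep the example self-contained: a k-nearest-neighbor classifier plays the opaque oracle and a single-threshold decision stump plays the transparent model. Neither choice, nor any function name, comes from the paper.

```python
def knn_oracle(train, k=3):
    """Stand-in opaque model: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)
    return predict


def fit_stump(data):
    """Stand-in transparent model: threshold t for the rule 'predict 1 if x > t else 0'."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in data)

    return min(candidates, key=errors)


def oracle_coach(train, production_x, k=3):
    """Build the oracle, label the production inputs, then induce the transparent model."""
    oracle = knn_oracle(train, k)                               # step 1: opaque oracle
    oracle_labeled = [(x, oracle(x)) for x in production_x]     # step 2: oracle labels
    return fit_stump(train + oracle_labeled)                    # step 3: transparent model
```

Here the transparent model is induced from the training data combined with the oracle-labeled production data, which is one of the combinations the abstract mentions; using the oracle-labeled data alone would be another.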

  • 34.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Obtaining accurate and comprehensible classifiers using oracle coaching. 2012. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no 2, p. 247-263. Article in journal (Refereed)
    Abstract [en]

    While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.

  • 35.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    König, Rikard
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Using Genetic Programming to Obtain Implicit Diversity. 2009. Conference paper (Refereed)
    Abstract [en]

    When performing predictive data mining, the use of ensembles is known to increase prediction accuracy, compared to single models. To obtain this higher accuracy, ensembles should be built from base classifiers that are both accurate and diverse. The question of how to balance these two properties in order to maximize ensemble accuracy is, however, far from solved and many different techniques for obtaining ensemble diversity exist. One such technique is bagging, where implicit diversity is introduced by training base classifiers on different subsets of available data instances, thus resulting in less accurate, but diverse base classifiers. In this paper, genetic programming is used as an alternative method to obtain implicit diversity in ensembles by evolving accurate, but different base classifiers in the form of decision trees, thus exploiting the inherent inconsistency of genetic programming. The experiments show that the GP approach outperforms standard bagging of decision trees, obtaining significantly higher ensemble accuracy over 25 UCI datasets. This superior performance stems from base classifiers having both higher average accuracy and more diversity. Implicitly introducing diversity using GP thus works very well, since evolved base classifiers tend to be highly accurate and diverse.

  • 36.
    Johansson, Ulf
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Sönströd, Cecilia
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Norinder, Ulf
    Boström, Henrik
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Using Feature Selection with Bagging and Rule Extraction in Drug Discovery. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper investigates different ways of combining feature selection with bagging and rule extraction in predictive modeling. Experiments on a large number of data sets from the medicinal chemistry domain, using standard algorithms implemented in the Weka data mining workbench, show that feature selection can lead to significantly improved predictive performance. When combining feature selection with bagging, employing the feature selection on each bootstrap obtains the best result. When using decision trees for rule extraction, the effect of feature selection can actually be detrimental, unless the transductive approach oracle coaching is also used. However, employing oracle coaching will lead to significantly improved performance, and the best results are obtained when performing feature selection before training the opaque model. The overall conclusion is that it can make a substantial difference for the predictive performance exactly how feature selection is used in conjunction with other techniques.

  • 37.
    König, Rikard
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Johansson, Ulf
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Niklasson, Lars
    Improving GP Classification Performance by Injection of Decision Trees. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper presents a novel hybrid method combining genetic programming and decision tree learning. The method starts by estimating a benchmark level of reasonable accuracy, based on decision tree performance on bootstrap samples of the training set. Next, a normal GP evolution is started with the aim of producing an accurate GP. At even intervals, the best GP in the population is evaluated against the accuracy benchmark. If the GP has higher accuracy than the benchmark, the evolution continues normally until the maximum number of generations is reached. If the accuracy is lower than the benchmark, two things happen. First, the fitness function is modified to allow larger GPs, able to represent more complex models. Secondly, a decision tree with increased size and trained on a bootstrap of the training data is injected into the population. The experiments show that the hybrid solution of injecting decision trees into a GP population gives synergetic effects producing results that are better than using either technique separately. The results, from 18 UCI data sets, show that the proposed method clearly outperforms normal GP, and is significantly better than the standard decision tree algorithm.

  • 38.
    Linusson, Henrik
    et al.
    Department of Information Technology, University of Borås, Borås, Sweden.
    Johansson, Ulf
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Boström, Henrik
    School of Electrical Engineering and Computer Science, Royal Institute of Technology, Kista, Sweden.
    Löfström, Tuve
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL).
    Classification with reject option using conformal prediction. 2018. In: Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part I, Springer, 2018, p. 94-105. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a practically useful means of interpreting the predictions produced by a conformal classifier. The proposed interpretation leads to a classifier with a reject option that allows the user to limit the number of erroneous predictions made on the test set, without any need to reveal the true labels of the test objects. The method described in this paper works by estimating the cumulative error count on a set of predictions provided by a conformal classifier, ordered by their confidence. Given a test set and a user-specified parameter k, the proposed classification procedure outputs the largest possible number of predictions containing on average at most k errors, while refusing to make predictions for test objects where it is too uncertain. We conduct an empirical evaluation using benchmark datasets, and show that we are able to provide accurate estimates for the error rate on the test set.

  • 39.
    Linusson, Henrik
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Johansson, Ulf
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Dept. of Computer and Systems Sciences Stockholm University, Kista, Sweden.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Efficiency Comparison of Unstable Transductive and Inductive Conformal Classifiers. 2014. Conference paper (Refereed)
    Abstract [en]

    In the conformal prediction literature, it appears axiomatic that transductive conformal classifiers possess a higher predictive efficiency than inductive conformal classifiers; however, this depends on whether or not the nonconformity function tends to overfit misclassified test examples. With the conformal prediction framework’s increasing popularity, it thus becomes necessary to clarify the settings in which this claim holds true. In this paper, the efficiency of transductive conformal classifiers based on decision tree, random forest and support vector machine classification models is compared to the efficiency of corresponding inductive conformal classifiers. The results show that the efficiency of conformal classifiers based on standard decision trees or random forests is substantially improved when used in the inductive mode, while conformal classifiers based on support vector machines are more efficient in the transductive mode. In addition, an analysis is presented that discusses the effects of calibration set size on inductive conformal classifier efficiency.

  • 40.
    Linusson, Henrik
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Johansson, Ulf
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Dept. of Computer and Systems Sciences Stockholm University, Kista, Sweden.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Efficiency Comparison of Unstable Transductive and Inductive Conformal Classifiers. 2014. Conference paper (Refereed)
    Abstract [en]

    In the conformal prediction literature, it appears axiomatic that transductive conformal classifiers possess a higher predictive efficiency than inductive conformal classifiers; however, this depends on whether or not the nonconformity function tends to overfit misclassified test examples. With the conformal prediction framework’s increasing popularity, it thus becomes necessary to clarify the settings in which this claim holds true. In this paper, the efficiency of transductive conformal classifiers based on decision tree, random forest and support vector machine classification models is compared to the efficiency of corresponding inductive conformal classifiers. The results show that the efficiency of conformal classifiers based on standard decision trees or random forests is substantially improved when used in the inductive mode, while conformal classifiers based on support vector machines are more efficient in the transductive mode. In addition, an analysis is presented that discusses the effects of calibration set size on inductive conformal classifier efficiency.

  • 41.
    Linusson, Henrik
    et al.
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Johansson, Ulf
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Boström, Henrik
    Dept. of Computer and Systems Sciences, Stockholm University, Kista, Sweden.
    Löfström, Tuve
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Reliable Confidence Predictions Using Conformal Prediction. 2016. In: Lecture Notes in Computer Science, 2016, p. 77-88. Conference paper (Refereed)
    Abstract [en]

    Conformal classifiers output confidence prediction regions, i.e., multi-valued predictions that are guaranteed to contain the true output value of each test pattern with some predefined probability. In order to fully utilize the predictions provided by a conformal classifier, it is essential that those predictions are reliable, i.e., that a user is able to assess the quality of the predictions made. Although conformal classifiers are statistically valid by default, the error probability of the prediction regions output is dependent on their size in such a way that smaller, and thus potentially more interesting, predictions are more likely to be incorrect. This paper proposes, and evaluates, a method for producing refined error probability estimates of prediction regions, that takes their size into account. The end result is a binary conformal confidence predictor that is able to provide accurate error probability estimates for those prediction regions containing only a single class label.

  • 42.
    Linusson, Henrik
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Johansson, Ulf
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Löfström, Tuve
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Signed-Error Conformal Regression. 2014. In: Advances in Knowledge Discovery and Data Mining 18th Pacific-Asia Conference, PAKDD 2014 Tainan, Taiwan, May 13-16, 2014 Proceedings, Part I, Springer, 2014, p. 224-236. Conference paper (Refereed)
    Abstract [en]

    This paper suggests a modification of the Conformal Prediction framework for regression that will strengthen the associated guarantee of validity. We motivate the need for this modification and argue that our conformal regressors are more closely tied to the actual error distribution of the underlying model, thus allowing for more natural interpretations of the prediction intervals. In the experimentation, we provide an empirical comparison of our conformal regressors to traditional conformal regressors and show that the proposed modification results in more robust two-tailed predictions, and more efficient one-tailed predictions.

  • 43.
    Linusson, Henrik
    et al.
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Norinder, Ulf
    Swetox, Karolinska Institutet.
    Boström, Henrik
    Dept. of Computer Science and Informatics, Stockholm University.
    Johansson, Ulf
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Löfström, Tuve
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    On the Calibration of Aggregated Conformal Predictors. 2017. In: Proceedings of Machine Learning Research, 2017. Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence. These models are typically constructed on top of traditional machine learning algorithms. An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular, their validity is independent of the specific underlying learning algorithm on which they are based. Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency. As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed, drawing inspiration from the field of classification and regression ensembles. Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown that they might attain empirical exact validity under certain circumstances, their theoretical validity is conditional on additional assumptions that require further clarification. In this paper, we show why validity is not automatic for aggregated conformal predictors, and provide a revised definition of aggregated conformal predictors that gains approximate validity conditional on properties of the underlying learning algorithm.

  • 44.
    Linusson, Henrik
    et al.
    Department of Information Technology, University of Borås, Sweden.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Johansson, Ulf
    Högskolan i Jönköping, JTH, Datateknik och informatik.
    Löfström, Tuve
    Högskolan i Jönköping, JTH. Forskningsmiljö Datavetenskap och informatik.
    On the calibration of aggregated conformal predictors. 2017. In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, 2017, p. 154-173. Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence. These models are typically constructed on top of traditional machine learning algorithms. An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular, their validity is independent of the specific underlying learning algorithm on which they are based. Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency. As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed, drawing inspiration from the field of classification and regression ensembles. Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown that they might attain empirical exact validity under certain circumstances, their theoretical validity is conditional on additional assumptions that require further clarification. In this paper, we show why validity is not automatic for aggregated conformal predictors, and provide a revised definition of aggregated conformal predictors that gains approximate validity conditional on properties of the underlying learning algorithm.

  • 45.
    Linusson, Henrik
    et al.
    Department of Information Technology, University of Borås, Sweden.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Boström, Henrik
    Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Johansson, Ulf
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuwe
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Department of Information Technology, University of Borås, Sweden.
    On the calibration of aggregated conformal predictors. 2017. In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, Machine Learning Research, 2017, p. 154-173. Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence. These models are typically constructed on top of traditional machine learning algorithms. An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular, their validity is independent of the specific underlying learning algorithm on which they are based. Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency. As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed, drawing inspiration from the field of classification and regression ensembles. Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown that they might attain empirical exact validity under certain circumstances, their theoretical validity is conditional on additional assumptions that require further clarification. In this paper, we show why validity is not automatic for aggregated conformal predictors, and provide a revised definition of aggregated conformal predictors that gains approximate validity conditional on properties of the underlying learning algorithm.

  • 46.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Linusson, Henrik
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.
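
    The difference between standard and label-conditional conformal prediction described in this abstract comes down to which calibration examples are used when computing a p-value for a candidate label. The following is a minimal sketch for the inductive setting, assuming nonconformity scores have already been computed by some underlying model; the function names and toy scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def label_conditional_p_values(cal_scores, cal_labels, test_scores):
    """Compute one p-value per candidate label.

    cal_scores[i] is the nonconformity score of calibration example i for
    its true label cal_labels[i]. test_scores maps each candidate label to
    the nonconformity score the test object would have under that label.
    Only calibration examples of the same class are compared against.
    """
    by_label = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_label[label].append(score)

    p_values = {}
    for label, score in test_scores.items():
        same_class = by_label.get(label, [])
        at_least_as_strange = sum(1 for s in same_class if s >= score)
        # The "+ 1" terms count the test object itself.
        p_values[label] = (at_least_as_strange + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Toy, made-up scores: four majority-class and two minority-class calibration examples.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
cal_labels = ["maj", "maj", "maj", "maj", "min", "min"]
test_scores = {"maj": 0.35, "min": 0.7}

p_values = label_conditional_p_values(cal_scores, cal_labels, test_scores)
region = prediction_set(p_values, significance=0.2)
```

    A standard (non-conditional) inductive conformal predictor would instead compare each test score against all calibration scores regardless of class, which is where the majority-class bias discussed in the abstract can arise.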

  • 47.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Linusson, Henrik
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.

  • 48.
    Löfström, Tuve
    et al.
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Linusson, Henrik
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Johansson, Ulf
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
  • 49.
    Löfström, Tuve
    et al.
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Linusson, Henrik
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Johansson, Ulf
    Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT.
    Bias reduction through conditional conformal prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.

  • 50.
    Löfström, Tuve
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Johansson, Ulf
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Boström, Henrik
    Ensemble Member Selection Using Multi-Objective Optimization. 2009. Conference paper (Refereed)
    Abstract [en]

    Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. Unfortunately, the problem of how to maximize ensemble accuracy is, especially for classification, far from solved. In essence, the key problem is to find a suitable criterion, typically based on training or selection set performance, highly correlated with ensemble accuracy on novel data. Several studies have, however, shown that it is difficult to come up with a single measure, such as ensemble or base classifier selection set accuracy, or some measure based on diversity, that is a good general predictor for ensemble test accuracy. This paper presents a novel technique that for each learning task searches for the most effective combination of given atomic measures, by means of a genetic algorithm. Ensembles built from either neural networks or random forests were empirically evaluated on 30 UCI datasets. The experimental results show that when using the generated combined optimization criteria to rank candidate ensembles, a higher test set accuracy for the top ranked ensemble was achieved, compared to using ensemble accuracy on selection data alone. Furthermore, when creating ensembles from a pool of neural networks, the use of the generated combined criteria was shown to generally outperform the use of estimated ensemble accuracy as the single optimization criterion.
