Digitala Vetenskapliga Arkivet

1 - 30 of 30
  • 1.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Calibrating Random Forests. 2008. In: Proceedings of the Seventh International Conference on Machine Learning and Applications, IEEE, 2008, p. 121-126. Conference paper (Refereed)
    Abstract [en]

    When using the output of classifiers to calculate the expected utility of different alternatives in decision situations, the correctness of predicted class probabilities may be of crucial importance. However, even very accurate classifiers may output class probabilities of rather poor quality. One way of overcoming this problem is by means of calibration, i.e., mapping the original class probabilities to more accurate ones. Previous studies have however indicated that random forests are difficult to calibrate by standard calibration methods. In this work, a novel calibration method is introduced, which is based on a recent finding that probabilities predicted by forests of classification trees have a lower squared error compared to those predicted by forests of probability estimation trees (PETs). The novel calibration method is compared to the two standard methods, Platt scaling and isotonic regression, on 34 datasets from the UCI repository. The experiment shows that random forests of PETs calibrated by the novel method significantly outperform uncalibrated random forests of both PETs and classification trees, as well as random forests calibrated with the two standard methods, with respect to the squared error of predicted class probabilities.
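
    Platt scaling and isotonic regression, the two standard calibration methods referred to above, can be sketched with scikit-learn as follows; the dataset and hyperparameters are illustrative and not taken from the paper:

    # Illustrative sketch of the two standard calibration baselines.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    platt = CalibratedClassifierCV(rf, method="sigmoid", cv=5)      # Platt scaling
    isotonic = CalibratedClassifierCV(rf, method="isotonic", cv=5)  # isotonic regression

    for name, model in [("uncalibrated", rf), ("Platt", platt), ("isotonic", isotonic)]:
        model.fit(X_tr, y_tr)
        p = model.predict_proba(X_te)[:, 1]
        # The Brier score is the squared error of the predicted class probabilities.
        print(name, brier_score_loss(y_te, p))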

  • 2.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Estimating class probabilities in random forests. 2007. In: ICMLA 2007: Sixth International Conference on Machine Learning and Applications, IEEE Computer Society, 2007, p. 211-216. Conference paper (Refereed)
    Abstract [en]

    For both single probability estimation trees (PETs) and ensembles of such trees, commonly employed class probability estimates correct the observed relative class frequencies in each leaf to avoid anomalies caused by small sample sizes. The effect of such corrections in random forests of PETs is investigated, and the use of the relative class frequency is compared to using two corrected estimates, the Laplace estimate and the m-estimate. An experiment with 34 datasets from the UCI repository shows that estimating class probabilities using relative class frequency clearly outperforms both using the Laplace estimate and the m-estimate with respect to accuracy, area under the ROC curve (AUC) and Brier score. Hence, in contrast to what is commonly employed for PETs and ensembles of PETs, these results strongly suggest that a non-corrected probability estimate should be used in random forests of PETs. The experiment further shows that learning random forests of PETs using relative class frequency significantly outperforms learning random forests of classification trees (i.e., trees for which only an unweighted vote on the most probable class is counted) with respect to both accuracy and AUC, but that the latter is clearly ahead of the former with respect to Brier score.
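
    The three estimates compared above have simple closed forms; a small sketch, where the example values (class counts, prior, and m) are illustrative:

    # The three leaf probability estimates compared in the study above.
    # n_c = examples of class c in the leaf, n = total examples in the leaf,
    # k = number of classes, p_c = prior of class c, m = m-estimate weight.
    def relative_frequency(n_c: int, n: int) -> float:
        return n_c / n

    def laplace(n_c: int, n: int, k: int) -> float:
        return (n_c + 1) / (n + k)

    def m_estimate(n_c: int, n: int, p_c: float, m: float) -> float:
        return (n_c + m * p_c) / (n + m)

    # Example: a leaf with 3 of 4 examples in class c, two classes,
    # a uniform prior and m = 2 (illustrative values).
    print(relative_frequency(3, 4), laplace(3, 4, 2), m_estimate(3, 4, 0.5, 2.0))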

  • 3.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Feature vs. Classifier Fusion for Predictive Data Mining: a Case Study in Pesticide Classification. 2007. In: 10th International Conference on Information Fusion, IEEE Press, 2007, p. 1-7. Conference paper (Refereed)
    Abstract [en]

    Two strategies for fusing information from multiple sources when generating predictive models in the domain of pesticide classification are investigated: i) fusing different sets of features (molecular descriptors) before building a model and ii) fusing the classifiers built from the individual descriptor sets. An empirical investigation demonstrates that the choice of strategy can have a significant impact on the predictive performance. Furthermore, the experiment shows that the best strategy is dependent on the type of predictive model considered. When generating a decision tree for pesticide classification, a statistically significant difference in accuracy is observed in favor of combining predictions from the individual models compared to generating a single model from the fused set of molecular descriptors. On the other hand, when the model consists of an ensemble of decision trees, a statistically significant difference in accuracy is observed in favor of building the model from the fused set of descriptors compared to fusing ensemble models built from the individual sources.
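
    A minimal sketch of the two strategies, with synthetic data standing in for the two molecular descriptor sets and single decision trees as the models:

    # Strategy i) concatenate feature sets before training one model;
    # strategy ii) train one model per feature set and fuse their outputs.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Two synthetic "descriptor sets" for the same examples (placeholders).
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X1, X2 = X[:, :10], X[:, 10:]
    idx_tr, idx_te = train_test_split(np.arange(len(y)), random_state=0)

    # Strategy i: feature fusion -- one model on the concatenated descriptors.
    X_fused = np.hstack([X1, X2])
    feat_model = DecisionTreeClassifier(random_state=0).fit(X_fused[idx_tr], y[idx_tr])

    # Strategy ii: classifier fusion -- one model per descriptor set,
    # averaging the predicted class probabilities.
    m1 = DecisionTreeClassifier(random_state=0).fit(X1[idx_tr], y[idx_tr])
    m2 = DecisionTreeClassifier(random_state=0).fit(X2[idx_tr], y[idx_tr])
    p_fused = (m1.predict_proba(X1[idx_te]) + m2.predict_proba(X2[idx_te])) / 2

    print("feature fusion acc:", feat_model.score(X_fused[idx_te], y[idx_te]))
    print("classifier fusion acc:", (p_fused.argmax(axis=1) == y[idx_te]).mean())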

  • 4.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Maximizing the Area under the ROC Curve with Decision Lists and Rule Sets. 2007. In: Proceedings of the 7th SIAM International Conference on Data Mining / [ed] C. Apte, B. Liu, S. Parthasarathy, D. Skillicorn, Society for Industrial and Applied Mathematics, 2007, p. 27-34. Conference paper (Refereed)
    Abstract [en]

    Decision lists (or ordered rule sets) have two attractive properties compared to unordered rule sets: they require a simpler classification procedure and they allow for a more compact representation. However, it is an open question what effect these properties have on the area under the ROC curve (AUC). Two ways of forming decision lists are considered in this study: by generating a sequence of rules, with a default rule for one of the classes, and by imposing an order upon rules that have been generated for all classes. An empirical investigation shows that the latter method gives a significantly higher AUC than the former, demonstrating that the compactness obtained by using one of the classes as a default is indeed associated with a cost. Furthermore, by using all applicable rules rather than the first in an ordered set, an even further significant improvement in AUC is obtained, demonstrating that the simple classification procedure is also associated with a cost. The observed gains in AUC for unordered rule sets compared to decision lists can be explained by the fact that learning rules for all classes, as well as combining multiple rules, allows examples to be ranked on a more fine-grained scale than when rules are applied in a fixed order with a default rule for one of the classes.
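
    The ranking difference described above (first applicable rule versus all applicable rules) can be made concrete with a toy scoring sketch; the rules, conditions, and probabilities below are entirely hypothetical:

    # Each rule is (predicate over an example, P(positive | rule fires)).
    rules = [
        (lambda x: x["size"] > 5, 0.9),
        (lambda x: x["color"] == "red", 0.7),
        (lambda x: True, 0.2),                 # default rule
    ]

    def score_decision_list(x):
        """First applicable rule decides: coarse-grained ranking."""
        for pred, p in rules:
            if pred(x):
                return p

    def score_rule_set(x):
        """All applicable rules contribute: finer-grained ranking."""
        ps = [p for pred, p in rules if pred(x)]
        return sum(ps) / len(ps)

    x = {"size": 7, "color": "red"}
    print(score_decision_list(x), score_rule_set(x))  # 0.9 vs 0.6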

  • 5.
    Boström, Henrik
    et al.
    University of Skövde, School of Humanities and Informatics.
    Andler, Sten F.
    University of Skövde, School of Humanities and Informatics.
    Brohede, Marcus
    University of Skövde, School of Humanities and Informatics.
    Johansson, Ronnie
    University of Skövde, School of Humanities and Informatics.
    Karlsson, Alexander
    University of Skövde, School of Humanities and Informatics.
    van Laere, Joeri
    University of Skövde, School of Humanities and Informatics.
    Niklasson, Lars
    University of Skövde, School of Humanities and Informatics.
    Nilsson, Marie
    University of Skövde, School of Humanities and Informatics.
    Persson, Anne
    University of Skövde, School of Humanities and Informatics.
    Ziemke, Tom
    University of Skövde, School of Humanities and Informatics.
    On the Definition of Information Fusion as a Field of Research. 2007. Report (Other academic)
    Abstract [en]

    A more precise definition of the field of information fusion can be of benefit to researchers within the field, who may use such a definition when motivating their own work and evaluating the contributions of others. Moreover, it can enable researchers and practitioners outside the field to more easily relate their own work to the field and more easily understand the scope of the techniques and methods developed in the field. Previous definitions of information fusion are reviewed from that perspective, including definitions of data and sensor fusion, and their appropriateness as definitions for the entire research field is discussed. Based on strengths and weaknesses of existing definitions, a novel definition is proposed, which is argued to effectively fulfill the requirements that can be put on a definition of information fusion as a field of research.

  • 6.
    Boström, Henrik
    et al.
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Johansson, Ronnie
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Karlsson, Alexander
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    On Evidential Combination Rules for Ensemble Classifiers. 2008. In: Proceedings of the 11th International Conference on Information Fusion, IEEE, 2008, p. 553-560. Conference paper (Refereed)
    Abstract [en]

    Ensemble classifiers are known to generally perform better than each individual classifier of which they consist. One approach to classifier fusion is to apply Shafer’s theory of evidence. While most approaches have adopted Dempster’s rule of combination, a multitude of combination rules have been proposed. A number of combination rules as well as two voting rules are compared when used in conjunction with a specific kind of ensemble classifier, known as random forests, w.r.t. accuracy, area under ROC curve and Brier score on 27 datasets. The empirical evaluation shows that the choice of combination rule can have a significant impact on the performance for a single dataset, but in general the evidential combination rules do not perform better than the voting rules for this particular ensemble design. Furthermore, among the evidential rules, the associative ones appear to have better performance than the non-associative.
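
    Most of the compared approaches build on Dempster's rule of combination; a minimal sketch of the rule for two mass functions follows (the mass assignments are illustrative, and the paper's exact belief representation is not reproduced):

    # Dempster's rule of combination for two mass functions whose focal
    # elements are sets of class labels (frozensets). Conflicting mass
    # (empty intersections) is removed and the remainder renormalized.
    from itertools import product

    def dempster(m1: dict, m2: dict) -> dict:
        combined, conflict = {}, 0.0
        for (a, wa), (b, wb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb        # mass assigned to the empty set
        return {k: v / (1 - conflict) for k, v in combined.items()}

    theta = frozenset({"pos", "neg"})                  # frame of discernment
    m1 = {frozenset({"pos"}): 0.7, theta: 0.3}         # classifier 1 (illustrative)
    m2 = {frozenset({"pos"}): 0.5, frozenset({"neg"}): 0.3, theta: 0.2}
    print(dempster(m1, m2))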

  • 7.
    Boström, Henrik
    et al.
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Norinder, Ulf
    Utilizing Information on Uncertainty for In Silico Modeling using Random Forests. 2009. In: Proceedings of the 3rd Skövde Workshop on Information Fusion Topics (SWIFT 2009), University of Skövde, 2009, p. 59-62. Conference paper (Refereed)
    Abstract [en]

    Information on uncertainty of measurements or estimates of molecular properties is rarely utilized by in silico predictive models. In this study, different approaches to handling uncertain numerical features are explored when using the state-of-the-art random forest algorithm for generating predictive models. Two main approaches are considered: i) sampling from probability distributions prior to tree generation, which does not require any change to the underlying tree learning algorithm, and ii) adjusting the algorithm to allow for handling probability distributions, similar to how missing values typically are handled, i.e., partitions may include fractions of examples. An experiment with six datasets concerning the prediction of various chemical properties is presented, where 95% confidence intervals are included for one of the 92 numerical features. In total, five approaches to handling uncertain numeric features are compared: ignoring the uncertainty, sampling from distributions that are assumed to be uniform and normal respectively, and adjusting tree learning to handle probability distributions that are assumed to be uniform and normal respectively. The experimental results show that all approaches that utilize information on uncertainty indeed outperform the approach that ignores it, both with respect to accuracy and area under ROC curve. A decomposition of the squared error of the constituent classification trees shows that the highest variance is obtained by ignoring the information on uncertainty, but that this also results in the highest mean squared error of the constituent trees.
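
    Approach i) needs no change to the learner; a minimal sketch, assuming the uncertain feature is normally distributed with a standard deviation recovered from its 95% confidence interval (the ±1.96σ conversion is a standard assumption, not taken from the paper):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n = 500
    X_certain = rng.normal(size=(n, 3))          # ordinary numeric features
    mid = rng.normal(size=n)                     # uncertain feature: interval midpoint
    lo, hi = mid - 0.5, mid + 0.5                # its 95% confidence interval
    y = (mid + X_certain[:, 0] > 0).astype(int)  # synthetic target

    sigma = (hi - lo) / (2 * 1.96)               # normality assumption from the CI

    # One sampled realization of the uncertain feature; in an ensemble this
    # resampling could be repeated independently before growing each tree.
    sampled = rng.normal(loc=mid, scale=sigma)
    X = np.column_stack([X_certain, sampled])
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)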

  • 8.
    Deegalla, Sampath
    et al.
    Dept. of Computer and Systems Sciences, Stockholm University and Royal Institute of Technology, Sweden.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Classification of Microarrays with kNN: Comparison of Dimensionality Reduction Methods. 2007. In: Intelligent Data Engineering and Automated Learning – IDEAL 2007: 8th International Conference, Birmingham, UK, December 16-19, 2007, Proceedings / [ed] Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao, Springer, 2007, p. 800-809. Conference paper (Refereed)
    Abstract [en]

    Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) for high-dimensional data sets, such as microarrays. The effect of the choice of dimensionality reduction method on the predictive performance of kNN for classifying microarray data is an open issue, and four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Projection (RP), Partial Least Squares (PLS) and Information Gain (IG), are compared on eight microarray data sets. It is observed that all dimensionality reduction methods result in more accurate classifiers than what is obtained from using the raw attributes. Furthermore, it is observed that both PCA and PLS reach their best accuracies with fewer components than the other two methods, and that RP needs far more components than the others to outperform kNN on the non-reduced dataset. None of the dimensionality reduction methods can be concluded to generally outperform the others, although PLS is shown to be superior on all four binary classification tasks. The main conclusion from the study, however, is that the choice of dimensionality reduction method can be of major importance when classifying microarrays using kNN.
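
    Three of the four methods have direct scikit-learn counterparts; a minimal sketch of the comparison (PLS requires a regression-style encoding of the labels and is omitted here; the dataset and component counts are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.random_projection import GaussianRandomProjection

    X, y = load_digits(return_X_y=True)          # stand-in for a microarray set
    reducers = {
        "PCA": PCA(n_components=20),
        "RP": GaussianRandomProjection(n_components=20, random_state=0),
        "IG": SelectKBest(mutual_info_classif, k=20),  # information-gain-style selection
    }
    for name, reducer in reducers.items():
        pipe = make_pipeline(reducer, KNeighborsClassifier(n_neighbors=5))
        print(name, cross_val_score(pipe, X, y, cv=5).mean())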

  • 9.
    Deegalla, Sampath
    et al.
    Dept. of Computer and Systems Sciences, Stockholm University, Sweden.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Fusion of Dimensionality Reduction Methods: a Case Study in Microarray Classification. 2009. In: Proceedings of the 12th International Conference on Information Fusion, ISIF, 2009, p. 460-465. Conference paper (Refereed)
    Abstract [en]

    Dimensionality reduction has been demonstrated to improve the performance of the k-nearest neighbor (kNN) classifier for high-dimensional data sets, such as microarrays. However, the effectiveness of different dimensionality reduction methods varies, and it has been shown that no single method consistently outperforms the others. In contrast to using a single method, two approaches to fusing the result of applying dimensionality reduction methods are investigated: feature fusion and classifier fusion. It is shown that by fusing the output of multiple dimensionality reduction techniques, either by fusing the reduced features or by fusing the output of the resulting classifiers, both higher accuracy and higher robustness towards the choice of the number of dimensions are obtained.

  • 10.
    Dudas, Catarina
    et al.
    University of Skövde, The Virtual Systems Research Centre. University of Skövde, School of Technology and Society.
    Boström, Henrik
    University of Skövde, The Informatics Research Centre. University of Skövde, School of Humanities and Informatics.
    Using Uncertain Chemical and Thermal Data to Predict Product Quality in a Casting Process. 2009. In: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data / [ed] Jian Pei; Lise Getoor; Ander De Keijzer, ACM, Inc., 2009, p. 57-61. Conference paper (Refereed)
    Abstract [en]

    Process and casting data from different sources have been collected and merged for the purpose of predicting, and determining what factors affect, the quality of cast products in a foundry. One problem is that the measurements cannot be directly aligned, since they are collected at different points in time, and instead they have to be approximated for specific time points, hence introducing uncertainty. An approach for addressing this problem is investigated, where uncertain numeric feature values are represented by intervals and random forests are extended to handle such intervals. A preliminary experiment shows that the suggested way of forming the intervals, together with the extension of random forests, results in higher predictive performance compared to using single (expected) values for the uncertain features together with standard random forests.

  • 11.
    Dudas, Catarina
    et al.
    University of Skövde, The Virtual Systems Research Centre. University of Skövde, School of Technology and Society.
    Ng, Amos
    University of Skövde, The Virtual Systems Research Centre. University of Skövde, School of Technology and Society.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Information Extraction from Solution Set of Simulation-based Multi-objective Optimisation using Data Mining. 2009. In: Proceedings of Industrial Simulation Conference 2009 / [ed] D. B. Das, V. Nassehi & L. Deka, EUROSIS-ETI, 2009, p. 65-69. Conference paper (Refereed)
    Abstract [en]

    In this work, we investigate ways of extracting information from simulations, in particular from simulation-based multi-objective optimisation, in order to acquire information that can support human decision makers that aim for optimising manufacturing processes. Applying data mining for analyzing data generated using simulation is a fairly unexplored area. With the observation that the obtained solutions from a simulation-based multi-objective optimisation are all optimal (or close to the optimal Pareto front) so that they are bound to follow and exhibit certain relationships among variables vis-à-vis objectives, it is argued that using data mining to discover these relationships could be a promising procedure. The aim of this paper is to provide the empirical results from two simulation case studies to support such a hypothesis.

  • 12.
    Dudas, Catarina
    et al.
    University of Skövde, School of Technology and Society.
    Ng, Amos
    University of Skövde, School of Technology and Society.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics.
    Knowledge Extraction in Manufacturing using Data Mining Techniques. 2008. In: Proceedings of the Swedish Production Symposium 2008, Stockholm, Sweden, November 18-20, 2008, 2008, 8 pages. Conference paper (Refereed)
    Abstract [en]

    Nowadays many production companies collect and store production and process data in large databases. Unfortunately, the data is rarely used in the most value-generating way, i.e., finding patterns of inconsistencies and relationships between process settings and quality outcome. This paper addresses the benefits of using data mining techniques in manufacturing applications. Two different applications are presented, but the technique and software used are the same in both cases. The first case deals with how data mining can be used to discover the effect of process timing and settings on the quality outcome in the casting industry. The result of a multi-objective optimization of a camshaft process is used as the second case. This study focuses on finding the most appropriate dispatching rule settings in the buffers on the line. The use of data mining techniques in these two cases generated previously unknown knowledge. For example, in order to maximize throughput in the camshaft production, let the dispatching rule for the most severe bottleneck be of type Shortest Processing Time (SPT) and for the second bottleneck use any but Most Work Remaining (MWKR).

  • 13.
    Gammerman, Alexander
    et al.
    Royal Holloway, University of London, Egham, Surrey, England.
    Vovk, Vladimir
    Royal Holloway, University of London, Egham, Surrey, England.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Carlsson, Lars
    Stena Line AB, Gothenburg, Sweden.
    Conformal and probabilistic prediction with applications: editorial. 2019. In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no. 3, p. 379-380. Article in journal (Other academic)
  • 14.
    Hollmen, Jaakko
    et al.
    Aalto University, Department of Computer Science, Espoo, Finland.
    Asker, Lars
    Stockholm University, Department of Computer and Systems Sciences, Stockholm, Sweden.
    Karlsson, Isak
    Stockholm University, Department of Computer and Systems Sciences, Stockholm, Sweden.
    Papapetrou, Panagiotis
    Stockholm University, Department of Computer and Systems Sciences, Stockholm, Sweden.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Wikner, Birgitta Norstedt
    Karolinska Institutet, Department of Medicine, Centre for Pharmacoepidemiology (CPE), Stockholm, Sweden.
    Ohman, Inger
    Karolinska Institutet, Department of Medicine, Centre for Pharmacoepidemiology (CPE), Stockholm, Sweden.
    Exploring epistaxis as an adverse effect of anti-thrombotic drugs and outdoor temperature. 2018. In: Proceedings of the 11th ACM International Conference on Pervasive Technologies Related to Assistive Environments (PETRA 2018), Association for Computing Machinery, 2018, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    Electronic health records contain a wealth of epidemiological information about diseases at the population level. Using a database of medical diagnoses and drug prescriptions in electronic health records, we investigate the correlation between outdoor temperature and the incidence of epistaxis over time for two groups of patients. One group consists of patients that had been diagnosed with epistaxis and also been prescribed at least one of the three anti-thrombotic agents: Warfarin, Apixaban, or Rivaroxaban. The other group consists of patients that had been diagnosed with epistaxis and not been prescribed any of the three anti-thrombotic drugs. We find a strong negative correlation between the incidence of epistaxis and outdoor temperature for the group that had not been prescribed any of the three anti-thrombotic drugs, while there is a weaker correlation between incidence of epistaxis and outdoor temperature for the other group. It is, however, clear that both groups are affected in a similar way, such that the incidence of epistaxis increases with colder temperatures.

  • 15.
    Johansson, Ronnie
    et al.
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Karlsson, Alexander
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    A Study on Class-Specifically Discounted Belief for Ensemble Classifiers. 2008. In: Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2008), IEEE Press, 2008, p. 614-619. Conference paper (Refereed)
    Abstract [en]

    Ensemble classifiers are known to generally perform better than their constituent classifiers. Whereas a lot of work has been focusing on the generation of classifiers for ensembles, much less attention has been given to the fusion of individual classifier outputs. One approach to fuse the outputs is to apply Shafer's theory of evidence, which provides a flexible framework for expressing and fusing beliefs. However, representing and fusing beliefs is non-trivial, since it can be performed in a multitude of ways within the evidential framework. In a previous article, we compared different evidential combination rules for ensemble fusion. That study involved a single belief representation, in which the classifier outputs were discounted (i.e., weighted) by classifier reliability, interpreted as the classifier's estimated accuracy, i.e., the percentage of correctly classified examples. However, classifiers may have different performance for different classes, and in this work we assign the reliability of a classifier output depending on the class-specific reliability of the classifier. Using 27 UCI datasets, we compare the two different ways of expressing beliefs and some evidential combination rules. The result of the study indicates that there is indeed an advantage of utilizing class-specific reliability compared to accuracy in an evidential framework for combining classifiers in the ensemble design considered.
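
    A sketch of the class-specific discounting idea: mass on each singleton class is scaled by that class's reliability, and the removed mass is moved to the frame of discernment. This is an illustrative variant of Shafer-style discounting, not necessarily the paper's exact formulation:

    # Class-specific discounting of a mass function with singleton focal
    # elements plus the frame Theta; discounted mass becomes ignorance.
    def discount_class_specific(mass: dict, reliability: dict, theta: frozenset) -> dict:
        out = {theta: mass.get(theta, 0.0)}
        for focal, w in mass.items():
            if focal == theta:
                continue
            (label,) = focal                   # singleton focal element
            r = reliability[label]
            out[focal] = r * w
            out[theta] += (1 - r) * w          # removed mass goes to Theta
        return out

    theta = frozenset({"pos", "neg"})
    mass = {frozenset({"pos"}): 0.6, frozenset({"neg"}): 0.3, theta: 0.1}
    rel = {"pos": 0.9, "neg": 0.7}             # class-specific reliabilities (illustrative)
    print(discount_class_specific(mass, rel, theta))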

  • 16.
    Johansson, Ulf
    et al.
    School of Business and Informatics, University of Borås, Sweden.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    König, Rikard
    School of Business and Informatics, University of Borås, Sweden.
    Extending Nearest Neighbor Classification with Spheres of Confidence. 2008. In: Proceedings of the Twenty-First International FLAIRS Conference (FLAIRS 2008), AAAI Press, 2008, p. 282-287. Conference paper (Refereed)
    Abstract [en]

    The standard kNN algorithm suffers from two major drawbacks: sensitivity to the parameter value k, i.e., the number of neighbors, and the use of k as a global constant that is independent of the particular region in which the example to be classified falls. Methods using weighted voting schemes only partly alleviate these problems, since they still involve choosing a fixed k. In this paper, a novel instance-based learner is introduced that does not require k as a parameter, but instead employs a flexible strategy for determining the number of neighbors to consider for the specific example to be classified, hence using a local instead of global k. A number of variants of the algorithm are evaluated on 18 datasets from the UCI repository. The novel algorithm in its basic form is shown to significantly outperform standard kNN with respect to accuracy, and an adapted version of the algorithm is shown to be clearly ahead with respect to the area under ROC curve. Similar to standard kNN, the novel algorithm still allows for various extensions, such as weighted voting and axes scaling.
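
    A loose analogue of such a local-k strategy is sketched below: the neighborhood is grown until the neighbors agree confidently enough. The threshold-based stopping rule is a hypothetical stand-in for the paper's spheres of confidence, not its actual procedure:

    import numpy as np
    from collections import Counter

    def local_k_predict(X_train, y_train, x, threshold=0.8, k_max=25):
        """Grow the neighborhood around x until the majority fraction
        reaches the threshold, i.e., use a local rather than global k."""
        dists = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(dists)
        for k in range(1, k_max + 1):
            votes = Counter(y_train[order[:k]])
            label, count = votes.most_common(1)[0]
            if count / k >= threshold:
                return label           # confident neighborhood found at this k
        return label                   # fall back to the largest neighborhood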

  • 17.
    Johansson, Ulf
    et al.
    Jönköping University, Department of Computer Science and Informatics, Jönköping, Sweden.
    Löfström, Tuve
    Jönköping University, Department of Computer Science and Informatics, Jönköping, Sweden.
    Linusson, Henrik
    University of Borås, Department of Information Technology, Borås, Sweden.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Efficient Venn predictors using random forests. 2019. In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no. 3, p. 535-550. Article in journal (Refereed)
    Abstract [en]

    Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. In addition, a probabilistic classifier must, of course, also be as accurate as possible. In this paper, Venn predictors, and their special case, Venn-Abers predictors, are evaluated for probabilistic classification, using random forests as the underlying models. Venn predictors output multiple probabilities for each label, i.e., the predicted label is associated with a probability interval. Since all Venn predictors are valid in the long run, the size of the probability intervals is very important, with tighter intervals being more informative. The standard solution when calibrating a classifier is to employ an additional step, transforming the outputs from a classifier into probability estimates, using a labeled data set not employed for training of the models. For random forests, and other bagged ensembles, it is, however, possible to use the out-of-bag instances for calibration, making all training data available for both model learning and calibration. This procedure has previously been successfully applied to conformal prediction, but was here evaluated for the first time for Venn predictors. The empirical investigation, using 22 publicly available data sets, showed that all four versions of the Venn predictors were better calibrated than both the raw estimates from the random forest, and the standard techniques Platt scaling and isotonic regression. Regarding both informativeness and accuracy, the standard Venn predictor calibrated on out-of-bag instances was the best setup evaluated. Most importantly, calibrating on out-of-bag instances, instead of using a separate calibration set, resulted in tighter intervals and more accurate models on every data set, for both the Venn predictors and the Venn-Abers predictors.
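
    The out-of-bag calibration idea can be sketched with scikit-learn, using the forest's OOB probabilities to fit a calibrator so that all training data serves both model learning and calibration. Isotonic regression stands in here for the Venn predictor machinery, which is more involved and outputs probability intervals:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.isotonic import IsotonicRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=0).fit(X_tr, y_tr)

    # oob_decision_function_[i] is the class distribution for training
    # example i, averaged only over trees that did not train on it.
    oob_p = rf.oob_decision_function_[:, 1]
    calibrator = IsotonicRegression(out_of_bounds="clip").fit(oob_p, y_tr)

    p_cal = calibrator.predict(rf.predict_proba(X_te)[:, 1])  # calibrated probabilities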

  • 18.
    Johansson, Ulf
    et al.
    University of Borås, School of Business and Informatics.
    Löfström, Tuve
    University of Borås, School of Business and Informatics.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    The Problem with Ranking Ensembles Based on Training or Validation Performance. 2008. In: Proceedings of the International Joint Conference on Neural Networks, IEEE Press, 2008, p. 3221-3227. Conference paper (Refereed)
    Abstract [en]

    The main purpose of this study was to determine whether it is possible to somehow use results on training or validation data to estimate ensemble performance on novel data. With the specific setup evaluated, i.e., using ensembles built from a pool of independently trained neural networks and targeting diversity only implicitly, the answer is a resounding no. Experimentation, using 13 UCI datasets, shows that there is in general nothing to gain in performance on novel data by choosing an ensemble based on any of the training measures evaluated here. This is despite the fact that the measures evaluated include all the most frequently used, i.e., ensemble training and validation accuracy, base classifier training and validation accuracy, ensemble training and validation AUC, and two diversity measures. The main reason is that all ensembles tend to have quite similar performance, unless we deliberately lower the accuracy of the base classifiers. The key consequence is, of course, that a data miner can do no better than picking an ensemble at random. In addition, the results indicate that it is futile to look for an algorithm aimed at optimizing ensemble performance by somehow selecting a subset of available base classifiers.

  • 19.
    Johansson, Ulf
    et al.
    University of Borås, School of Business and Informatics, Borås, Sweden.
    Sönströd, Cecilia
    University of Borås, School of Business and Informatics, Borås, Sweden.
    Löfström, Tuve
    University of Skövde, School of Humanities and Informatics.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Chipper - A Novel Algorithm for Concept Description. 2008. In: Proceedings of the Tenth Scandinavian Conference on Artificial Intelligence: SCAI 2008 / [ed] Anders Holst, Per Kreuger, Peter Funk, IOS Press, 2008, p. 133-140. Conference paper (Refereed)
    Abstract [en]

    In this paper, several demands placed on concept description algorithms are identified and discussed. The most important criterion is the ability to produce compact rule sets that, in a natural and accurate way, describe the most important relationships in the underlying domain. An algorithm based on the identified criteria is presented and evaluated. The algorithm, named Chipper, produces decision lists, where each rule covers a maximum number of remaining instances while meeting requested accuracy requirements. In the experiments, Chipper is evaluated on nine UCI data sets. The main result is that Chipper produces compact and understandable rule sets, clearly fulfilling the overall goal of concept description. In the experiments, Chipper’s accuracy is similar to standard decision tree and rule induction algorithms, while rule sets have superior comprehensibility.

  • 20.
    Karunaratne, Thashmee
    et al.
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Boström, Henrik
    University of Skövde, Sweden.
    Using background knowledge for graph based learning: a case study in chemoinformatics. 2007. In: IMECS 2007: International Multiconference of Engineers and Computer Scientists, Vols I and II, Hong Kong: International Association of Engineers (IAENG), 2007, p. 153-157. Conference paper (Refereed)
    Abstract [en]

    Incorporating background knowledge in the learning process has proven beneficial for numerous applications of logic-based learning methods. Yet the effect of background knowledge in graph-based learning has not been systematically explored. This paper describes and demonstrates a first step in this direction and elaborates on how additional relevant background knowledge could be used to improve the predictive performance of a graph learner. A case study in chemoinformatics is undertaken in this regard, in which various types of background knowledge are encoded in graphs that are given as input to a graph learner. It is shown that the type of background knowledge encoded indeed has an effect on the predictive performance, and it is concluded that encoding appropriate background knowledge can be more important than the choice of the graph learning algorithm.

  • 21.
    Linusson, H.
    et al.
    Johansson, U.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Efficient conformal predictor ensembles. 2019. In: Neurocomputing, ISSN 0925-2312, E-ISSN 1872-8286. Article in journal (Refereed)
    Abstract [en]

    In this paper, we study a generalization of a recently developed strategy for generating conformal predictor ensembles: out-of-bag calibration. The ensemble strategy is evaluated, both theoretically and empirically, against a commonly used alternative ensemble strategy, bootstrap conformal prediction, as well as common non-ensemble strategies. A thorough analysis is provided of out-of-bag calibration, with respect to theoretical validity, empirical validity (error rate), efficiency (prediction region size) and p-value stability (the degree of variance observed over multiple predictions for the same object). Empirical results show that out-of-bag calibration displays favorable characteristics with regard to these criteria, and we propose that out-of-bag calibration be adopted as a standard method for constructing conformal predictor ensembles.
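
    For reference, the basic non-ensemble strategy that out-of-bag calibration is compared against is split (inductive) conformal prediction; a minimal sketch, with 1 minus the predicted class probability as an assumed nonconformity score:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, random_state=0)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    # Nonconformity of each calibration example w.r.t. its true label.
    cal_scores = 1 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

    def prediction_set(x, epsilon=0.1):
        """Labels whose conformal p-value exceeds significance level epsilon."""
        probs = model.predict_proba(x.reshape(1, -1))[0]
        labels = []
        for label, prob in enumerate(probs):
            score = 1 - prob
            p_value = (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)
            if p_value > epsilon:
                labels.append(label)
        return labels

    print(prediction_set(X_te[0]))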

  • 22.
    Linusson, Henrik
    et al.
    University of Borås, Department of Information Technology, Borås, Sweden.
    Johansson, Ulf
    Jönköping University, Department of Computer Science and Informatics, Jönköping, Sweden.
    Boström, Henrik
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH).
    Löfström, Tuve
    Jönköping University, Department of Computer Science and Informatics, Jönköping, Sweden.
    Classification with Reject Option Using Conformal Prediction. 2018. In: Advances in Knowledge Discovery and Data Mining, PAKDD 2018, Part I / [ed] Phung, D.; Tseng, V. S.; Webb, G. I.; Ho, B.; Ganji, M.; Rashidi, L., Springer, 2018, Vol. 10937, p. 94-105. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a practically useful means of interpreting the predictions produced by a conformal classifier. The proposed interpretation leads to a classifier with a reject option, that allows the user to limit the number of erroneous predictions made on the test set, without any need to reveal the true labels of the test objects. The method described in this paper works by estimating the cumulative error count on a set of predictions provided by a conformal classifier, ordered by their confidence. Given a test set and a user-specified parameter k, the proposed classification procedure outputs the largest possible number of predictions containing on average at most k errors, while refusing to make predictions for test objects where it is too uncertain. We conduct an empirical evaluation using benchmark datasets, and show that we are able to provide accurate estimates for the error rate on the test set.
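
    The selection step described above can be sketched as follows: order the predictions by confidence and emit the largest prefix whose estimated cumulative error stays within the budget k. The per-prediction error estimates below are placeholders for the paper's estimates derived from the conformal classifier:

    import numpy as np

    confidence = np.array([0.99, 0.97, 0.95, 0.90, 0.80, 0.60])
    est_error = 1 - confidence        # placeholder per-prediction error estimate
    k = 0.2                           # user-specified budget of expected errors

    order = np.argsort(-confidence)   # most confident predictions first
    cum_err = np.cumsum(est_error[order])
    n_accept = int(np.searchsorted(cum_err, k, side="right"))
    accepted = order[:n_accept]       # predict these; reject the rest
    print(accepted)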

  • 23.
    Linusson, Henrik
    et al.
    Department of Information Technology, University of Borås, Sweden.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Johansson, Ulf
    Jönköping University, JTH, Computer Science and Informatics.
    Löfström, Tuve
    Jönköping University, JTH, Research Environment Computer Science and Informatics.
    On the calibration of aggregated conformal predictors. 2017. In: Proceedings of Machine Learning Research, Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, 2017, p. 154-173. Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence. These models are typically constructed on top of traditional machine learning algorithms. An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular, their validity is independent of the specific underlying learning algorithm on which they are based. Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency. As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed, drawing inspiration from the field of classification and regression ensembles. Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown that they might attain empirical exact validity under certain circumstances, their theoretical validity is conditional on additional assumptions that require further clarification. In this paper, we show why validity is not automatic for aggregated conformal predictors, and provide a revised definition of aggregated conformal predictors that gains approximate validity conditional on properties of the underlying learning algorithm.

  • 24.
    Löfström, Tuve
    et al.
    School of Business and Informatics, University of Borås, Borås, Sweden.
    Johansson, Ulf
    School of Business and Informatics, University of Borås, Borås, Sweden.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Ensemble Member Selection Using Multi-Objective Optimization. 2009. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), IEEE conference proceedings, 2009, p. 245-251. Conference paper (Refereed)
    Abstract [en]

    Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. Unfortunately, the problem of how to maximize ensemble accuracy is, especially for classification, far from solved. In essence, the key problem is to find a suitable criterion, typically based on training or selection set performance, highly correlated with ensemble accuracy on novel data. Several studies have, however, shown that it is difficult to come up with a single measure, such as ensemble or base classifier selection set accuracy, or some measure based on diversity, that is a good general predictor for ensemble test accuracy. This paper presents a novel technique that for each learning task searches for the most effective combination of given atomic measures, by means of a genetic algorithm. Ensembles built from either neural networks or random forests were empirically evaluated on 30 UCI datasets. The experimental results show that when using the generated combined optimization criteria to rank candidate ensembles, a higher test set accuracy for the top ranked ensemble was achieved, compared to using ensemble accuracy on selection data alone. Furthermore, when creating ensembles from a pool of neural networks, the use of the generated combined criteria was shown to generally outperform the use of estimated ensemble accuracy as the single optimization criterion.

  • 25.
    Löfström, Tuve
    et al.
    School of Business and Informatics, University of Borås.
    Johansson, Ulf
    School of Business and Informatics, University of Borås.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    On the Use of Accuracy and Diversity Measures for Evaluating and Selecting Ensembles of Classifiers. 2008. In: Proceedings of the Seventh International Conference on Machine Learning and Applications, IEEE, 2008, p. 127-132. Conference paper (Refereed)
    Abstract [en]

    The test set accuracy for ensembles of classifiers selected based on single measures of accuracy and diversity as well as combinations of such measures is investigated. It is found that by combining measures, a higher test set accuracy may be obtained than by using any single accuracy or diversity measure. It is further investigated whether a multi-criteria search for an ensemble that maximizes both accuracy and diversity leads to more accurate ensembles than by optimizing a single criterion. The results indicate that it might be more beneficial to search for ensembles that are both accurate and diverse. Furthermore, the results show that diversity measures could compete with accuracy measures as selection criterion.

  • 26.
    Rao, W.
    et al.
    Boström, Henrik
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Xie, S.
    Rule induction for structural damage identification. 2004. In: Proceedings of the International Conference on Machine Learning and Cybernetics, 2004, p. 2865-2869. Conference paper (Refereed)
    Abstract [en]

    Structural damage identification is becoming a worldwide research subject. Some machine learning methods have been used to solve this problem, most of them neural network methods. In this paper, three different rule induction methods, Divide-and-Conquer (DAC), Bagging, and Separate-and-Conquer (SAC), are investigated for predicting the damage position and extent of a concrete beam. A radial basis function neural network (RBFNN) is used for comparative purposes. The rule induction methods, especially Bagging, are shown to obtain good predictive performance.

  • 27.
    Safinianaini, Negar
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Kaldo, Viktor
    Department of Psychology, Faculty of Health and Life Sciences, Linnaeus University, Växjö, Sweden; Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, and Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden.
    Gated hidden Markov models for early prediction of outcome of Internet-based cognitive behavioral therapy. 2019. In: 17th Conference on Artificial Intelligence in Medicine, AIME 2019, Cham: Springer Verlag, 2019, p. 160-169. Conference paper (Refereed)
    Abstract [en]

    Depression is a major threat to public health and its mitigation is considered to be of utmost importance. Internet-based Cognitive Behavioral Therapy (ICBT) is one of the employed treatments for depression. However, for the approach to be effective, it is crucial that the outcome of the treatment is accurately predicted as early as possible, to allow for its adaptation to the individual patient. Hidden Markov models (HMMs) have been commonly applied to characterize systematic changes in multivariate time series within health care. However, they have limited capabilities in capturing long-range interactions between emitted symbols. For the task of analyzing ICBT data, one such long-range interaction concerns the dependence of state transition on fractional change of emitted symbols. Gated Hidden Markov Models (GHMMs) are proposed as a solution to this problem. They extend standard HMMs by modifying the Expectation Maximization algorithm; for each observation sequence, the new algorithm regulates the transition probability update based on the fractional change, as specified by domain knowledge. GHMMs are compared to standard HMMs and a recently proposed approach, Inertial Hidden Markov Models, on the task of early prediction of ICBT outcome for treating depression; the algorithms are evaluated on outcome prediction, up to 7 weeks before ICBT ends. GHMMs are shown to outperform both alternative models, with an improvement of AUC ranging from 12 to 23%. These promising results indicate that considering fractional change of the observation sequence when updating state transition probabilities may indeed have a positive effect on early prediction of ICBT outcome.

  • 28.
    Sönströd, Cecilia
    et al.
    School of Business and Informatics, University of Borås, Sweden.
    Johansson, Ulf
    School of Business and Informatics, University of Borås, Sweden.
    Norinder, Ulf
    AstraZeneca R&D, Södertälje, Sweden.
    Boström, Henrik
    University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
    Comprehensible Models for Predicting Molecular Interaction with Heart-Regulating Genes. 2008. In: Proceedings of the Seventh International Conference on Machine Learning and Applications, IEEE, 2008, p. 559-564. Conference paper (Refereed)
    Abstract [en]

    When using machine learning for in silico modeling, the goal is normally to obtain highly accurate predictive models. Often, however, models should also bring insights into interesting relationships in the domain. It is then desirable that machine learning techniques have the ability to obtain small and transparent models, where the user can control the tradeoff between accuracy, comprehensibility and coverage. In this study, three different decision list algorithms are evaluated on a data set concerning the interaction of molecules with a human gene that regulates heart functioning (hERG). The results show that decision list algorithms can obtain predictive performance not far from the state-of-the-art method random forests, but also that algorithms focusing on accuracy alone may produce complex decision lists that are very hard to interpret. The experiments also show that by sacrificing accuracy only to a limited degree, comprehensibility (measured as both model size and classification complexity) can be improved remarkably.

  • 29.
    Vasiloudis, Theodore
    et al.
    RISE AI.
    Cho, Hyunsu
    Amazon Web Services.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Block-distributed Gradient Boosted Trees. 2019. In: SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery (ACM), 2019, p. 1025-1028. Conference paper (Refereed)
    Abstract [en]

    The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the Quickscorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.

  • 30.
    Vasiloudis, Theodore
    et al.
    RISE SICS, Stockholm, Sweden.
    Morales, Gianmarco De Francisci
    ISI Foundation, Turin, Italy.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Quantifying Uncertainty in Online Regression Forests. 2019. In: Journal of Machine Learning Research, ISSN 1532-4435, E-ISSN 1533-7928, Vol. 20, p. 1-35, article id 155. Article in journal (Refereed)
    Abstract [en]

    Accurately quantifying uncertainty in predictions is essential for the deployment of machine learning algorithms in critical applications where mistakes are costly. Most approaches to quantifying prediction uncertainty have focused on settings where the data is static, or bounded. In this paper, we investigate methods that quantify the prediction uncertainty in a streaming setting, where the data is potentially unbounded. We propose two meta-algorithms that produce prediction intervals for online regression forests of arbitrary tree models; one based on conformal prediction, and the other based on quantile regression. We show that the approaches are able to maintain specified error rates, with constant computational cost per example and bounded memory usage. We provide empirical evidence that the methods outperform the state-of-the-art in terms of maintaining error guarantees, while being an order of magnitude faster. We also investigate how the algorithms are able to recover from concept drift.
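
    In the batch setting, the conformal meta-algorithm reduces to building intervals from a quantile of calibration-set absolute residuals; a minimal sketch (the streaming version maintains these quantities online, which is not shown):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, noise=10, random_state=0)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    residuals = np.abs(y_cal - model.predict(X_cal))

    epsilon = 0.1                            # target error rate
    q = np.quantile(residuals, 1 - epsilon, method="higher")
    pred = model.predict(X_te)
    lower, upper = pred - q, pred + q        # prediction intervals
    print("coverage:", np.mean((y_te >= lower) & (y_te <= upper)))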
