1 - 18 of 18
  • 1.
    Dudas, Catarina
    et al.
    University of Skövde, School of Engineering Science. University of Skövde, The Virtual Systems Research Centre.
    Ng, Amos H. C.
    University of Skövde, School of Engineering Science. University of Skövde, The Virtual Systems Research Centre. University of Skövde.
    Boström, Henrik
    Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden.
    Post-analysis of multi-objective optimization solutions using decision trees. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 2, p. 259-278. Article in journal (Refereed)
    Abstract [en]

    Evolutionary algorithms are often applied to solve multi-objective optimization problems. Such algorithms effectively generate widely spread solutions and have good convergence properties. However, they do not provide any characteristics of the found optimal solutions, something which may be very valuable to decision makers. By performing a post-analysis of the solution set from multi-objective optimization, relationships between the input space and the objective space can be identified. In this study, decision trees are used for this purpose. It is demonstrated that they may effectively capture important characteristics of the solution sets produced by multi-objective optimization methods. It is furthermore shown that the discovered relationships may be used for improving the search for additional solutions. Two multi-objective problems are considered in this paper: a well-studied benchmark function problem with an optimal Pareto front that is known beforehand, which is used for verification purposes, and a multi-objective optimization problem of a real-world production system. The results show that useful relationships may be identified by employing decision tree analysis of the solution sets from multi-objective optimizations.
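The post-analysis described in this abstract can be sketched in a few lines: label each solution as Pareto-optimal or dominated, then fit a decision tree on the input variables to expose which input regions yield optimal solutions. A minimal pure-Python illustration; the toy data and the one-level "tree" (a single threshold) are stand-ins for the paper's full decision-tree analysis.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_labels(objectives):
    """Label each solution: True if no other solution dominates it."""
    return [not any(dominates(o2, o1) for o2 in objectives if o2 != o1)
            for o1 in objectives]

def best_stump(inputs, labels):
    """One-level decision tree: the input threshold t whose rule
    'input <= t -> Pareto-optimal' makes the fewest mistakes."""
    best_t, best_err = None, len(labels) + 1
    for t in sorted(set(inputs)):
        err = sum((x <= t) != lab for x, lab in zip(inputs, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Toy data: one input variable per solution, two objectives to minimize.
inputs = [0.1, 0.2, 0.3, 0.8, 0.9]
objectives = [(1, 9), (2, 8), (3, 7), (5, 7), (6, 9)]
labels = pareto_labels(objectives)       # first three are non-dominated
threshold, errors = best_stump(inputs, labels)
```

Here the recovered rule "input <= 0.3" perfectly separates the Pareto-optimal solutions, which is exactly the kind of input-space characterization the paper extracts (and then uses to guide further search).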

  • 2. Dudas, Catarina
    et al.
    Ng, Amos H. C.
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Post-analysis of multi-objective optimization solutions using decision trees. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 2, p. 259-278. Article in journal (Refereed)
    Abstract [en]

    Evolutionary algorithms are often applied to solve multi-objective optimization problems. Such algorithms effectively generate widely spread solutions and have good convergence properties. However, they do not provide any characteristics of the found optimal solutions, something which may be very valuable to decision makers. By performing a post-analysis of the solution set from multi-objective optimization, relationships between the input space and the objective space can be identified. In this study, decision trees are used for this purpose. It is demonstrated that they may effectively capture important characteristics of the solution sets produced by multi-objective optimization methods. It is furthermore shown that the discovered relationships may be used for improving the search for additional solutions. Two multi-objective problems are considered in this paper: a well-studied benchmark function problem with an optimal Pareto front that is known beforehand, which is used for verification purposes, and a multi-objective optimization problem of a real-world production system. The results show that useful relationships may be identified by employing decision tree analysis of the solution sets from multi-objective optimizations.

  • 3. Dudas, Catarina
    et al.
    Ng, Amos H. C.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Post-analysis of multi-objective optimization solutions using decision trees. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 2, p. 259-278. Article in journal (Refereed)
    Abstract [en]

    Evolutionary algorithms are often applied to solve multi-objective optimization problems. Such algorithms effectively generate widely spread solutions and have good convergence properties. However, they do not provide any characteristics of the found optimal solutions, something which may be very valuable to decision makers. By performing a post-analysis of the solution set from multi-objective optimization, relationships between the input space and the objective space can be identified. In this study, decision trees are used for this purpose. It is demonstrated that they may effectively capture important characteristics of the solution sets produced by multi-objective optimization methods. It is furthermore shown that the discovered relationships may be used for improving the search for additional solutions. Two multi-objective problems are considered in this paper: a well-studied benchmark function problem with an optimal Pareto front that is known beforehand, which is used for verification purposes, and a multi-objective optimization problem of a real-world production system. The results show that useful relationships may be identified by employing decision tree analysis of the solution sets from multi-objective optimizations.

  • 4.
    Haghighi, Pari Delir
    et al.
    Centre for Distributed Systems and Software Engineering, Monash University.
    Zaslavsky, Arkady
    Krishnaswamy, Shonali
    Centre for Distributed Systems and Software Engineering, Monash University.
    Gaber, Mohamed
    Centre for Distributed Systems and Software Engineering, Monash University.
    Loke, Seng Wai
    Department of Computer Science and Computer Engineering, La Trobe University.
    Context-aware adaptive data stream mining. 2009. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 13, no 3, p. 423-434. Article in journal (Refereed)
    Abstract [en]

    In resource-constrained devices, adaptation of data stream processing to variations of data rates and availability of resources is crucial for consistency and continuity of running applications. However, to enhance and maximize the benefits of adaptation, there is a need to go beyond mere computational and device capabilities to encompass the full spectrum of context-awareness. This paper presents a general approach for context-aware adaptive mining of data streams that aims to dynamically and autonomously adjust data stream mining parameters according to changes in context and situations. We perform intelligent and real-time analysis of data streams generated from sensors, underpinned by context-aware adaptation. A prototype of the proposed architecture is implemented and evaluated through a real-world scenario in the area of healthcare monitoring.
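The core idea — adjusting a mining parameter from device context — can be sketched as a small adaptation rule. The context signals (battery level, data rate), thresholds, and the specific rule below are hypothetical illustrations, not the paper's actual adaptation strategy.

```python
def adapt_window(battery_level, data_rate, base_window=100):
    """Context-aware adaptation sketch: shrink the stream-mining window
    size when the device context degrades, so processing keeps up.
    All thresholds here are illustrative assumptions."""
    window = base_window
    if battery_level < 0.2:    # critically low battery: shed most load
        window //= 4
    elif data_rate > 1000:     # data burst: process smaller windows
        window //= 2
    return max(window, 10)     # never drop below a usable minimum
```

A real system would generalize this to situations inferred from many fused context attributes, but the control loop — observe context, map it to mining parameters, apply — has this shape.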

  • 5. Johansson, Ulf
    et al.
    Sönströd, Cecilia
    Löfström, Tuve
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Obtaining accurate and comprehensible classifiers using oracle coaching. 2012. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no 2, p. 247-263. Article in journal (Refereed)
    Abstract [en]

    While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.
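The oracle-coaching workflow in this abstract has three steps: train a strong opaque model, let it label the known production instances, and fit a transparent model on the union of training and oracle-labelled data. In this sketch a toy 1-nearest-neighbour model stands in for the paper's random-forest oracles, and a single-threshold rule stands in for J48/JRip; the data is invented.

```python
def oracle_predict(train, x):
    """Strong but opaque 'oracle' (toy 1-nearest-neighbour on 1-D data)."""
    return min(train, key=lambda xy: abs(xy[0] - x))[1]

def fit_stump(data):
    """Transparent model: best single-threshold rule 'x <= t -> class 1'."""
    best_t, best_err = None, len(data) + 1
    for t in sorted({x for x, _ in data}):
        err = sum((x <= t) != (y == 1) for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def oracle_coach(train, production_x):
    """Oracle coaching: label the known production instances with the
    oracle, then fit the transparent model on training + oracle data."""
    oracle_data = [(x, oracle_predict(train, x)) for x in production_x]
    return fit_stump(train + oracle_data)

train = [(0, 1), (1, 1), (5, 0), (6, 0)]
coached_threshold = oracle_coach(train, production_x=[2, 4])
```

Trained on the four labelled points alone, any threshold in [1, 5) is equally good; the oracle-labelled production points pin the decision boundary down between them, which is the mechanism behind the reported accuracy gains.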

  • 6.
    Johansson, Ulf
    et al.
    University of Borås, School of Business and IT.
    Sönströd, Cecilia
    University of Borås, School of Business and IT.
    Löfström, Tuve
    University of Borås, School of Business and IT.
    Boström, Henrik
    Obtaining accurate and comprehensible classifiers using oracle coaching. 2012. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no 2, p. 247-263. Article in journal (Refereed)
    Abstract [en]

    While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.

  • 7.
    Johansson, Ulf
    et al.
    University of Borås, School of Business and IT.
    Sönströd, Cecilia
    University of Borås, School of Business and IT.
    Löfström, Tuve
    University of Borås, School of Business and IT.
    Boström, Henrik
    Obtaining accurate and comprehensible classifiers using oracle coaching. 2012. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no 2, p. 247-263. Article in journal (Refereed)
    Abstract [en]

    While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.

  • 8.
    Johansson, Ulf
    et al.
    University of Borås, School of Business and IT.
    Sönströd, Cecilia
    University of Borås, School of Business and IT.
    Löfström, Tuve
    University of Borås, School of Business and IT.
    Boström, Henrik
    Stockholm University, Sweden.
    Obtaining accurate and comprehensible classifiers using oracle coaching. 2012. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no 2, p. 247-263. Article in journal (Refereed)
    Abstract [en]

    While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.

  • 9. Karunaratne, Thashmee
    et al.
    Boström, Henrik
    Norinder, Ulf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmacy.
    Comparative analysis of the use of chemoinformatics-based and substructure-based descriptors for quantitative structure-activity relationship (QSAR) modeling. 2013. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 17, no 2, p. 327-341. Article in journal (Refereed)
    Abstract [en]

    Quantitative structure-activity relationship (QSAR) models have gained popularity in the pharmaceutical industry due to their potential to substantially decrease drug development costs by reducing expensive laboratory and clinical tests. QSAR modeling consists of two fundamental steps, namely descriptor discovery and model building. Descriptor discovery methods are either based on chemical domain knowledge or purely data-driven. The former, chemoinformatics-based, and the latter, substructure-based, methods for QSAR modeling have been developed quite independently. As a consequence, evaluations involving both types of descriptor discovery method are rarely seen. In this study, a comparative analysis of chemoinformatics-based and substructure-based approaches is presented. Two chemoinformatics-based approaches, ECFI and SELMA, are compared to five approaches for substructure discovery, CP, graphSig, MFI, MoFa and SUBDUE, using 18 QSAR datasets. The empirical investigation shows that one of the chemoinformatics-based approaches, ECFI, results in significantly more accurate models compared to all other methods when used on their own. Results from combining descriptor sets are also presented, showing that the addition of ECFI descriptors to any other descriptor set leads to improved predictive performance for that set, while the use of ECFI descriptors can in many cases also be improved by adding descriptors generated by the other methods.
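The abstract's descriptor-combination result rests on a simple mechanism: each discovery method turns a molecule into a feature vector, and combining descriptor sets is feature concatenation. A toy sketch; the substring counting below is a hypothetical stand-in for real fingerprinting methods such as those compared in the paper.

```python
def toy_descriptors(molecule, patterns):
    """Toy substructure descriptor: count each pattern occurrence in a
    string-encoded molecule. A stand-in for real fingerprint generators."""
    return [molecule.count(p) for p in patterns]

def combine(*descriptor_sets):
    """Combining descriptor sets amounts to concatenating feature vectors."""
    return [value for ds in descriptor_sets for value in ds]

# Two hypothetical descriptor sets computed for the same molecule string.
chem = toy_descriptors("CCO", ["C", "O"])    # chemoinformatics-style set
sub = toy_descriptors("CCO", ["CC", "CO"])   # substructure-style set
features = combine(chem, sub)                # input to the QSAR model
```

The QSAR model itself (the "model building" step) then trains on these concatenated vectors, which is why adding one method's descriptors to another's can only enrich the feature space.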

  • 10. Karunaratne, Thashmee
    et al.
    Boström, Henrik
    Norinder, Ulf
    Uppsala University, Department of Pharmacy.
    Comparative analysis of the use of chemoinformatics-based and substructure-based descriptors for quantitative structure-activity relationship (QSAR) modeling. 2013. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 17, no 2, p. 327-341. Article in journal (Refereed)
    Abstract [en]

    Quantitative structure-activity relationship (QSAR) models have gained popularity in the pharmaceutical industry due to their potential to substantially decrease drug development costs by reducing expensive laboratory and clinical tests. QSAR modeling consists of two fundamental steps, namely descriptor discovery and model building. Descriptor discovery methods are either based on chemical domain knowledge or purely data-driven. The former, chemoinformatics-based, and the latter, substructure-based, methods for QSAR modeling have been developed quite independently. As a consequence, evaluations involving both types of descriptor discovery method are rarely seen. In this study, a comparative analysis of chemoinformatics-based and substructure-based approaches is presented. Two chemoinformatics-based approaches, ECFI and SELMA, are compared to five approaches for substructure discovery, CP, graphSig, MFI, MoFa and SUBDUE, using 18 QSAR datasets. The empirical investigation shows that one of the chemoinformatics-based approaches, ECFI, results in significantly more accurate models compared to all other methods when used on their own. Results from combining descriptor sets are also presented, showing that the addition of ECFI descriptors to any other descriptor set leads to improved predictive performance for that set, while the use of ECFI descriptors can in many cases also be improved by adding descriptors generated by the other methods.

  • 11.
    Karunaratne, Thashmee
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Norinder, Ulf
    Comparative analysis of the use of chemoinformatics-based and substructure-based descriptors for quantitative structure-activity relationship (QSAR) modeling. 2013. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 17, no 2, p. 327-341. Article in journal (Refereed)
    Abstract [en]

    Quantitative structure-activity relationship (QSAR) models have gained popularity in the pharmaceutical industry due to their potential to substantially decrease drug development costs by reducing expensive laboratory and clinical tests. QSAR modeling consists of two fundamental steps, namely descriptor discovery and model building. Descriptor discovery methods are either based on chemical domain knowledge or purely data-driven. The former, chemoinformatics-based, and the latter, substructure-based, methods for QSAR modeling have been developed quite independently. As a consequence, evaluations involving both types of descriptor discovery method are rarely seen. In this study, a comparative analysis of chemoinformatics-based and substructure-based approaches is presented. Two chemoinformatics-based approaches, ECFI and SELMA, are compared to five approaches for substructure discovery, CP, graphSig, MFI, MoFa and SUBDUE, using 18 QSAR datasets. The empirical investigation shows that one of the chemoinformatics-based approaches, ECFI, results in significantly more accurate models compared to all other methods when used on their own. Results from combining descriptor sets are also presented, showing that the addition of ECFI descriptors to any other descriptor set leads to improved predictive performance for that set, while the use of ECFI descriptors can in many cases also be improved by adding descriptors generated by the other methods.

  • 12. Kotsifakos, Alexios
    et al.
    Athitsos, Vassilis
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Query-sensitive Distance Measure Selection for Time Series Nearest Neighbor Classification. 2016. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 20, no 1, p. 5-27. Article in journal (Refereed)
    Abstract [en]

    Many distance or similarity measures have been proposed for time series similarity search. However, none of these measures is guaranteed to be optimal when used for 1-Nearest Neighbor (NN) classification. In this paper we study the problem of selecting the most appropriate distance measure, given a pool of time series distance measures and a query, so as to perform NN classification of the query. We propose a framework for solving this problem, by identifying, given the query, the distance measure most likely to produce the correct classification result for that query. From this proposed framework, we derive three specific methods, that differ from each other in the way they estimate the probability that a distance measure correctly classifies a query object. In our experiments, our pool of measures consists of Dynamic Time Warping (DTW), Move-Split-Merge (MSM), and Edit distance with Real Penalty (ERP). Based on experimental evaluation with 45 datasets, the best-performing of the three proposed methods provides the best results in terms of classification error rate, compared to the competitors, which include using the Cross Validation method for selecting the distance measure in each dataset, as well as using a single specific distance measure (DTW, MSM, or ERP) across all datasets.
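The framework's shape — per query, estimate which measure in the pool is most likely to classify it correctly, then run 1-NN with that measure — can be sketched as follows. Manhattan and Chebyshev distances stand in for the paper's DTW/MSM/ERP pool, and the nearest-neighbour margin below is a toy stand-in for the paper's three probability-of-correctness estimators.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def nn_label(query, train, dist):
    """1-NN classification of the query under a given distance measure."""
    return min(train, key=lambda xy: dist(query, xy[0]))[1]

def query_sensitive_classify(query, train, measures):
    """Pick, per query, the measure with the largest nearest-neighbour
    margin (a toy confidence estimate), then classify with it."""
    def margin(dist):
        ds = sorted(dist(query, x) for x, _ in train)
        return ds[1] - ds[0]   # gap between 1st and 2nd neighbour
    return nn_label(query, train, max(measures, key=margin))

train = [((0, 0), "a"), ((0, 9), "a"), ((10, 10), "b")]
label = query_sensitive_classify((1, 1), train, [manhattan, chebyshev])
```

The key design point is that the selection happens per query rather than per dataset, which is what distinguishes the approach from cross-validated measure selection.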

  • 13. Lindgren, T.
    et al.
    Boström, Henrik
    Stockholm University, Sweden.
    Resolving rule conflicts with double induction. 2004. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 8, no 5, p. 457-468. Article in journal (Refereed)
  • 14.
    Lindgren, Tony
    et al.
    KTH, Superseded Departments, Computer and Systems Sciences, DSV.
    Boström, H.
    KTH, Superseded Departments, Computer and Systems Sciences, DSV.
    Resolving rule conflicts with double induction. 2004. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 8, no 5, p. 457-468. Article in journal (Refereed)
    Abstract [en]

    When applying an unordered set of classification rules, the rules may assign more than one class to a particular example. Previous methods of resolving such conflicts between rules include using the most frequent class of the examples covered by the conflicting rules (as done in CN2) and using naïve Bayes to calculate the most probable class. An alternative way of solving this problem is presented in this paper: by generating new rules from the examples covered by the conflicting rules. These newly induced rules are then used for classification. Experiments on a number of domains show that this method significantly outperforms both the CN2 approach and naïve Bayes.
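The double-induction idea can be sketched directly: when rules conflict on an example, gather the training examples covered by the conflicting rules and induce a fresh classifier on that subset only. The single-threshold inducer and 1-D data below are toy stand-ins for the paper's actual rule induction.

```python
def induce_stump(examples):
    """Toy rule inducer: best single-threshold rule 'x <= t -> class 1'
    on 1-D labelled examples."""
    best_t, best_err = None, len(examples) + 1
    for t in sorted({x for x, _ in examples}):
        err = sum((x <= t) != (y == 1) for x, y in examples)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: 1 if x <= t else 0

def double_induction(conflicting_rules, training, x):
    """Resolve a rule conflict on x: collect the training examples covered
    by the conflicting rules, induce a fresh classifier on only that
    subset, and let it classify x."""
    subset = [ex for rule in conflicting_rules
              for ex in training if rule(ex[0])]
    return induce_stump(subset)(x)

# Two rules both fire on x = 3 but would predict different classes.
training = [(1, 1), (2, 1), (3, 1), (4, 0), (6, 0), (7, 0)]
rules = [lambda x: x < 5, lambda x: x > 2]
prediction = double_induction(rules, training, 3)
```

Because the new classifier is induced only from the examples in the conflict region, it can carve a finer boundary there than either original rule, which is the intuition behind the reported gains over CN2-style frequency resolution and naïve Bayes.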

  • 15.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Linusson, Henrik
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.
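The label-conditional mechanism in this abstract is a small change to standard conformal prediction: the candidate label's nonconformity score is ranked only against calibration scores of that same class, giving a per-class error bound. A minimal sketch with hypothetical nonconformity scores (higher = stranger); a real pipeline would derive them from an underlying model.

```python
def lccp_p_value(calibration_by_class, label, score):
    """Label-conditional conformal p-value: rank the candidate label's
    nonconformity score only among calibration scores of that class."""
    cal = calibration_by_class[label]
    return (sum(s >= score for s in cal) + 1) / (len(cal) + 1)

def prediction_set(calibration_by_class, scores_by_label, significance):
    """Prediction set: all labels whose label-conditional p-value exceeds
    the significance level, bounding the error rate per class."""
    return {label for label, score in scores_by_label.items()
            if lccp_p_value(calibration_by_class, label, score) > significance}

# Hypothetical calibration scores and test-object scores per label.
calibration = {"a": [0.1, 0.2, 0.3, 0.9], "b": [0.5, 0.6]}
result = prediction_set(calibration, {"a": 0.25, "b": 0.7}, significance=0.5)
```

Ranking within the class is exactly what removes the majority-class bias: a minority label's p-value is no longer swamped by majority-class calibration scores.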

  • 16.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Linusson, Henrik
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.

  • 17.
    Löfström, Tuve
    et al.
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Linusson, Henrik
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Johansson, Ulf
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
  • 18.
    Löfström, Tuve
    et al.
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Linusson, Henrik
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Johansson, Ulf
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Bias reduction through conditional conformal prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 6, p. 1355-1375. Article in journal (Refereed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.
