Digitala Vetenskapliga Arkivet

1 - 36 of 36
  • 1.
    Asker, Lars
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Mining Candidates for Adverse Drug Interactions in Electronic Patient Records. 2014. In: PETRA '14 Proceedings of the 7th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA’14, New York: ACM Press, 2014. Conference paper (Refereed)
    Abstract [en]

    Electronic patient records provide a valuable source of information for detecting adverse drug events. In this paper, we explore two different but complementary approaches to extracting useful information from electronic patient records with the goal of identifying candidate drugs, or combinations of drugs, to be further investigated for suspected adverse drug events. We propose a novel filter-and-refine approach that combines sequential pattern mining and disproportionality analysis. The proposed method is expected to identify groups of possibly interacting drugs suspected of causing certain adverse drug events. We perform an empirical investigation of the proposed method using a subset of the Stockholm electronic patient record corpus. The data used in this study consists of all diagnoses and medications for a group of patients diagnosed with at least one heart-related diagnosis during the period 2008-2010. The study shows that the method indeed is able to detect combinations of drugs that occur more frequently for patients with cardiovascular diseases than for patients in a control group, providing opportunities for finding candidate drugs that cause adverse drug effects through interaction.
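
The abstract does not say which disproportionality measure is used. As a hedged illustration only, the sketch below computes the proportional reporting ratio (PRR), one common disproportionality measure, from a 2x2 contingency table. The function name and the example counts are assumptions made for illustration, not values from the paper.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table (illustrative sketch, not the paper's method).

    a: records with the drug combination and the event
    b: records with the drug combination, without the event
    c: records without the drug combination, with the event
    d: records without the drug combination, without the event
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for an empty group or zero comparison rate")
    exposed_rate = a / (a + b)      # event rate among exposed records
    unexposed_rate = c / (c + d)    # event rate among all other records
    return exposed_rate / unexposed_rate


if __name__ == "__main__":
    # Made-up counts: the event is about four times as frequent among exposed records.
    print(proportional_reporting_ratio(20, 80, 10, 190))
```

A ratio well above 1 flags a combination as a candidate for further investigation; it does not by itself establish a causal interaction.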

  • 2. Bagattini, Francesco
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rebane, Jonathan
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    A classification framework for exploiting sparse multi-variate temporal features with application to adverse drug event detection in medical records. 2019. In: BMC Medical Informatics and Decision Making, E-ISSN 1472-6947, Vol. 19, article id 7. Article in journal (Refereed)
    Abstract [en]

    Background: Adverse drug events (ADEs) as well as other preventable adverse events in the hospital setting incur a yearly monetary cost of approximately $3.5 billion in the United States alone. Therefore, it is of paramount importance to reduce the impact and prevalence of ADEs within the healthcare sector, not only because doing so will reduce human suffering, but also as a means to substantially reduce the economic strain on the healthcare system. One approach to mitigate this problem is to employ predictive models. While existing methods have focused on the exploitation of static features, limited attention has been given to temporal features.

    Methods: In this paper, we present a novel classification framework for detecting ADEs in complex electronic health records (EHRs) by exploiting the temporality and sparsity of the underlying features. The proposed framework consists of three phases for transforming sparse and multi-variate time series features into a single-valued feature representation, which can then be used by any classifier. Moreover, we propose and evaluate three different strategies for leveraging feature sparsity by incorporating it into the new representation.

    Results: A large-scale evaluation on 15 ADE datasets extracted from a real-world EHR system shows that the proposed framework achieves significantly improved predictive performance compared to the state of the art. Moreover, our framework can reveal features that are clinically consistent with medical findings on ADE detection.

    Conclusions: Our study and experimental findings demonstrate that temporal multi-variate features of variable length and with high sparsity can be effectively utilized to predict ADEs from EHRs. Two key advantages of our framework are that it is method agnostic, i.e., versatile, and of low computational cost, i.e., fast; hence providing an important building block for future exploitation within the domain of machine learning from EHRs.

  • 3. Boström, Henrik
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gurung, Ram B.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Lindgren, Tony
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Conformal prediction using random survival forests. 2017. In: 16th IEEE International Conference on Machine Learning and Applications: Proceedings / [ed] Xuewen Chen, Bo Luo, Feng Luo, Vasile Palade, M. Arif Wani, IEEE, 2017, p. 812-817. Conference paper (Refereed)
    Abstract [en]

    Random survival forests constitute a robust approach to survival modeling, i.e., predicting the probability that an event will occur before or on a given point in time. Similar to most standard predictive models, no guarantee for the prediction error is provided for this model, which instead typically is empirically evaluated. Conformal prediction is a rather recent framework, which allows the error of a model to be determined by a user-specified confidence level, something which is achieved by considering set rather than point predictions. The framework, which has been applied to some of the most popular classification and regression techniques, is here for the first time applied to survival modeling, through random survival forests. An empirical investigation is presented where the technique is evaluated on datasets from two real-world applications: predicting component failure in trucks using operational data and predicting survival and treatment of heart failure patients from administrative healthcare data. The experimental results show that the error levels indeed are very close to the provided confidence levels, as guaranteed by the conformal prediction framework, and that the error for predicting each outcome, i.e., event or no-event, can be controlled separately. The latter may, however, lead to less informative predictions, i.e., larger prediction sets, in case the class distribution is heavily imbalanced.
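
The idea of set predictions at a user-specified confidence level can be sketched as follows. This is a minimal inductive conformal classifier under stated assumptions, not the paper's random survival forest procedure: the nonconformity scores are assumed to come from some underlying model, and the function name, labels, and numbers are hypothetical.

```python
def conformal_prediction_set(calibration_scores, candidate_scores, confidence):
    """Return the set of labels that cannot be rejected at the given confidence.

    calibration_scores: nonconformity scores of held-out calibration examples
    candidate_scores:   dict mapping each candidate label to the nonconformity
                        score the test example would get under that label
    confidence:         e.g. 0.9 for an expected error rate of at most 10%
    """
    significance = 1.0 - confidence
    n = len(calibration_scores)
    prediction_set = set()
    for label, score in candidate_scores.items():
        # Count calibration examples at least as "strange" as the test example.
        at_least_as_strange = sum(1 for s in calibration_scores if s >= score)
        p_value = (at_least_as_strange + 1) / (n + 1)
        if p_value > significance:
            prediction_set.add(label)
    return prediction_set


if __name__ == "__main__":
    calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    candidates = {"event": 0.25, "no-event": 0.95}
    print(conformal_prediction_set(calibration, candidates, confidence=0.8))
    print(conformal_prediction_set(calibration, candidates, confidence=0.95))
```

Raising the confidence level can only enlarge the prediction set, which is the trade-off between validity and informativeness that the abstract describes. Controlling the error per outcome would use a separate calibration set for each label; that variant is not shown here.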

  • 4. Henelius, Andreas
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Ukkonen, Antti
    Puolamäki, Kai
    Semigeometric Tiling of Event Sequences. 2016. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I / [ed] Paolo Frasconi, Niels Landwehr, Giuseppe Manco, Jilles Vreeken, Springer, 2016, p. 329-344. Conference paper (Refereed)
    Abstract [en]

    Event sequences are ubiquitous, e.g., in finance, medicine, and social media. Often the same underlying phenomenon, such as television advertisements during the Super Bowl, is reflected in independent event sequences, like different Twitter users. It is hence of interest to find combinations of temporal segments and subsets of sequences where an event of interest, like a particular hashtag, has an increased occurrence probability. Such patterns allow exploration of the event sequences in terms of their evolving temporal dynamics, and provide more fine-grained insights into the data than what, for example, straightforward clustering can reveal. We formulate the task of finding such patterns as a novel matrix tiling problem, and propose two algorithms for solving it. Our first algorithm is a greedy set-cover heuristic, while in the second approach we view the problem as time-series segmentation. We apply the algorithms on real and artificial datasets and obtain promising results. The software related to this paper is available at https://github.com/bwrc/semigeom-r.

  • 5. Hollmén, Jaakko
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Norstedt Wikner, Birgitta
    Öhman, Inger
    Exploring epistaxis as an adverse effect of anti-thrombotic drugs and outdoor temperature. 2018. In: Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference (PETRA), Association for Computing Machinery (ACM), 2018, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    Electronic health records contain a wealth of epidemiological information about diseases at the population level. Using a database of medical diagnoses and drug prescriptions in electronic health records, we investigate the correlation between outdoor temperature and the incidence of epistaxis over time for two groups of patients. One group consists of patients that had been diagnosed with epistaxis and also been prescribed at least one of the three anti-thrombotic agents: Warfarin, Apixaban, or Rivaroxaban. The other group consists of patients that had been diagnosed with epistaxis and not been prescribed any of the three anti-thrombotic drugs. We find a strong negative correlation between the incidence of epistaxis and outdoor temperature for the group that had not been prescribed any of the three anti-thrombotic drugs, while there is a weaker correlation between incidence of epistaxis and outdoor temperature for the other group. It is, however, clear that both groups are affected in a similar way, such that the incidence of epistaxis increases with colder temperatures.

  • 6.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Order in the random forest. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    In many domains, repeated measurements are systematically collected to obtain the characteristics of objects or situations that evolve over time or other logical orderings. Although the classification of such data series shares many similarities with traditional multidimensional classification, inducing accurate machine learning models using traditional algorithms is typically infeasible since the order of the values must be considered.

    In this thesis, the challenges related to inducing predictive models from data series using a class of algorithms known as random forests are studied for the purpose of efficiently and effectively classifying (i) univariate, (ii) multivariate and (iii) heterogeneous data series either directly in their sequential form or indirectly as transformed to sparse and high-dimensional representations. In the thesis, methods are developed to address the challenges of (a) handling sparse and high-dimensional data, (b) data series classification and (c) early time series classification using random forests. The proposed algorithms are empirically evaluated in large-scale experiments and practically evaluated in the context of detecting adverse drug events.

    In the first part of the thesis, it is demonstrated that minor modifications to the random forest algorithm and the use of a random projection technique can improve the effectiveness of random forests when faced with discrete data series projected to sparse and high-dimensional representations. In the second part of the thesis, an algorithm for inducing random forests directly from univariate, multivariate and heterogeneous data series using phase-independent patterns is introduced and shown to be highly effective in terms of both computational and predictive performance. Then, leveraging the notion of phase-independent patterns, the random forest is extended to allow for early classification of time series and is shown to perform favorably when compared to alternatives. The conclusions of the thesis not only reaffirm the empirical effectiveness of random forests for traditional multidimensional data but also indicate that the random forest framework can, with success, be extended to sequential data representations.

  • 7.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Handling Sparsity with Random Forests when Predicting Adverse Drug Events from Electronic Health Records. 2014. In: IEEE International Conference on Healthcare Informatics (ICHI): Proceedings, IEEE Computer Society, 2014, p. 17-22. Conference paper (Refereed)
    Abstract [en]

    When using electronic health record (EHR) data to build models for predicting adverse drug effects (ADEs), one is typically facing the problem of data sparsity, i.e., drugs and diagnosis codes that could be used for predicting a certain ADE are absent for most observations. For such tasks, the ability to effectively handle sparsity by the employed machine learning technique is crucial. The state-of-the-art random forest algorithm is frequently employed to handle this type of data. It has, however, recently been demonstrated that the algorithm is biased towards the majority class, which may result in a low predictive performance on EHR data with large numbers of sparse features. In this study, approaches to handle this problem are empirically evaluated using 14 ADE datasets and three performance metrics: F1-score, AUC and Brier score. Two resampling based techniques are investigated and compared to two baseline approaches. The experimental results indicate that, for larger forests, the resampling methods outperform the baseline approaches when considering F1-score, which is consistent with the metric being affected by class bias. The approaches perform on a similar level with respect to AUC, which can be explained by the metric not being sensitive to class bias. Finally, when considering the squared error (Brier score) of individual predictions, one of the baseline approaches turns out to be ahead of the others. A bias-variance analysis shows that this is an effect of the individual trees being more correct on average for the baseline approach and that this outweighs the expected loss from a lower variance. The main conclusion is that the suggested choice of approach to handle sparsity is highly dependent on the performance metric, or the task, of interest. If the task is to accurately assign an ADE to a patient record, a sampling based approach is recommended. If the task is to rank patients according to risk of a certain ADE, the choice of approach is of minor importance. Finally, if the task is to accurately assign probabilities for a certain ADE, then one of the baseline approaches is recommended.
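
To make two of the metrics named above concrete, here is a small sketch of the Brier score (mean squared error of predicted probabilities) and the F1-score (harmonic mean of precision and recall). The function names and toy labels are illustrative assumptions, not data from the study.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); defined as 0.0 if nothing is found."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # one positive among ten records
    always_negative = [0] * 10                # a classifier biased to the majority
    print(f1_score(y_true, always_negative))  # 0.0
    print(brier_score(y_true, [0.1] * 10))    # small despite never predicting the ADE
```

The toy case shows why the two metrics can disagree: a model that never predicts the minority class scores zero on F1 while its probability estimates can still have a low squared error.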

  • 8.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Predicting Adverse Drug Events using Heterogeneous Event Sequences. 2016. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI), IEEE Computer Society, 2016, p. 356-362. Conference paper (Refereed)
    Abstract [en]

    Adverse drug events (ADEs) are known to be severely under-reported in electronic health record (EHR) systems. One approach to mitigate this problem is to employ machine learning methods to detect and signal for potentially missing ADEs, with the aim of increasing reporting rates. There are, however, many challenges involved in constructing prediction models for this task, since data present in health care records is heterogeneous, high dimensional, sparse and temporal. Previous approaches typically employ bag-of-items representations of clinical events that are present in a record, ignoring the temporal aspects. In this paper, we study the problem of classifying heterogeneous and multivariate event sequences using a novel algorithm building on the well-known concept of ensemble learning. The proposed approach is empirically evaluated using 27 datasets extracted from a real EHR database with different ADEs present. The results indicate that the proposed approach, which explicitly models the temporal nature of clinical data, can be expected to outperform, in terms of the trade-off between precision and specificity, models that do not consider the temporal aspects.

  • 9.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    KAPMiner: Mining Ordered Association Rules with Constraints. 2017. In: Advances in Intelligent Data Analysis XVI: Proceedings / [ed] Niall Adams, Allan Tucker, David Weston, 2017, p. 149-161. Conference paper (Refereed)
    Abstract [en]

    We study the problem of mining ordered association rules from event sequences. Ordered association rules differ from regular association rules in that the events occurring in the antecedent (left hand side) of the rule are temporally constrained to occur strictly before the events in the consequent (right hand side). We argue that such constraints can provide more meaningful rules in particular application domains, such as health care. The importance and interestingness of the extracted rules are quantified by adapting existing rule mining metrics. Our experimental evaluation on real data sets demonstrates the descriptive power of ordered association rules against ordinary association rules.
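
The ordering constraint can be illustrated with a small sketch. It is not the KAPMiner algorithm: it only checks one given rule against timestamped sequences and computes plain support and confidence. The matching rule used here (every antecedent event must have been seen before each consequent event occurs) is one possible reading of the abstract and is an assumption, as are the function names and the toy data.

```python
def antecedent_complete_time(sequence, antecedent):
    """Earliest time at which every antecedent event has occurred, else None."""
    first_seen = {}
    for time, event in sequence:
        if event in antecedent:
            first_seen[event] = min(time, first_seen.get(event, time))
    if set(first_seen) != set(antecedent):
        return None
    return max(first_seen.values())


def matches_ordered_rule(sequence, antecedent, consequent):
    """True if all consequent events occur strictly after the antecedent is complete."""
    t = antecedent_complete_time(sequence, antecedent)
    if t is None:
        return False
    later_events = {event for time, event in sequence if time > t}
    return set(consequent) <= later_events


def ordered_rule_metrics(sequences, antecedent, consequent):
    """Return (support, confidence) of the ordered rule antecedent -> consequent."""
    n_antecedent = sum(
        1 for s in sequences if antecedent_complete_time(s, antecedent) is not None
    )
    n_rule = sum(1 for s in sequences if matches_ordered_rule(s, antecedent, consequent))
    support = n_rule / len(sequences)
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    sequences = [
        [(1, "A"), (2, "B"), (5, "C")],   # A and B strictly before C: match
        [(1, "C"), (2, "A"), (3, "B")],   # C occurs before A and B: no match
        [(1, "A"), (4, "C")],             # B missing: antecedent not satisfied
        [(1, "B"), (1, "A"), (2, "C")],   # match
    ]
    print(ordered_rule_metrics(sequences, {"A", "B"}, {"C"}))
```

An ordinary association rule would count the second sequence as a match because all three events co-occur; the ordered variant rejects it, which lowers confidence from 3/3 to 2/3 in this toy example.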

  • 10.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Multi-channel ECG classification using forests of randomized shapelet trees. 2015. In: Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2015, article id 43. Conference paper (Refereed)
    Abstract [en]

    Data series of multiple channels occur at high rates and in massive quantities in several application domains, such as healthcare. In this paper, we study the problem of multi-channel ECG classification. We map this problem to multivariate data series classification and propose five methods for solving it, using a split-and-combine approach. The proposed framework is evaluated using three base-classifiers on real-world data for detecting Myocardial Infarction. Extensive experiments are performed on real ECG data extracted from the Physiobank data repository. Our findings emphasize the importance of selecting an appropriate base-classifier for multivariate data series classification, while demonstrating the superiority of the Random Shapelet Forest (0.825 accuracy) against competitor methods (0.664 accuracy for 1-NN under cDTW).

  • 11.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Persson, Hans E.
    Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records. 2017. In: Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2017, p. 394-398. Conference paper (Refereed)
    Abstract [en]

    Heart failure is a serious medical condition involving decreased quality of life and an increased risk of premature death. A recent evaluation by the Swedish National Board of Health and Welfare shows that Swedish heart failure patients are often undertreated and do not receive basic medication as recommended by the national guidelines for treatment of heart failure. The objective of this paper is to use registry data to characterize groups of heart failure patients, with an emphasis on basic treatment. Towards this end, we explore the applicability of frequent itemset mining and disproportionality analysis for finding interesting and distinctive characterizations of a target group of patients, e.g., those who have received basic treatment, against a control group, e.g., those who have not received basic treatment. Our empirical evaluation is performed on data extracted from administrative health records from Stockholm County covering the years 2010-2016. Our findings suggest that frequency is not always the most appropriate measure of importance for frequent itemsets, while itemset disproportionality against a control group provides alternative rankings of the extracted itemsets leading to some medically intuitive characterizations of the target groups.

  • 12.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Early Random Shapelet Forest. 2016. In: Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings / [ed] Toon Calders, Michelangelo Ceci, Donato Malerba, Springer, 2016, p. 261-276. Conference paper (Refereed)
    Abstract [en]

    Early classification of time series has emerged as an increasingly important and challenging problem within signal processing, especially in domains where timely decisions are critical, such as medical diagnosis in health-care. Shapelets, i.e., discriminative sub-sequences, have been proposed for time series classification as a means to capture local and phase independent information. Recently, forests of randomized shapelet trees have been shown to produce state-of-the-art predictive performance at a low computational cost. In this work, they are extended to allow for early classification of time series. An extensive empirical investigation is presented, showing that the proposed algorithm is superior to alternative state-of-the-art approaches, in case predictive performance is considered to be more important than earliness. The algorithm allows for tuning the trade-off between accuracy and earliness, thereby supporting the generation of early classifiers that can be dynamically adapted to specific needs at low computational cost.

  • 13.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Forests of Randomized Shapelet Trees. 2015. In: Statistical Learning and Data Sciences: Proceedings / [ed] Alexander Gammerman, Vladimir Vovk, Harris Papadopoulos, Springer, 2015, p. 126-136. Conference paper (Refereed)
    Abstract [en]

    Shapelets have recently been proposed for data series classification, due to their ability to capture phase independent and local information. Decision trees based on shapelets have been shown to provide not only interpretable models, but also, in many cases, state-of-the-art predictive performance. Shapelet discovery is, however, computationally costly, and although several techniques for speeding up the discovery have been proposed, the computational cost is still in many cases prohibitive. In this work, an ensemble based method, referred to as Random Shapelet Forest (RSF), is proposed, which builds on the success of the random forest algorithm, and which is shown to have a lower computational complexity than the original shapelet tree learning algorithm. An extensive empirical investigation shows that the algorithm provides competitive predictive performance and that a proposed way of calculating importance scores can be used to successfully identify influential regions.
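
What "phase independent and local" means can be shown with the usual shapelet distance: the smallest distance between the shapelet and any equally long window of the series. The sketch below uses unnormalized Euclidean distance as a simplifying assumption; the abstract does not state the exact distance or normalization, and the function name is hypothetical.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window_distance = math.sqrt(
            sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        )
        best = min(best, window_distance)
    return best


if __name__ == "__main__":
    bump = [1, 2, 1]
    print(shapelet_distance([0, 0, 1, 2, 1, 0, 0], bump))  # 0.0, bump in the middle
    print(shapelet_distance([1, 2, 1, 0, 0, 0, 0], bump))  # 0.0, same bump shifted
    print(shapelet_distance([0, 0, 0, 0, 0, 0, 0], bump))  # bump absent, distance > 0
```

A shapelet-based tree node can then split series on whether this distance falls below a threshold, so the position of the pattern within the series does not matter.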

  • 14.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Generalized random shapelet forests. 2016. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 30, no. 5, p. 1053-1085. Article in journal (Refereed)
    Abstract [en]

    Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.
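
The two sources of randomness mentioned above (which instances a tree sees and which shapelets it considers) can be sketched as follows. This is only an assumed, simplified reading of the abstract: instances are drawn by bootstrap sampling and candidate shapelets are random contiguous subsequences. The function names, length bounds, and toy data are hypothetical, and tree construction itself is omitted.

```python
import random


def bootstrap_indices(n_instances, rng):
    """Sample instance indices with replacement, one bootstrap sample per tree."""
    return [rng.randrange(n_instances) for _ in range(n_instances)]


def sample_shapelet(series, min_len, max_len, rng):
    """Draw a random contiguous subsequence of the series as a candidate shapelet."""
    if min_len < 1 or min_len > len(series):
        raise ValueError("min_len must be between 1 and the series length")
    length = rng.randint(min_len, min(max_len, len(series)))
    start = rng.randrange(len(series) - length + 1)
    return series[start:start + length]


def sample_candidates(dataset, n_candidates, min_len, max_len, rng):
    """Draw candidate shapelets from randomly chosen series in the dataset."""
    candidates = []
    for _ in range(n_candidates):
        series = dataset[rng.randrange(len(dataset))]
        candidates.append(sample_shapelet(series, min_len, max_len, rng))
    return candidates


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    dataset = [[0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]]
    print(bootstrap_indices(len(dataset), rng))
    print(sample_candidates(dataset, n_candidates=3, min_len=2, max_len=4, rng=rng))
```

Evaluating only a handful of random candidates per node, instead of enumerating every subsequence, is what would make such an ensemble cheaper than exhaustive shapelet search.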

  • 15.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rebane, Jonathan
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gionis, Aristides
    Explainable time series tweaking via irreversible and reversible temporal transformations. 2018. In: 2018 IEEE International Conference on Data Mining (ICDM): Proceedings, IEEE, 2018, p. 207-216. Conference paper (Refereed)
    Abstract [en]

    Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.

  • 16.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rebane, Jonathan
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gionis, Aristides
    Locally and globally explainable time series tweaking. 2020. In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 62, no. 5, pp. 1671-1700. Journal article (Refereed)
    Abstract [en]

    Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on three instantiations of the problem using global and local transformations. In the former case, we investigate the k-nearest neighbor classifier and provide an algorithmic solution to the global time series tweaking problem. In the latter case, we investigate the random shapelet forest classifier and focus on two instantiations of the local time series tweaking problem, which we refer to as reversible and irreversible time series tweaking, and propose two algorithmic solutions for the two problems along with simple optimizations. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.

  • 17.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Dimensionality Reduction with Random Indexing: An Application on Adverse Drug Event Detection using Electronic Health Records. 2014. In: IEEE 27th International Symposium on Computer-Based Medical Systems, New York: IEEE Computer Society, 2014, pp. 304-307. Conference paper (Refereed)
    Abstract [en]

    Although electronic health records (EHRs) have recently become an important data source for drug safety signal detection, which is usually evaluated in clinical trials, the use of such data is often prohibited by dimensionality and available computer resources. Currently, several methods for reducing dimensionality are developed, used and evaluated within the medical domain. While these methods perform well, the computational cost tends to increase with growing dimensionality. An alternative solution is random indexing, a technique commonly employed in text classification to reduce the dimensionality of large and sparse documents. This study aims to explore how the predictive performance of random forest is affected by dimensionality reduction through random indexing when predicting adverse drug events (ADEs). Data are extracted from EHRs and the task is to predict whether or not a patient should be assigned an ADE-related diagnosis code. Four different dimensionality settings are investigated and their sensitivity, specificity and area under ROC curve are reported for 14 data sets. The results show that for the investigated data sets, the predictive performance is not negatively affected by dimensionality reduction; however, the computational cost is significantly reduced. Therefore, this study concludes that applying random indexing on EHR data reduces the computational cost, while retaining the predictive performance.

  • 18.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Predicting Adverse Drug Events by Analyzing Electronic Patient Records. 2013. In: Artificial Intelligence in Medicine: 14th Conference on Artificial Intelligence in Medicine, AIME 2013. Proceedings / [ed] Niels Peek, Roque Marín Morales, Mor Peleg, Springer Berlin/Heidelberg, 2013, Vol. 7885, pp. 125-129. Conference paper (Refereed)
    Abstract [en]

    Diagnosis codes for adverse drug events (ADEs) are sometimes missing from electronic patient records (EPRs). This may not only affect patient safety in the worst case, but also the number of reported ADEs, resulting in incorrect risk estimates of prescribed drugs. Large databases of electronic patient records (EPRs) are potentially valuable sources of information to support the identification of ADEs. This study investigates the use of machine learning for predicting one specific ADE based on information extracted from EPRs, including age, gender, diagnoses and drugs. Several predictive models are developed and evaluated using different learning algorithms and feature sets. The highest observed AUC is 0.87, obtained by the random forest algorithm. The resulting model can be used for screening EPRs that are not, but possibly should be, assigned a diagnosis code for the ADE under consideration. Preliminary results from using the model are presented.

  • 19. Kotsifakos, Alexios
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Athitsos, Vassilis
    Gunopulos, Dimitrios
    Embedding-based subsequence matching with gaps-range-tolerances: a Query-By-Humming application. 2015. In: The VLDB journal, ISSN 1066-8888, E-ISSN 0949-877X, Vol. 24, no. 4, pp. 519-536. Journal article (Refereed)
    Abstract [en]

    We present a subsequence matching framework that allows for gaps in both query and target sequences, employs variable matching tolerance efficiently tuned for each query and target sequence, and constrains the maximum matching range. Using this framework, a dynamic programming method is proposed, called SMBGT, that, given a short query sequence Q and a large database, identifies in quadratic time the subsequence of the database that best matches Q. SMBGT is highly applicable to music retrieval. However, in Query-By-Humming applications, runtime is critical. Hence, we propose a novel embedding-based approach, called ISMBGT, for speeding up search under SMBGT. Using a set of reference sequences, ISMBGT maps both Q and each position of each database sequence into vectors. The database vectors closest to the query vector are identified, and SMBGT is then applied between Q and the subsequences that correspond to those database vectors. The key novelties of ISMBGT are that it does not require training, it is query sensitive, and it exploits the flexibility of SMBGT. We present an extensive experimental evaluation using synthetic and hummed queries on a large music database. Our findings show that ISMBGT can achieve speedups of up to an order of magnitude against brute-force search and over an order of magnitude against cDTW, while maintaining a retrieval accuracy very close to that of brute-force search.

  • 20.
    Lindgren, Tony
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Example-Based Feature Tweaking Using Random Forests. 2019. In: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science: Proceedings, IEEE, 2019, pp. 53-60. Conference paper (Refereed)
    Abstract [en]

    In certain application areas when using predictive models, it is not enough to make an accurate prediction for an example; instead, it might be more important to change a prediction from an undesired class into a desired class. In this paper we investigate methods for changing predictions of examples. To this end, we introduce a novel algorithm for changing predictions of examples and we compare this novel method to an existing method and a baseline method. In an empirical evaluation we compare the three methods on a total of 22 datasets. The results show that the novel method and the baseline method can change an example from an undesired class into a desired class in more cases than the competitor method (and in some cases this difference is statistically significant). We also show that the distance, as measured by the Euclidean norm, is higher for the novel and baseline methods (and in some cases this difference is statistically significant) than for state-of-the-art. The methods and their proposed changes are also evaluated subjectively in a medical domain with interesting results.

  • 21. Mochaourab, Rami
    et al.
    Venkitaraman, Arun
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rojas, Cristian R.
    Post Hoc Explainability for Time Series Classification. Toward a signal processing perspective. 2022. In: IEEE signal processing magazine (Print), ISSN 1053-5888, E-ISSN 1558-0792, Vol. 39, no. 4, pp. 119-129. Journal article (Refereed)
    Abstract [en]

    Time series data correspond to observations of phenomena that are recorded over time [1]. Such data are encountered regularly in a wide range of applications, such as speech and music recognition, monitoring health and medical diagnosis, financial analysis, motion tracking, and shape identification, to name a few. With such a diversity of applications and the large variations in their characteristics, time series classification is a complex and challenging task. One of the fundamental steps in the design of time series classifiers is that of defining or constructing the discriminant features that help differentiate between classes. This is typically achieved by designing novel representation techniques [2] that transform the raw time series data to a new data domain, where subsequently a classifier is trained on the transformed data, such as one-nearest neighbors [3] or random forests [4]. In recent time series classification approaches, deep neural network models have been employed that are able to jointly learn a representation of time series and perform classification [5]. In many of these sophisticated approaches, the discriminant features tend to be complicated to analyze and interpret, given the high degree of nonlinearity.

  • 22.
    Pilipiec, Patrick
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Maastricht University, The Netherlands.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Bota, András
    Surveillance of communicable diseases using social media: A systematic review. 2023. In: PLOS ONE, E-ISSN 1932-6203, Vol. 18, no. 2, article id e0282101. Journal article (Refereed)
    Abstract [en]

    Background

    Communicable diseases pose a severe threat to public health and economic growth. The traditional methods that are used for public health surveillance, however, involve many drawbacks, such as being labor intensive to operate and resulting in a lag between data collection and reporting. To effectively address the limitations of these traditional methods and to mitigate the adverse effects of these diseases, a proactive and real-time public health surveillance system is needed. Previous studies have indicated the usefulness of performing text mining on social media.

    Objective

    To conduct a systematic review of the literature that used textual content published to social media for the purpose of the surveillance and prediction of communicable diseases.

    Methodology

    Broad search queries were formulated and performed in four databases. Both journal articles and conference materials were included. The quality of the studies, operationalized as reliability and validity, was assessed. This qualitative systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

    Results

    Twenty-three publications were included in this systematic review. All studies reported positive results for using textual social media content to surveil communicable diseases. Most studies used Twitter as a source for these data. Influenza was studied most frequently, while other communicable diseases received far less attention. Journal articles had a higher quality (reliability and validity) than conference papers. However, studies often failed to provide important information about procedures and implementation.

    Conclusion

    Text mining of health-related content published on social media can serve as a novel and powerful tool for the automated, real-time, and remote monitoring of public health and for the surveillance and prediction of communicable diseases in particular. This tool can address limitations related to traditional surveillance methods, and it has the potential to supplement traditional methods for public health surveillance.

  • 23.
    Rebane, Jonathan
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning from Administrative Health Registries. 2017. In: SoGood 2017: Data Science for Social Good: Proceedings / [ed] Ricard Gavaldà, Irena Koprinska, Stefan Kramer, CEUR-WS.org, 2017. Conference paper (Refereed)
    Abstract [en]

    Over the last decades the healthcare domain has seen a tremendous increase in interest in methods for making inferences about patient care using large quantities of medical data. Such data is often stored in electronic health records and administrative health registries. As these data sources have grown increasingly complex, with millions of patients represented by thousands of attributes, static or time evolving, finding relevant and accurate patterns that can be used for predictive or descriptive modelling is impractical for human experts. In this paper, we concentrate our review on Swedish Administrative Health Registries (AHRs) and Electronic Health Records (EHRs) and provide an overview of recent and ongoing work in the area with focus on adverse drug events (ADEs) and heart failure.

  • 24.
    Rebane, Jonathan
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Bornemann, Leon
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    SMILE: A feature-based temporal abstraction framework for event-interval sequence classification. 2021. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 35, no. 1, pp. 372-399. Journal article (Refereed)
    Abstract [en]

    In this paper, we study the problem of classification of sequences of temporal intervals. Our main contribution is a novel framework, which we call SMILE, for extracting relevant features from interval sequences to construct classifiers. SMILE introduces the notion of utilizing random temporal abstraction features, which we define as e-lets, as a means to capture information pertaining to class-discriminatory events which occur across the span of complete interval sequences. Our empirical evaluation is applied to a wide array of benchmark data sets and fourteen novel datasets for adverse drug event detection. We demonstrate how the introduction of simple sequential features, followed by progressively more complex features, improves classification performance at each step. Importantly, this investigation demonstrates that SMILE significantly improves AUC performance over the current state-of-the-art. The investigation also reveals that the selection of underlying classification algorithm is important to achieve superior predictive performance, and how the number of features influences the performance of our framework.

  • 25.
    Rebane, Jonathan
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Denic, Stojan
    Seq2Seq RNNs and ARIMA models for Cryptocurrency Prediction: A Comparative Study. 2018. In: Proceedings of SIGKDD Workshop on Fintech (SIGKDD Fintech’18), 2018, article id 4. Conference paper (Refereed)
    Abstract [en]

    Cryptocurrency price prediction has recently become an alluring topic, attracting massive media and investor interest. Traditional models, such as Autoregressive Integrated Moving Average models (ARIMA) and models with more modern popularity, such as Recurrent Neural Networks (RNNs) can be considered candidates for such financial prediction problems, with RNNs being capable of utilizing various endogenous and exogenous input sources. This study compares the model performance of ARIMA to that of a seq2seq recurrent deep multi-layer neural network (seq2seq) utilizing a varied selection of input types. The results demonstrate superior performance of seq2seq over ARIMA, for models generated throughout most of bitcoin price history, with additional data sources leading to better performance during less volatile price periods.

  • 26.
    Rebane, Jonathan
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Pantelidis, Panteleimon
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Assessing the Clinical Validity of Attention-based and SHAP Temporal Explanations for Adverse Drug Event Predictions. 2021. In: 2021 IEEE 34th International Symposium on Computer-Based Medical Systems: Proceedings / [ed] João Rafael Almeida; Alejandro Rodríguez González; Linlin Shen; Bridget Kane; Agma Traina; Paolo Soda; José Luís Oliveira, Los Alamitos: IEEE Computer Society, 2021, pp. 235-240. Conference paper (Refereed)
    Abstract [en]

    Attention mechanisms form the basis of providing temporal explanations for a variety of state-of-the-art recurrent neural network (RNN) based architectures. However, evidence is lacking that attention mechanisms are capable of providing sufficiently valid medical explanations. In this study we focus on the quality of temporal explanations for the medical problem of adverse drug event (ADE) prediction by comparing explanations globally and locally provided by an attention-based RNN architecture against those provided by a more basic RNN using the post-hoc SHAP framework, a popular alternative option which adheres to several desirable explainability properties. The validity of this comparison is supported by medical expert knowledge gathered for the purpose of this study. This investigation has uncovered that these explanation methods both possess appropriateness for ADE explanations and may be used complementarily, due to SHAP providing more clinically appropriate global explanations and attention mechanisms capturing more clinically appropriate local explanations. Additional feedback from medical experts reveals that SHAP may be more applicable to real-time clinical encounters, in which efficiency must be prioritised, over attention explanations which possess properties more appropriate for offline analyses.

  • 27.
    Rebane, Jonathan
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Exploiting complex medical data with interpretable deep learning for adverse drug event prediction. 2020. In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 109, article id 101942. Journal article (Refereed)
    Abstract [en]

    A variety of deep learning architectures have been developed for the goal of predictive modelling and knowledge extraction from medical records. Several models have placed strong emphasis on temporal attention mechanisms and decay factors as a means to include highly temporally relevant information regarding the recency of medical event occurrence while facilitating medical code-level interpretability. In this study we utilise such models with a large Electronic Patient Record (EPR) data set consisting of diagnoses, medication, and clinical text data for the purpose of adverse drug event (ADE) prediction. The first contribution of this work is an empirical evaluation of two state-of-the-art medical-code based models in terms of objective performance metrics for ADE prediction on diagnosis and medication data. Secondly, as an extension of previous work, we augment an interpretable deep learning architecture to permit numerical risk and clinical text features and demonstrate how this approach yields improved predictive performance compared to the other baselines. Finally, we assess the importance of attention mechanisms in regards to their usefulness for medical code-level and text-level interpretability, which may facilitate novel insights pertaining to the nature of ADE occurrence within the health care domain.

  • 28.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    isaksamsten/wildboar: wildboar 1.0.3. 2020. Other (Other academic)
  • 29.
    Svanberg, Jan
    et al.
    University of Gävle and Centre for research on Economic Relations, Sundsvall, Sweden.
    Ardeshiri, Tohid
    University of Gävle and Centre for research on Economic Relations, Sundsvall, Sweden.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Öhman, Peter
    Centre for Research on Economic Relations, Mid Sweden University, Sundsvall, Sweden.
    Neidermeyer, Presha E.
    West Virginia University, Morgantown, WV, USA.
    Prediction of Controversies and Estimation of ESG Performance: An Experimental Investigation Using Machine Learning. 2023. In: Handbook of Big Data and Analytics in Accounting and Auditing / [ed] Tarek Rana; Jan Svanberg; Peter Öhman; Alan Lowe, Springer Publishing Company, 2023, pp. 65-87. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    We develop a new methodology for computing environmental, social, and governance (ESG) ratings using a mode of artificial intelligence (AI) called machine learning (ML) to make ESG more transparent. The ML algorithms anchor our rating methodology in controversies related to non-compliance with corporate social responsibility (CSR). This methodology is consistent with the information needs of institutional investors and is the first ESG methodology with predictive validity. Our best model predicts what companies are likely to experience controversies. It has a precision of 70–84 per cent and high predictive performance on several measures. It also provides evidence of what indicators contribute the most to the predicted likelihood of experiencing an ESG controversy. Furthermore, while the common approach of rating companies is to aggregate indicators using the arithmetic average, which is a simple explanatory model designed to describe an average company, the proposed rating methodology uses state-of-the-art AI technology to aggregate ESG indicators into holistic ratings for the predictive modelling of individual company performance.

    Predictive modelling using ML enables our models to aggregate the information contained in ESG indicators with far less information loss than with the predominant aggregation method.

  • 30.
    Svanberg, Jan
    et al.
    University of Gävle, Gävle, Sweden.
    Ardeshiri, Tohid
    University of Gävle, Gävle, Sweden.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Öhman, Peter
    Mid Sweden University, Sundsvall, Sweden.
    Neidermeyer, Presha E.
    West Virginia University, Morgantown, USA.
    Rana, Tarek
    The Royal Melbourne Institute of Technology, Melbourne, Australia.
    Maisano, Frank
    The Royal Melbourne Institute of Technology, Melbourne, Australia.
    Danielson, Mats
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Must social performance ratings be idiosyncratic? An exploration of social performance ratings with predictive validity. 2023. In: Sustainability Accounting, Management and Policy Journal, ISSN 2040-8021, E-ISSN 2040-803X, Vol. 14, no. 7, pp. 313-348. Journal article (Refereed)
    Abstract [en]

    The purpose of this study is to develop a method for assessing social performance. Traditionally, environmental, social and governance (ESG) rating providers use subjectively weighted arithmetic averages to combine a set of social performance (SP) indicators into a single rating. To overcome this problem, this study investigates the preconditions for a new methodology for rating the SP component of ESG by applying machine learning (ML) and artificial intelligence (AI) anchored in social controversies.

    This study proposes the use of a data-driven rating methodology that derives the relative importance of SP features from their contribution to the prediction of social controversies. The authors use the proposed methodology to solve the weighting problem of overall ESG ratings and further investigate whether prediction is possible.

    The authors find that ML models can predict controversies with high predictive performance and validity. The results suggest that the weighting problem of ESG ratings can be solved with a data-driven approach. The decisive prerequisite for the proposed rating methodology, however, is that social controversies are predicted by a broad set of SP indicators. The results also suggest that predictively valid ratings can be developed with this ML-based AI method.

    Practical implications

    This study offers practical solutions to ESG rating problems that have implications for investors, ESG raters and socially responsible investments.

    The proposed ML-based AI method can help achieve better ESG ratings, which in turn will help improve SP, with implications for organizations and societies through sustainable development.

    To the best of the authors' knowledge, this research is one of the first studies to offer a unique method for addressing the ESG rating problem and improving sustainability by focusing on SP indicators.

  • 31.
    Svanberg, Jan
    et al.
    University of Gävle, Sweden; The Royal Melbourne Institute of Technology, Australia.
    Ardeshiri, Tohid
    University of Gävle, Sweden.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Öhman, Peter
    Mid Sweden University, Sweden.
    Neidermeyer, Presha E.
    West Virginia University, USA.
    Rana, Tarek
    The Royal Melbourne Institute of Technology, Australia.
    Semenova, Natalia
    Linnaeus University, Växjö, Sweden.
    Danielson, Mats
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. International Institute for Applied Systems Analysis (IIASA), Austria.
    Corporate governance performance ratings with machine learning. 2022. In: International Journal of Intelligent Systems in Accounting, Finance & Management, ISSN 1055-615X, E-ISSN 1099-1174, Vol. 29, no. 1, pp. 50-68. Journal article (Refereed)
    Abstract [en]

    We use machine learning with a cross-sectional research design to predict governance controversies and to develop a measure of the governance component of the environmental, social, governance (ESG) metrics. Based on comprehensive governance data from 2,517 companies over a period of 10 years and investigating nine machine-learning algorithms, we find that governance controversies can be predicted with high predictive performance. Our proposed governance rating methodology has two unique advantages compared with traditional ESG ratings: it rates companies' compliance with governance responsibilities and it has predictive validity. Our study demonstrates a solution to what is likely the greatest challenge for the finance industry today: how to assess a company's sustainability with validity and accuracy. Prior to this study, the ESG rating industry and the literature have not provided evidence that widely adopted governance ratings are valid. This study describes the only methodology for developing governance performance ratings based on companies' compliance with governance responsibilities and for which there is evidence of predictive validity.

  • 32. Svanberg, Jan
    et al.
    Ardeshiri, Tohid
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Öhman, Peter
    Rana, Tarek
    Danielson, Mats
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Prediction of environmental controversies and development of a corporate environmental performance rating methodology. 2022. In: Journal of Cleaner Production, ISSN 0959-6526, E-ISSN 1879-1786, Vol. 344, article id 130979. Journal article (Refereed)
    Abstract [en]

    Institutional investors seek to make environmentally sustainable investments using environment, social, governance (ESG) ratings. Current ESG ratings have limited validity because they are based on idiosyncratic scores derived using subjective, discretionary methodologies. We discuss a new direction for developing corporate environmental performance (CEP) ratings and propose a solution to the limited validity problem by anchoring such ratings in environmental controversies. The study uses a novel machine learning approach to make the ratings more comprehensive and transparent, based on a set of algorithmic approaches that handle nonlinearity when aggregating ESG indicators. This approach minimizes the rater subjectivity and preferences inherent in traditional ESG indicators. The findings indicate that controversies as proxies for non-compliance with environmental responsibilities can be predicted well. We conclude that environmental performance ratings developed using our machine learning framework offer predictive validity consistent with institutional investors' demand for socially responsible investment screening.

  • 33.
    Wang, Zhendong
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Miliou, Ioanna
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Counterfactual Explanations for Time Series Forecasting. 2024. In: 2023 IEEE International Conference on Data Mining (ICDM), IEEE conference proceedings, 2024, pp. 1391-1396. Conference paper (Peer-reviewed)
    Abstract [en]

    Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, which makes their results challenging to interpret. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. The perturbations are further guided by imposing constraints on the forecasted values. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare it to two baselines. ForecastCF outperforms the baselines in terms of counterfactual validity and data manifold closeness, while generating meaningful and relevant counterfactuals for various forecasting tasks.

  • 34.
    Wang, Zhendong
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Kougia, Vasiliki
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. University of Vienna, Vienna, Austria.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Style-transfer counterfactual explanations: An application to mortality prevention of ICU patients. 2023. In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 135, article ID 102457. Journal article (Peer-reviewed)
    Abstract [en]

    In recent years, machine learning methods have been rapidly adopted in the medical domain. However, current state-of-the-art medical mining methods usually produce opaque, black-box models. To address the lack of model transparency, substantial attention has been given to developing interpretable machine learning models. In the medical domain, counterfactuals can provide example-based explanations for predictions, and show practitioners the modifications required to change a prediction from an undesired to a desired state. In this paper, we propose a counterfactual solution MedSeqCF for preventing the mortality of three cohorts of ICU patients, by representing their electronic health records as medical event sequences, and generating counterfactuals by adopting and employing a text style-transfer technique. We propose three model augmentations for MedSeqCF to integrate additional medical knowledge for generating more trustworthy counterfactuals. Experimental results on the MIMIC-III dataset strongly suggest that augmented style-transfer methods can be effectively adapted for the problem of counterfactual explanations in healthcare applications and can further improve the model performance in terms of validity, BLEU-4, local outlier factor, and edit distance. In addition, our qualitative analysis of the results by consultation with medical experts suggests that our style-transfer solutions can generate clinically relevant and actionable counterfactual explanations.

  • 35.
    Wang, Zhendong
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Mochaourab, Rami
    RISE Research Institutes of Sweden, Sweden.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning Time Series Counterfactuals via Latent Space Representations. 2021. In: Discovery Science: 24th International Conference, DS 2021, Halifax, NS, Canada, October 11–13, 2021, Proceedings / [ed] Carlos Soares; Luis Torgo, Springer, 2021, pp. 369-384. Conference paper (Peer-reviewed)
    Abstract [en]

    Counterfactual explanations provide sample-based explanations by identifying the features of an original sample that must be modified to change the classification result from an undesired state to a desired state; they thereby provide interpretability of the model. Previous work on LatentCF presents an algorithm for image data that employs auto-encoder models to directly transform original samples into counterfactuals in a latent space representation. In our paper, we adapt the approach to time series classification and propose an improved algorithm named LatentCF++, which introduces additional constraints in the counterfactual generation process. We conduct an extensive experiment on a total of 40 datasets from the UCR archive, comparing to current state-of-the-art methods. Based on our evaluation metrics, we show that the LatentCF++ framework can with high probability generate valid counterfactuals and produce explanations comparable to those of current state-of-the-art methods. Our proposed approach can also generate counterfactuals that are considerably closer to the decision boundary in terms of margin difference.

  • 36.
    Wang, Zhendong
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Samsten, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Counterfactual Explanations for Survival Prediction of Cardiovascular ICU Patients. 2021. In: Artificial Intelligence in Medicine: 19th International Conference on Artificial Intelligence in Medicine, AIME 2021, Virtual Event, June 15–18, 2021, Proceedings / [ed] Allan Tucker; Pedro Henriques Abreu; Jaime Cardoso; Pedro Pereira Rodrigues; David Riaño, Cham: Springer, 2021, pp. 338-348. Conference paper (Peer-reviewed)
    Abstract [en]

    In recent years, machine learning methods have been rapidly implemented in the medical domain. However, current state-of-the-art methods usually produce opaque, black-box models. To address the lack of model transparency, substantial attention has been given to developing interpretable machine learning methods. In the medical domain, counterfactuals can provide example-based explanations for predictions, and show practitioners the modifications required to change a prediction from an undesired to a desired state. In this paper, we propose a counterfactual explanation solution for predicting the survival of cardiovascular ICU patients, by representing their electronic health record as a sequence of medical events, and generating counterfactuals by adopting and employing a text style-transfer technique. Experimental results on the MIMIC-III dataset strongly suggest that text style-transfer methods can be effectively adapted for the problem of counterfactual explanations in healthcare applications and can achieve competitive performance in terms of counterfactual validity, BLEU-4 and local outlier metrics.
