Digitala Vetenskapliga Arkivet

Results 1-50 of 106
  • 1.
    Afzaal, Muhammad
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Nouri, Jalal
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Aayesha, Aayesha
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Fors, Uno
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wu, Yongchao
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Li, Xiu
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Weegar, Rebecka
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Automatic and Intelligent Recommendations to Support Students’ Self-Regulation. 2021. In: International Conference on Advanced Learning Technologies (ICALT), 2021, pp. 336-338. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a counterfactual explanations-based approach to provide an automatic and intelligent recommendation that supports students' self-regulation of learning in a data-driven manner, aiming to improve their performance in courses. Existing work in the fields of learning analytics and AI in education predicts students' performance and uses the prediction outcome as feedback without explaining the reasons behind the prediction. Our proposed approach develops an algorithm that explains the root causes behind a decline in students' performance and generates data-driven recommendations for action. The effectiveness of the proposed predictive model that constitutes the intelligent recommendations is evaluated, with results demonstrating high accuracy.

  • 2.
    Afzaal, Muhammad
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Nouri, Jalal
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Aayesha, Aayesha
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Fors, Uno
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wu, Yongchao
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Li, Xiu
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Weegar, Rebecka
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Generation of Automatic Data-Driven Feedback to Students Using Explainable Machine Learning. 2021. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Utrecht, The Netherlands, June 14–18, 2021, Proceedings, Part II / [ed] Ido Roll; Danielle McNamara; Sergey Sosnovsky; Rose Luckin; Vania Dimitrova, Springer, 2021, pp. 37-42. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a novel approach that employs learning analytics techniques combined with explainable machine learning to provide automatic and intelligent actionable feedback that supports students’ self-regulation of learning in a data-driven manner. Prior studies within the field of learning analytics predict students’ performance and use the prediction status as feedback without explaining the reasons behind the prediction. Our proposed method, which has been developed based on LMS data from a university course, extends this approach by explaining the root causes of the predictions and automatically providing data-driven recommendations for action. The effectiveness of the underlying predictive model is evaluated, with the results demonstrating 90 per cent accuracy.

  • 3.
    Afzaal, Muhammad
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Nouri, Jalal
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zia, Aayesha
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Fors, Uno
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wu, Yongchao
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Li, Xiu
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Weegar, Rebecka
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Explainable AI for Data-Driven Feedback and Intelligent Action Recommendations to Support Students Self-Regulation. 2021. In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 4, article id 723447. Article in journal (Refereed)
    Abstract [en]

    Formative feedback has long been recognised as an effective tool for student learning, and researchers have investigated the subject for decades. However, the actual implementation of formative feedback practices is associated with significant challenges because it is highly time-consuming for teachers to analyse students’ behaviours and to formulate and deliver effective feedback and action recommendations to support students’ regulation of learning. This paper proposes a novel approach that employs learning analytics techniques combined with explainable machine learning to provide automatic and intelligent feedback and action recommendations that support students’ self-regulation in a data-driven manner, aiming to improve their performance in courses. Prior studies within the field of learning analytics have predicted students’ performance and have used the prediction status as feedback without explaining the reasons behind the prediction. Our proposed method, which has been developed based on LMS data from a university course, extends this approach by explaining the root causes of the predictions and by automatically providing data-driven intelligent recommendations for action. Based on the proposed explainable machine learning-based approach, a dashboard that provides data-driven feedback and intelligent course action recommendations to students is developed, tested and evaluated. Based on this evaluation, we identify and discuss the utility and limitations of the developed dashboard. According to the findings of the evaluation, the dashboard improved students’ learning outcomes, assisted them in self-regulation and had a positive effect on their motivation.

  • 4.
    Allaart, Corinne
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Karolinska Institute, Sweden.
    Mondrejevski, Lena
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Karolinska Institute, Sweden.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    FISUL: A Framework for Detecting Adverse Drug Events from Heterogeneous Medical Sources Using Feature Importance. 2019. In: Artificial Intelligence Applications and Innovations: Proceedings / [ed] John MacIntyre, Ilias Maglogiannis, Lazaros Iliadis, Elias Pimenidis, Springer, 2019, pp. 139-151. Conference paper (Refereed)
    Abstract [en]

    Adverse drug events (ADEs) are considered to be highly important and critical conditions, accounting for around 3.7% of hospital admissions all over the world. Several studies have applied predictive models for ADE detection; nonetheless, only a restricted number and type of features have been used. In this paper, we propose a framework for identifying ADEs in medical records, by first applying the Boruta feature importance criterion, and then using the top-ranked features for building a predictive model as well as for clustering. We provide an experimental evaluation on the MIMIC-III database by considering 7 types of ADEs, illustrating the benefit of the Boruta criterion for the task of ADE detection.

  • 5.
    Asker, Lars
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Mining Candidates for Adverse Drug Interactions in Electronic Patient Records. 2014. In: PETRA '14 Proceedings of the 7th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA’14, New York: ACM Press, 2014. Conference paper (Refereed)
    Abstract [en]

    Electronic patient records provide a valuable source of information for detecting adverse drug events. In this paper, we explore two different but complementary approaches to extracting useful information from electronic patient records with the goal of identifying candidate drugs, or combinations of drugs, to be further investigated for suspected adverse drug events. We propose a novel filter-and-refine approach that combines sequential pattern mining and disproportionality analysis. The proposed method is expected to identify groups of possibly interacting drugs suspected of causing certain adverse drug events. We perform an empirical investigation of the proposed method using a subset of the Stockholm electronic patient record corpus. The data used in this study consists of all diagnoses and medications for a group of patients diagnosed with at least one heart-related diagnosis during the period 2008-2010. The study shows that the method indeed is able to detect combinations of drugs that occur more frequently for patients with cardiovascular diseases than for patients in a control group, providing opportunities for finding candidate drugs that cause adverse drug effects through interaction.

  • 6.
    Asker, Lars
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Persson, Hans
    Identifying Factors for the Effectiveness of Treatment of Heart Failure: A Registry Study. 2016. In: IEEE 29th International Symposium on Computer-Based Medical Systems: CBMS 2016, IEEE Computer Society, 2016, pp. 205-206. Conference paper (Refereed)
    Abstract [en]

    An administrative health register containing health care data for over 2 million patients will be used to search for factors that can affect the treatment of heart failure. In the study, we will measure the effects of employed treatment for various groups of heart failure patients, using different measures of effectiveness. Significant deviations in effectiveness of treatments of the various patient groups will be reported and factors that may help explain the effect of treatment will be analyzed. The most important factors that may help explain the observed deviations between the different groups will be identified through generation of predictive models, for which variable importance can be calculated. The findings may affect recommended treatments as well as highlight deviations from national guidelines.

  • 7.
    Asker, Lars
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning from Swedish Healthcare Data. 2016. In: Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2016, article id 47. Conference paper (Refereed)
    Abstract [en]

    We present two ongoing projects aimed at learning from health care records. The first project, DADEL, focuses on high-performance data mining for detecting adverse drug events in healthcare, and uses electronic patient records covering seven years of patient record data from the Stockholm region in Sweden. The second project focuses on heart failure and on understanding the differences in treatment between various groups of patients. It uses a Swedish administrative health register containing health care data for over two million patients.

  • 8.
    Azari, Amin
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Denic, Stojan
    Peters, Gunnar
    Cellular Traffic Prediction and Classification: A Comparative Evaluation of LSTM and ARIMA. 2019. In: Discovery Science: Proceedings / [ed] Petra Kralj Novak, Tomislav Šmuc, Sašo Džeroski, Springer, 2019, pp. 129-144. Conference paper (Refereed)
    Abstract [en]

    Prediction of user traffic in cellular networks has attracted profound attention for improving the reliability and efficiency of network resource utilization. In this paper, we study the problem of cellular network traffic prediction and classification by employing standard machine learning and statistical learning time series prediction methods, including long short-term memory (LSTM) and autoregressive integrated moving average (ARIMA), respectively. We present an extensive experimental evaluation of the designed tools over a real network traffic dataset. Within this analysis, we explore the impact of different parameters on the effectiveness of the predictions. We further extend our analysis to the problem of network traffic classification and prediction of traffic bursts. The results, on the one hand, demonstrate the superior performance of LSTM over ARIMA in general, especially when the length of the training dataset is large enough and its granularity is fine enough. On the other hand, the results shed light on the circumstances in which ARIMA performs close to the optimal with lower complexity.

  • 9.
    Azari, Amin
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Denic, Stojan
    Peters, Gunnar
    User Traffic Prediction for Proactive Resource Management: Learning-Powered Approaches. 2020. In: IEEE Global Communications Conference (GLOBECOM), IEEE, 2020, pp. 1-6. Conference paper (Refereed)
    Abstract [en]

    Traffic prediction plays a vital role in efficient planning and usage of network resources in wireless networks. While traffic prediction in wired networks is an established field, there is a lack of research on the analysis of traffic in cellular networks, especially in a content-blind manner at the user level. Here, we shed light on this problem by designing traffic prediction tools that employ either statistical, rule-based, or deep machine learning methods. First, we present an extensive experimental evaluation of the designed tools over a real traffic dataset. Within this analysis, the impact of different parameters, such as length of prediction, feature set used in analyses, and granularity of data, on accuracy of prediction is investigated. Second, regarding the coupling observed between behavior of traffic and its generating application, we extend our analysis to the blind classification of applications generating the traffic based on the statistics of traffic arrival/departure. The results demonstrate the presence of a threshold number of previous observations, beyond which deep machine learning can outperform linear statistical learning, and before which statistical learning outperforms deep learning approaches. Further analysis of this threshold value reveals a strong coupling between this threshold, the length of future prediction, and the feature set in use. Finally, through a case study, we present how the experienced delay could be decreased by traffic arrival prediction.

  • 10.
    Azari, Amin
    et al.
    Salehi, Fateme
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Cavdar, Cicek
    Energy and Resource Efficiency by User Traffic Prediction and Classification in Cellular Networks. 2022. In: IEEE Transactions on Green Communications and Networking (ITGCN), E-ISSN 2473-2400, Vol. 6, no. 2, pp. 1082-1095. Article in journal (Refereed)
    Abstract [en]

    There is a lack of research on the analysis of per-user traffic in cellular networks, for deriving and following traffic-aware network management. In fact, the legacy design approach, in which resource provisioning and operation control are performed based on the cell-aggregated traffic scenarios, is not so energy- and cost-efficient and needs to be substituted with user-centric predictive analysis of mobile network traffic and proactive network resource management. Here, we shed light on this problem by designing traffic prediction tools that utilize standard machine learning (ML) tools, including long short-term memory (LSTM) and autoregressive integrated moving average (ARIMA) on top of per-user data. We present an expansive empirical evaluation of the designed solutions over a real network traffic dataset. Within this analysis, the impact of different parameters, such as the time granularity, the length of future predictions, and feature selection, is investigated. As a potential application of these solutions, we present an ML-powered Discontinuous reception (DRX) scheme for energy saving. Towards this end, we leverage the derived ML models for dynamic DRX parameter adaptation to user traffic. The performance evaluation results demonstrate the superiority of LSTM over ARIMA in general, especially when the length of the training time series is high enough, and it is augmented by a wisely-selected set of features. Furthermore, the results show that adaptation of DRX parameters by online prediction of future traffic provides much more energy-saving at low latency cost in comparison with the legacy cell-wide DRX parameter adaptation.

  • 11.
    Bagattini, Francesco
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rebane, Jonathan
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    A classification framework for exploiting sparse multi-variate temporal features with application to adverse drug event detection in medical records. 2019. In: BMC Medical Informatics and Decision Making, E-ISSN 1472-6947, Vol. 19, article id 7. Article in journal (Refereed)
    Abstract [en]

    Background: Adverse drug events (ADEs) as well as other preventable adverse events in the hospital setting incur a yearly monetary cost of approximately $3.5 billion, in the United States alone. Therefore, it is of paramount importance to reduce the impact and prevalence of ADEs within the healthcare sector, not only since it will result in reducing human suffering, but also as a means to substantially reduce economical strains on the healthcare system. One approach to mitigate this problem is to employ predictive models. While existing methods have been focusing on the exploitation of static features, limited attention has been given to temporal features.

    Methods: In this paper, we present a novel classification framework for detecting ADEs in complex Electronic health records (EHRs) by exploiting the temporality and sparsity of the underlying features. The proposed framework consists of three phases for transforming sparse and multi-variate time series features into a single-valued feature representation, which can then be used by any classifier. Moreover, we propose and evaluate three different strategies for leveraging feature sparsity by incorporating it into the new representation.

    Results: A large-scale evaluation on 15 ADE datasets extracted from a real-world EHR system shows that the proposed framework achieves significantly improved predictive performance compared to state-of-the-art. Moreover, our framework can reveal features that are clinically consistent with medical findings on ADE detection.

    Conclusions: Our study and experimental findings demonstrate that temporal multi-variate features of variable length and with high sparsity can be effectively utilized to predict ADEs from EHRs. Two key advantages of our framework are that it is method agnostic, i.e., versatile, and of low computational cost, i.e., fast; hence providing an important building block for future exploitation within the domain of machine learning from EHRs.

  • 12.
    Bampa, Maria
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Fasth, Tobias
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Magnússon, Sindri
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    EpidRLearn: Learning Intervention Strategies for Epidemics with Reinforcement Learning. 2022. In: Artificial Intelligence in Medicine: 20th International Conference on Artificial Intelligence in Medicine, AIME 2022, Halifax, NS, Canada, June 14–17, 2022, Proceedings / [ed] Martin Michalowski; Syed Sibte Raza Abidi; Samina Abidi, Springer Nature, 2022, pp. 189-199. Conference paper (Refereed)
    Abstract [en]

    Epidemics of infectious diseases can pose a serious threat to public health and the global economy. Despite scientific advances, containment and mitigation of infectious diseases remain a challenging task. In this paper, we investigate the potential of reinforcement learning as a decision making tool for epidemic control by constructing a deep Reinforcement Learning simulator, called EpidRLearn, composed of a contact-based, age-structured extension of the SEIR compartmental model, referred to as C-SEIR. We evaluate EpidRLearn by comparing the learned policies to two deterministic policy baselines. We further assess our reward function by integrating an alternative reward into our deep RL model. The experimental evaluation indicates that deep reinforcement learning has the potential of learning useful policies under complex epidemiological models and large state spaces for the mitigation of infectious diseases, with a focus on COVID-19.

  • 13.
    Bampa, Maria
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Aggregate-Eliminate-Predict: Detecting Adverse Drug Events from Heterogeneous Electronic Health Records. 2019. Manuscript (preprint) (Other academic)
    Abstract [en]

    We study the problem of detecting adverse drug events in electronic healthcare records. The challenge in this work is to aggregate heterogeneous data types involving diagnosis codes, drug codes, as well as lab measurements. An earlier framework proposed for the same problem demonstrated promising predictive performance for the random forest classifier by using only lab measurements as data features. We extend this framework, by additionally including diagnosis and drug prescription codes, concurrently. In addition, we employ a recursive feature selection mechanism on top, that extracts the top-k most important features. Our experimental evaluation on five medical datasets of adverse drug events and six different classifiers, suggests that the integration of these additional features provides substantial and statistically significant improvements in terms of AUC, while employing medically relevant features.

  • 14.
    Bampa, Maria
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Mining Adverse Drug Events Using Multiple Feature Hierarchies and Patient History Windows. 2019. In: 19th IEEE International Conference on Data Mining Workshops: Proceedings / [ed] Panagiotis Papapetrou, Xueqi Cheng, Qing He, IEEE, 2019. Conference paper (Refereed)
    Abstract [en]

    We study the problem of detecting adverse drug events in electronic health records. The challenge in this work is to aggregate heterogeneous data types involving lab measurements, diagnosis codes and medication codes. An earlier framework proposed for the same problem demonstrated promising predictive performance for the random forest classifier by using only lab measurements as data features. We extend this framework by additionally including diagnosis and drug prescription codes, concurrently. In addition, we employ the concept of hierarchies of clinical codes as proposed by another work, in order to exploit the inherently complex nature of the medical data. Moreover, we extend the state-of-the-art by considering variable patient history lengths before the occurrence of an ADE rather than a patient history of an arbitrary length. Our experimental evaluation on eight medical datasets of adverse drug events, five different patient history lengths, and six different classifiers suggests that the integration of these additional features on the different window lengths provides significant improvements in terms of AUC while employing medically relevant features.

  • 15.
    Bampa, Maria
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Hollmén, Jaakko
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    A clustering framework for patient phenotyping with application to adverse drug events. 2020. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2020, pp. 177-182. Conference paper (Refereed)
    Abstract [en]

    We present a clustering framework for identifying patient groups with Adverse Drug Reactions from Electronic Health Records (EHRs). The increased adoption of EHRs has brought changes in the way drug safety surveillance is carried out and plays an important role in effective drug regulation. Unsupervised machine learning methods using EHRs as their input can identify patients that share common meaningful information, without the need for expert input. In this work, we propose a generalized framework that exploits the strengths of different clustering algorithms and via clustering aggregation identifies consensus patient cluster profiles. Moreover, the inherent hierarchical structure of diagnoses and medication codes is exploited. We assess the statistical significance of the produced clusterings by applying a randomization technique that keeps the data distribution margins fixed, as we are interested in evaluating information that is not conveyed by the marginal distributions. The experimental findings suggest that the framework produces medically meaningful patient groups with regard to adverse drug events by investigating two use-cases, i.e., aplastic anaemia and drug-induced skin eruption.

  • 16.
    Barr Kumarakulasinghe, Nesaretnam
    et al.
    Blomberg, Tobias
    Liu, Jintai
    Saraiva Leao, Alexandra
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Evaluating local interpretable model-agnostic explanations on clinical machine learning classification models. 2020. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2020, pp. 7-12. Conference paper (Refereed)
    Abstract [en]

    The usage of black-box classification models within the healthcare field is highly dependent on being interpretable by the receiver. Local Interpretable Model-Agnostic Explanation (LIME) provides a patient-specific explanation for a given classification, thus enhancing the possibility for any complex classifier to serve as a safety aid within a clinical setting. However, research on whether the explanation provided by LIME is relevant to clinicians is limited, and there is no current framework for how an evaluation of LIME is to be performed. To evaluate the clinical relevance of the explanations provided by LIME, this study has investigated how physicians' independent explanations for classified observations compare with explanations provided by LIME. Additionally, the clinical relevance and the experienced reliance on the explanations provided by LIME have been evaluated by field experts. The results indicate that the explanation provided by LIME is clinically relevant and has a very high concordance with the explanations provided by physicians. Furthermore, trust and reliance on LIME are fairly high amongst clinicians. The study proposes a framework for further research within the area.

  • 17.
    Bork, Dominik
    et al.
    TU Wien, Vienna, Austria.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zdravkovic, Jelena
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Enterprise Modeling for Machine Learning: Case-Based Analysis and Initial Framework Proposal. 2023. In: Research Challenges in Information Science: Information Science and the Connected World: 17th International Conference, RCIS 2023, Corfu, Greece, May 23–26, 2023, Proceedings / [ed] Selmin Nurcan; Andreas L. Opdahl; Haralambos Mouratidis; Aggeliki Tsohou, Springer, 2023, pp. 518-525. Conference paper (Refereed)
    Abstract [en]

    Artificial Intelligence (AI) continuously paves its way into even the most traditional business domains. This particularly applies to data-driven AI, like machine learning (ML). Several data-driven approaches like CRISP-DM and KDD exist that help develop and engineer new ML-enhanced solutions. A new breed of approaches, often called canvas-driven or visual ideation approaches, extend the scope by a perspective on the business value an ML-enhanced solution shall enable. In this paper, we reflect on two recent ML projects. We show that the data-driven and canvas-driven approaches cover only some necessary information for developing and operating ML-enhanced solutions. Consequently, we propose to put ML into an enterprise context for which we sketch a first framework and spark the role enterprise modeling can play.

  • 18. Bornemann, Leon
    et al.
    Lecerf, Jason
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    STIFE: A Framework for Feature-based Classification of Sequences of Temporal Intervals. 2016. In: Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings / [ed] Toon Calders, Michelangelo Ceci, Donato Malerba, Springer, 2016, pp. 85-100. Conference paper (Refereed)
    Abstract [en]

    In this paper, we study the problem of classification of sequences of temporal intervals. Our main contribution is the STIFE framework for extracting relevant features from interval sequences to build feature-based classifiers. STIFE uses a combination of basic static metrics, shapelet discovery and selection, as well as distance-based approaches. Additionally, we propose an improved way of computing the state-of-the-art IBSM distance measure between two interval sequences that reduces both runtime and memory needs from pseudo-polynomial to fully polynomial, which greatly reduces the runtime of distance-based classification approaches. Our empirical evaluation not only shows that STIFE provides a very fast classification time in all evaluated scenarios but also reveals that a random forest using STIFE achieves similar or better accuracy than the state-of-the-art k-NN classifier.

  • 19. Boström, Henrik
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gurung, Ram B.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Lindgren, Tony
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Conformal prediction using random survival forests. 2017. In: 16th IEEE International Conference on Machine Learning and Applications: Proceedings / [ed] Xuewen Chen, Bo Luo, Feng Luo, Vasile Palade, M. Arif Wani, IEEE, 2017, pp. 812-817. Conference paper (Refereed)
    Abstract [en]

    Random survival forests constitute a robust approach to survival modeling, i.e., predicting the probability that an event will occur before or on a given point in time. Similar to most standard predictive models, no guarantee for the prediction error is provided for this model, which instead typically is empirically evaluated. Conformal prediction is a rather recent framework, which allows the error of a model to be determined by a user specified confidence level, something which is achieved by considering set rather than point predictions. The framework, which has been applied to some of the most popular classification and regression techniques, is here for the first time applied to survival modeling, through random survival forests. An empirical investigation is presented where the technique is evaluated on datasets from two real-world applications; predicting component failure in trucks using operational data and predicting survival and treatment of heart failure patients from administrative healthcare data. The experimental results show that the error levels indeed are very close to the provided confidence levels, as guaranteed by the conformal prediction framework, and that the error for predicting each outcome, i.e., event or no-event, can be controlled separately. The latter may, however, lead to less informative predictions, i.e., larger prediction sets, in case the class distribution is heavily imbalanced.
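
The idea of set predictions controlled by a confidence level can be illustrated with a minimal sketch of generic inductive conformal classification. This is not the random survival forest method of the paper: the nonconformity scores are assumed to come from some already trained model, and the function names `conformal_p_value` and `prediction_set` are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores at least as nonconforming as test_score.

    The +1 in numerator and denominator counts the test example itself.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, test_scores_by_label, confidence):
    """Keep every label whose p-value exceeds the significance level 1 - confidence."""
    significance = 1.0 - confidence
    return {
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(calibration_scores, score) > significance
    }


# Illustrative (made-up) nonconformity scores; higher means "stranger".
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
candidate_scores = {"event": 0.15, "no-event": 0.95}

print(prediction_set(calibration, candidate_scores, confidence=0.80))
print(prediction_set(calibration, candidate_scores, confidence=0.95))
```

Raising the confidence level can only enlarge the prediction set, which matches the abstract's remark that tighter error control may yield less informative predictions. Controlling the error per outcome, as the abstract describes, would presumably require separate calibration scores for each class; that is not shown here.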

  • 20.
    Boström, Henrik
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Knobbe, Arno
    Soares, Carlos
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Advances in Intelligent Data Analysis XV: 15th International Symposium, IDA 2016, Stockholm, Sweden, October 13-15, 2016, Proceedings. 2016. Conference proceedings (editor) (Refereed)
    Abstract [en]

    This book constitutes the refereed conference proceedings of the 15th International Conference on Intelligent Data Analysis, which was held in October 2016 in Stockholm, Sweden. The 36 revised full papers presented were carefully reviewed and selected from 75 submissions. The traditional focus of the IDA symposium series is on end-to-end intelligent support for data analysis. The symposium aims to provide a forum for inspiring research contributions that might be considered preliminary in other leading conferences and journals, but that have a potentially dramatic impact.

  • 21. Crielaard, Loes
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Explainable predictions of adverse drug events from electronic health records via oracle coaching. 2018. In: 2018 IEEE International Conference on Data Mining Workshops (ICDMW): Proceedings, IEEE, 2018, pp. 707-714. Conference paper (Refereed)
    Abstract [en]

    Information about drug efficacy and safety is limited despite current research on adverse drug events (ADEs). Electronic health records (EHRs) may help overcome this limitation; however, the application and evaluation of predictive models for ADE detection based on EHRs focus primarily on predictive performance, with little emphasis on explainability and clinical relevance of the obtained predictions. This paper therefore aims to provide new means for obtaining explainable and clinically relevant predictions and medical pathways underlying ADEs, by deriving sets of rules leading to explainable ADE predictions via oracle coaching and indirect rule induction. This is achieved by mapping opaque random forest models to explainable decision trees without compromising predictive performance. The results suggest that the average performance of decision trees with oracle coaching exceeds that of random forests for all considered metrics for the task of ADE detection. Relationships between many patient features present in the rulesets and the ADEs appear to exist, although they do not conform to the causal pathways implied by the models, which emphasises the need for explainable predictions.

  • 22.
    Deegalla, Sampath
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. University of Peradeniya, Sri Lanka.
    Walgama, Keerthi
    University of Peradeniya, Sri Lanka.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    KTH Royal Institute of Technology, Sweden.
    Random subspace and random projection nearest neighbor ensembles for high dimensional data. 2022. In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 191, article id 116078. Article in journal (Refereed)
    Abstract [en]

    The random subspace and the random projection methods are investigated and compared as techniques for forming ensembles of nearest neighbor classifiers in high dimensional feature spaces. The two methods have been empirically evaluated on three types of high-dimensional datasets: microarrays, chemoinformatics, and images. Experimental results on 34 datasets show that both the random subspace and the random projection method lead to improvements in predictive performance compared to using the standard nearest neighbor classifier, while the best method to use depends on the type of data considered; for the microarray and chemoinformatics datasets, random projection outperforms the random subspace method, while the opposite holds for the image datasets. An analysis using data complexity measures, such as attribute to instance ratio and Fisher’s discriminant ratio, provides some more detailed indications on what relative performance can be expected for specific datasets. The results also indicate that the resulting ensembles may be competitive with state-of-the-art ensemble classifiers; the nearest neighbor ensembles using random projection perform on par with random forests for the microarray and chemoinformatics datasets.

  • 23.
    García, Diego
    et al.
    University of Oviedo, Gijón, Spain.
    Pérez, Daniel
    University of León, León, Spain.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Díaz, Ignacio
    University of Oviedo, Gijón, Spain.
    A Cuadrado, Abel
    University of Oviedo, Gijón, Spain.
    Maria Enguita, José
    University of Oviedo, Gijón, Spain.
    González-Marcos, Ana
    University of Oviedo, Gijón, Spain.
    Domínguez, Manuel
    University of León, León, Spain.
    Conditioned Fully Convolutional Denoising Autoencoder for Energy Disaggregation. 2023. In: Artificial Intelligence Applications and Innovations. AIAI 2023 IFIP WG 12.5 International Workshops: MHDW 2023, 5G-PINE 2023, AIBMG 2023, and VAA-CP-EB 2023, León, Spain, June 14–17, 2023, Proceedings / [ed] Ilias Maglogiannis, Lazaros Iliadis, Antonios Papaleonidas, Ioannis Chochliouros, Springer Nature, 2023, pp. 421-433. Conference paper (Refereed)
    Abstract [en]

    Energy management increasingly requires tools to support decisions for improving consumption. This is achieved not only by obtaining feedback from current systems but also by using prior knowledge about human behaviour. The advances of data-driven models provide techniques like Non-Intrusive Load Monitoring (NILM), which are capable of estimating energy demand of appliances from total consumption. In addition, deep learning models have improved accuracy in energy disaggregation using separate networks for each device. However, the complexity can increase in large facilities and feedback may be impaired for a proper interpretation. In this work, a deep neural network based on a Fully Convolutional denoising AutoEncoder is proposed for energy disaggregation that uses a conditioning input to modulate the estimation aimed at one specific appliance. The model performs a complete disaggregation using a network whose modulation to target the estimation can be steered by the user. Experiments are done using data from a hospital facility and evaluating reconstruction errors and computational efficiency. The results show acceptable errors compared to methods that require various networks and a reduction of the complexity and computational costs, which can allow the user to be integrated into the analysis loop.

  • 24.
    Glinos, Myrsini
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Dahlberg, Svante
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Tselas, Nikolaos
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    FindMyDoc: a P2P platform disrupting traditional healthcare models and matching patients to doctors. 2016. In: Proceedings of the 9th ACM International Conference on Pervasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2016, article id 53. Conference paper (Refereed)
    Abstract [en]

    A variety of eHealth apps exist today ranging from ovulation calculators, such as Glow, to more sophisticated systems for determining the right therapist via semantic analysis, such as Talkspace, or ZocDoc. Despite their promising functionality, existing systems offer limited capabilities in terms of search filters, reviews, and doctor recommendations. In this paper, we propose FindMyDoc, a novel peer-to-peer healthcare platform that goes beyond existing traditional healthcare models. It provides doctor recommendations by allowing proper filtering based on treatment procedures, quality of treatment, and reviews of healthcare providers. In addition, the search results are refined using a recommendation engine that employs user-based collaborative filtering and exploits a set of predefined review options provided by the patients in order to match them with doctors.
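
User-based collaborative filtering, as mentioned in the abstract, can be sketched in a few lines. The abstract does not specify the similarity measure or the rating scale, so cosine similarity over co-rated doctors and a 1 to 5 scale are assumptions here; the patient and doctor identifiers and the function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts, restricted to co-rated items."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


# Made-up review scores, for illustration only.
ratings = {
    "patient_1": {"dr_a": 5, "dr_b": 1},
    "patient_2": {"dr_a": 5, "dr_b": 1, "dr_c": 4},
    "patient_3": {"dr_a": 1, "dr_b": 5, "dr_c": 2},
}
print(predict_rating(ratings, "patient_1", "dr_c"))
```

Because `patient_1` agrees with `patient_2` on the doctors both have reviewed, the predicted score for `dr_c` lands closer to `patient_2`'s rating than to `patient_3`'s.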

  • 25.
    Greenstein, Stanley
    et al.
    Stockholms universitet, Juridiska fakulteten, Juridiska institutionen, Institutet för rättsinformatik (IRI).
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Aalto University, Finland.
    Mochaourab, Rami
    RISE Research Institutes of Sweden, Sweden.
    Embedding Human Values into Artificial Intelligence (AI). 2022. In: Law, AI and Digitalisation / [ed] Katja De Vries; Mattias Dahlberg, Uppsala: Iustus förlag, 2022, 1:1, pp. 91-116. Chapter in book, part of anthology (Other academic)
  • 26.
    Guarnizo, Stefany
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Karolinska Institute, Sweden.
    Miliou, Ioanna
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Impact of Dimensionality on Nowcasting Seasonal Influenza with Environmental Factors. 2022. In: Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings / [ed] Tassadit Bouadi; Elisa Fromont; Eyke Hüllermeier, Cham: Springer, 2022, pp. 128-142. Conference paper (Refereed)
    Abstract [en]

    Seasonal influenza is an infectious disease of multi-causal etiology and a major cause of mortality worldwide that has been associated with environmental factors. In the attempt to model and predict future outbreaks of seasonal influenza with multiple environmental factors, we face the challenge of increased dimensionality that makes the models more complex and unstable. In this paper, we propose a nowcasting and forecasting framework that compares the theoretical approaches of Single Environmental Factor and Multiple Environmental Factors. We introduce seven solutions to minimize the weaknesses associated with the increased dimensionality when predicting seasonal influenza activity level using multiple environmental factors as external proxies. Our work provides evidence that using dimensionality reduction techniques as a strategy to combine multiple datasets improves seasonal influenza forecasting without the penalization of increased dimensionality.

  • 27. Henelius, Andreas
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Ukkonen, Antti
    Puolamäki, Kai
    Semigeometric Tiling of Event Sequences. 2016. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I / [ed] Paolo Frasconi, Niels Landwehr, Giuseppe Manco, Jilles Vreeken, Springer, 2016, pp. 329-344. Conference paper (Refereed)
    Abstract [en]

    Event sequences are ubiquitous, e.g., in finance, medicine, and social media. Often the same underlying phenomenon, such as television advertisements during the Super Bowl, is reflected in independent event sequences, like different Twitter users. It is hence of interest to find combinations of temporal segments and subsets of sequences where an event of interest, like a particular hashtag, has an increased occurrence probability. Such patterns allow exploration of the event sequences in terms of their evolving temporal dynamics, and provide more fine-grained insights into the data than what, for example, straightforward clustering can reveal. We formulate the task of finding such patterns as a novel matrix tiling problem, and propose two algorithms for solving it. Our first algorithm is a greedy set-cover heuristic, while in the second approach we view the problem as time-series segmentation. We apply the algorithms to real and artificial datasets and obtain promising results. The software related to this paper is available at https://github.com/bwrc/semigeom-r.

  • 28. Henelius, Andreas
    et al.
    Puolamäki, Kai
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    A peek into the black box: exploring classifiers by randomization. 2014. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 28, no. 5-6, pp. 1503-1529. Article in journal (Refereed)
    Abstract [en]

    Classifiers are often opaque and cannot easily be inspected to gain understanding of which factors are of importance. We propose an efficient iterative algorithm to find the attributes and dependencies used by any classifier when making predictions. The performance and utility of the algorithm are demonstrated on two synthetic and 26 real-world datasets, using 15 commonly used learning algorithms to generate the classifiers. The empirical investigation shows that the novel algorithm is indeed able to find groupings of interacting attributes exploited by the different classifiers. These groupings allow for finding similarities among classifiers for a single dataset as well as for determining the extent to which different classifiers exploit such interactions in general.

  • 29. Henelius, Andreas
    et al.
    Puolamäki, Kai
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Clustering with Confidence: Finding Clusters with Statistical Guarantees. 2016. Article in journal (Refereed)
    Abstract [en]

    Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or re-running a clustering algorithm involving some stochastic component may lead to completely different clusters. There is, hence, a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least 1−α. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness.
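
The notion of co-occurrence probability can be made concrete with a small sketch. It only estimates, from repeated clustering runs, how often each pair of items lands in the same cluster and keeps the pairs reaching the 1−α threshold; it does not implement the clique search that the abstract links to core clusters. The function names and the toy label vectors are hypothetical.

```python
from itertools import combinations


def co_occurrence(labelings):
    """Fraction of clustering runs in which each item pair shares a cluster.

    `labelings` is a list of label lists, one per run; label values only
    need to be comparable within a run, so relabelling between runs is fine.
    """
    n_items = len(labelings[0])
    counts = {pair: 0 for pair in combinations(range(n_items), 2)}
    for labels in labelings:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


def stable_pairs(labelings, alpha):
    """Pairs whose estimated co-occurrence probability is at least 1 - alpha."""
    threshold = 1 - alpha
    return {pair for pair, p in co_occurrence(labelings).items() if p >= threshold}


# Four made-up runs over four items; run 2 uses swapped label names.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(co_occurrence(runs))
print(stable_pairs(runs, alpha=0.1))
```

Treating the retained pairs as edges of a graph is one way to see why cliques enter the picture: a group of items that all pairwise co-occur with high probability forms a clique in that graph.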

  • 30. Henelius, Andreas
    et al.
    Puolamäki, Kai
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    GoldenEye++: a Closer Look into the Black Box. 2015. In: Statistical Learning and Data Sciences: Proceedings / [ed] Alexander Gammerman, Vladimir Vovk, Harris Papadopoulos, Springer, 2015, pp. 96-105. Conference paper (Refereed)
    Abstract [en]

    Models with high predictive performance are often opaque, i.e., they do not allow for direct interpretation, and are hence of limited value when the goal is to understand the reasoning behind predictions. A recently proposed algorithm, GoldenEye, allows detection of groups of interacting variables exploited by a model. We employed this technique in conjunction with random forests generated from data obtained from electronic patient records for the task of detecting adverse drug events (ADEs). We propose a refined version of the GoldenEye algorithm, called GoldenEye++, utilizing a more sensitive grouping metric. An empirical investigation comparing the two algorithms on 27 datasets related to detecting ADEs shows that the new version of the algorithm in several cases finds groups of medically relevant interacting attributes, corresponding to prescribed drugs, undetected by the previous version. This suggests that the GoldenEye++ algorithm can be a useful tool for finding novel (adverse) drug interactions.

  • 31. Hielscher, Tommy
    et al.
    Völzke, Henry
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Spiliopoulou, Myra
    Discovering, selecting and exploiting feature sequence records of study participants for the classification of epidemiological data on hepatic steatosis. 2018. In: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, Association for Computing Machinery (ACM), 2018, pp. 6-13. Conference paper (Refereed)
    Abstract [en]

    In longitudinal epidemiological studies, participants undergo repeated medical examinations and are thus represented by a potentially large number of short examination outcome sequences. Some of those sequences may contain important information in various forms, such as patterns, with respect to the disease under study, while others may be on features of little relevance to the outcome. In this work, we propose a framework for Discovery, Selection and Exploitation (DiSelEx) of longitudinal epidemiological data, aiming to identify informative patterns among these sequences. DiSelEx combines sequence clustering with supervised learning to identify sequence groups that contribute to class separation. Newly derived and old features are evaluated and selected according to their redundancy and informativeness regarding the target variable. The selected feature set is then used to learn a classification model on the study data. We evaluate DiSelEx on cohort participants for the disorder "hepatic steatosis" and report on the impact on predictive performance when using sequential data in comparison to utilizing only the basic classifier.

  • 32. Hoffmann, Mikael
    et al.
    Vander Stichele, Robert
    W. Bates, David
    Björklund, Johanna
    Alexander, Steve
    Andersson, Marine L.
    Auraaen, Ane
    Bennie, Marion
    Dahl, Marja-Liisa
    Eiermann, Birgit
    Hackl, Werner
    Hammar, Tora
    Hjemdahl, Paul
    Koch, Sabine
    Kunnamo, Ilkka
    Le Louët, Herve
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rägo, Lembit
    Spedding, Michael
    Seidling, Hanna M.
    Demner-Fushman, Dina
    Gustafsson, Lars L.
    Guiding principles for the use of knowledge bases and real-world data in clinical decision support systems: report by an international expert workshop at Karolinska Institutet. 2020. In: Expert Review of Clinical Pharmacology, ISSN 1751-2433, E-ISSN 1751-2441, Vol. 13, no. 9, pp. 925-934. Article in journal (Refereed)
    Abstract [en]

    Introduction: Technical and logical breakthroughs have provided new opportunities in medicine to use knowledge bases and large-scale clinical data (real-world) at point-of-care as part of a learning healthcare system to diminish the knowledge-practice gap.

    Areas covered: The article is based on presentations, discussions and recommendations from an international scientific workshop. Value, research needs and funding avenues of knowledge bases and access to real-world data as well as transparency and incorporation of patient perspectives are discussed.

    Expert opinion: Evidence-based, publicly funded, well-structured and curated knowledge bases are of global importance. They ought to be considered as a public responsibility requiring transparency and handling of conflicts of interest. Information has to be made accessible for clinical decision support systems (CDSS) for healthcare staff and patients. Access to rich and real-world data is essential for a learning health care ecosystem and can be augmented by data on patient-reported outcomes and preferences. This field can progress by the establishment of an international policy group for developing a best practice guideline on the development, maintenance, governance, evaluation principles and financing of open-source knowledge bases and handling of real-world data.

  • 33. Hollmén, Jaakko
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Norstedt Wikner, Birgitta
    Öhman, Inger
    Exploring epistaxis as an adverse effect of anti-thrombotic drugs and outdoor temperature. 2018. In: Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference (PETRA), Association for Computing Machinery (ACM), 2018, pp. 1-4. Conference paper (Refereed)
    Abstract [en]

    Electronic health records contain a wealth of epidemiological information about diseases at the population level. Using a database of medical diagnoses and drug prescriptions in electronic health records, we investigate the correlation between outdoor temperature and the incidence of epistaxis over time for two groups of patients. One group consists of patients that had been diagnosed with epistaxis and also been prescribed at least one of the three anti-thrombotic agents: Warfarin, Apixaban, or Rivaroxaban. The other group consists of patients that had been diagnosed with epistaxis and not been prescribed any of the three anti-thrombotic drugs. We find a strong negative correlation between the incidence of epistaxis and outdoor temperature for the group that had not been prescribed any of the three anti-thrombotic drugs, while there is a weaker correlation between incidence of epistaxis and outdoor temperature for the other group. It is, however, clear that both groups are affected in a similar way, such that the incidence of epistaxis increases with colder temperatures.

  • 34.
    Hollmén, Jaakko
    et al.
    Aalto University, Finland.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Clustering Diagnostic Profiles of Patients. 2019. In: Artificial Intelligence Applications and Innovations: Proceedings / [ed] John MacIntyre, Ilias Maglogiannis, Lazaros Iliadis, Elias Pimenidis, Springer, 2019, pp. 120-126. Conference paper (Refereed)
    Abstract [en]

    Electronic Health Records provide a wealth of information about the care of patients and can be used for checking the conformity of planned care, computing statistics of disease prevalence, or predicting diagnoses based on observed symptoms, for instance. In this paper, we explore and analyze the recorded diagnoses of patients in a hospital database in retrospect, in order to derive profiles of diagnoses in the patient database. We develop a data representation compatible with a clustering approach and present our clustering approach to perform the exploration. We use a k-means clustering model for identifying groups in our binary vector representation of diagnoses and present appropriate model selection techniques to select the number of clusters. Furthermore, we discuss possibilities for interpretation in terms of diagnosis probabilities, in the light of external variables and with the common diagnoses occurring together.

  • 35. Hollmén, Jaakko
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Proceedings of the ECMLPKDD 2015 Doctoral Consortium. 2015. Conference proceedings (editor) (Other academic)
  • 36.
    Homem, Irvin
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Harnessing Predictive Models for Assisting Network Forensic Investigations of DNS Tunnels. 2017. In: Annual ADFSL Conference on Digital Forensics, Security and Law: Proceedings, 2017, article id 7. Conference paper (Refereed)
    Abstract [en]

    In recent times, DNS tunneling techniques have been used for malicious purposes; however, network security mechanisms struggle to detect them. Network forensic analysis has been proven effective, but is slow and effort-intensive as Network Forensics Analysis Tools struggle to deal with undocumented or new network tunneling techniques. In this paper, we present a machine learning approach, based on feature subsets of network traffic evidence, to aid forensic analysis through automating the inference of protocols carried within DNS tunneling techniques. We explore four network traffic protocols, namely, HTTP, HTTPS, FTP, and POP3. Three features are extracted from the DNS tunneled traffic: IP packet length, DNS Query Name Entropy and DNS Query Name Length. We benchmark the performance of four classification models, i.e., decision trees, support vector machines, k-nearest neighbours, and neural networks, on a data set of DNS tunneled traffic. Classification accuracy of 95% is achieved and the feature set reduces the original evidence data size by a factor of 74%. More importantly, our findings provide strong evidence that predictive modeling machine learning techniques can be used to identify network protocols within DNS tunneled traffic in real-time with high accuracy from a relatively small-sized feature-set, without necessarily infringing on privacy from the outset, nor having to collect complete DNS Tunneling sessions.
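
Two of the three features named in the abstract can be computed from the query name alone. The sketch below assumes that "DNS Query Name Entropy" means character-level Shannon entropy in bits, which the abstract does not state; the IP packet length is left out because it requires access to the captured packet. The function names and the example query names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(text):
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    # Sum of p * log2(1 / p) over the observed characters.
    return sum((c / n) * log2(n / c) for c in counts.values())


def query_name_features(query_name):
    """Entropy and length of a DNS query name, as a small feature dict."""
    return {
        "entropy": shannon_entropy(query_name),
        "length": len(query_name),
    }


# Made-up query names: an ordinary lookup and an encoded-looking label.
print(query_name_features("www.example.com"))
print(query_name_features("a9f3k2q8z7x1m4b6.tunnel.example.com"))
```

Encoded payloads tend to use a wider and more even mix of characters than ordinary host names, which is the usual motivation for entropy and length as inputs to a classifier.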

  • 37.
    Homem, Irvin
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Dosis, Spyridon
    Information-Entropy-Based DNS Tunnel Prediction. 2018. In: Advances in Digital Forensics XIV / [ed] Gilbert Peterson, Sujeet Shenoi, Springer, 2018, pp. 127-140. Conference paper (Refereed)
    Abstract [en]

    DNS tunneling techniques are often used for malicious purposes. Network security mechanisms have struggled to detect DNS tunneling. Network forensic analysis has been proposed as a solution, but it is slow, invasive and tedious as network forensic analysis tools struggle to deal with undocumented and new network tunneling techniques.

    This chapter presents a method for supporting forensic analysis by automating the inference of tunneled protocols. The internal packet structure of DNS tunneling techniques is analyzed and the information entropy of various network protocols and their DNS tunneled equivalents is characterized. This provides the basis for a protocol prediction method that uses entropy distribution averaging. Experiments demonstrate that the method has a prediction accuracy of 75%. The method also preserves privacy because it only computes the information entropy and does not parse the actual tunneled content.

  • 38. Jaber, Mohammad
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    González-Marcos, Ana
    Wood, Peter T.
    Analysing Online Education-based Asynchronous Communication Tools to Detect Students' Roles. 2015. In: CSEDU 2015 - Proceedings of the 7th International Conference on Computer Supported Education / [ed] Markus Helfert, Maria Teresa Restivo, Susan Zvacek, James Onohuome Uhomoibhi, SciTePress, 2015, Vol. 2, pp. 416-424. Conference paper (Refereed)
    Abstract [en]

    This paper studies the application of Educational Data Mining to examine the online communication behaviour of students working together on the same project in order to identify the different roles played by the students. Analysis was carried out using real data from students' participation in project communication tools. Several sets of features including individual attributes and information about the interactions between the project members were used to train different classification algorithms. The results show that considering the individual attributes of students provided regular classification performance. The inclusion of information about the reply relationships among the project members generally improved mapping students to their roles. However, it was necessary to add "time-based" features in order to achieve the best classification results, which showed both precision and recall of over 95% for a number of algorithms. Most of these "time-based" features coincided with the first weeks of the experience, which indicates the importance of initial interactions between project members.

  • 39. Jaber, Mohammad
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Helmer, Sven
    Wood, Peter T.
    Using Time-Sensitive Rooted PageRank to Detect Hierarchical Social Relationships. 2014. In: Advances in Intelligent Data Analysis XIII, Berlin: Springer Berlin/Heidelberg, 2014, pp. 143-154. Conference paper (Refereed)
    Abstract [en]

    We study the problem of detecting hierarchical ties in a social network by exploiting the interaction patterns between the actors (members) involved in the network. Motivated by earlier work using a rank-based approach, i.e., Rooted-PageRank, we introduce a novel time-sensitive method, called T-RPR, that captures and exploits the dynamics and evolution of the interaction patterns in the network in order to identify the underlying hierarchical ties. Experiments on two real datasets demonstrate the performance of T-RPR in terms of recall and show its superiority over a recent competitor method.

  • 40. Jaber, Mohammad
    et al.
    Wood, Peter T.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    González-Marcos, Ana
    A Multi-granularity Pattern-based Sequence Classification Framework for Educational Data. 2016. In: 3rd IEEE International Conference on Data Science and Advanced Analytics: Proceedings, IEEE Computer Society, 2016, pp. 370-378. Conference paper (Refereed)
    Abstract [en]

    In many application domains, such as education, sequences of events occurring over time need to be studied in order to understand the generative process behind these sequences, and hence classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features based on frequent patterns at multiple levels of time granularity. Feature selection techniques are applied to identify the most informative features that are then used to construct the classification model. We show the applicability and suitability of the proposed framework to the area of educational data mining by experimenting on an educational dataset collected from an asynchronous communication tool in which students interact to accomplish an underlying group project. The experimental results showed that our model can achieve competitive performance in detecting the students' roles in their corresponding projects, compared to a baseline similarity-based approach.

  • 41. Jaber, Mohammad
    et al.
    Wood, Peter T.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Helmer, Sven
    Inferring Offline Hierarchical Ties from Online Social Networks. 2014. In: Proceedings of the companion publication of the 23rd international conference on World wide web companion, Association for Computing Machinery (ACM), 2014, pp. 1261-1266. Conference paper (Refereed)
    Abstract [en]

    Social networks can represent many different types of relationships between actors, some explicit and some implicit. For example, email communications between users may be represented explicitly in a network, while managerial relationships may not. In this paper we focus on analyzing explicit interactions among actors in order to detect hierarchical social relationships that may be implicit. We start by employing three well-known ranking-based methods, PageRank, Degree Centrality, and Rooted-PageRank (RPR) to infer such implicit relationships from interactions between actors. Then we propose two novel approaches which take into account the time-dimension of interactions in the process of detecting hierarchical ties. We experiment on two datasets, the Enron email dataset to infer manager-subordinate relationships from email exchanges, and a scientific publication co-authorship dataset to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments show that time-based methods perform considerably better than ranking-based methods. In the Enron dataset, they detect 48% of manager-subordinate ties versus 32% found by Rooted-PageRank. Similarly, in the co-author dataset, they detect 62% of advisor-advisee ties compared to only 39% by Rooted-PageRank.
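
    Rooted-PageRank, the strongest ranking baseline mentioned in this abstract, can be sketched as a random walk that restarts at a chosen root node. The sketch below is a generic power-iteration formulation and not the authors' code: the function name, the damping value of 0.85, the fixed iteration count and the treatment of dangling nodes are all assumptions. The time-based methods the paper proposes are not shown.

```python
def rooted_pagerank(graph, root, damping=0.85, iterations=100):
    """Approximate Rooted-PageRank scores by power iteration.

    graph: dict mapping each node to a list of its out-neighbours.
    root: the node to which the random walk restarts.
    Assumption: a walker on a node without out-links jumps back to the root.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Start with all probability mass on the root.
    scores = {node: 0.0 for node in nodes}
    scores[root] = 1.0

    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, score in scores.items():
            targets = graph.get(node, [])
            if targets:
                share = damping * score / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                nxt[root] += damping * score
        # With probability (1 - damping) the walker restarts at the root.
        nxt[root] += (1.0 - damping) * sum(scores.values())
        scores = nxt
    return scores
```

    For an email graph, one would call `rooted_pagerank(graph, root="alice")` and rank the other actors by their score relative to that root; how such scores are turned into manager-subordinate predictions is described in the paper, not here.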

  • 42. Jangyodsuk, Pat
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Athitsos, Vassilis
    Optimizing Hashing Functions for Similarity Indexing in Arbitrary Metric and Nonmetric Spaces. 2015. In: Proceedings of the SIAM International Conference on Data Mining / [ed] Suresh Venkatasubramanian and Jieping Ye, Society for Industrial and Applied Mathematics, 2015, pp. 828-836. Conference paper (Refereed)
  • 43.
    Kareem, Hend
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Detecting Hierarchical Ties Using Link-Analysis Ranking at Different Levels of Time Granularity. 2017. Manuscript (preprint) (Other academic)
    Abstract [en]

    Social networks contain implicit knowledge that can be used to infer hierarchical relations that are not explicitly present in the available data. Interaction patterns are typically affected by users' social relations. We present an approach to inferring such information that applies a link-analysis ranking algorithm at different levels of time granularity. In addition, a voting scheme is employed for obtaining the hierarchical relations. The approach is evaluated on two datasets: the Enron email data set, where the goal is to infer manager-subordinate relationships, and the Co-author data set, where the goal is to infer PhD advisor-advisee relations. The experimental results indicate that the proposed approach outperforms more traditional approaches to inferring hierarchical relations from social networks.

  • 44.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    KAPMiner: Mining Ordered Association Rules with Constraints. 2017. In: Advances in Intelligent Data Analysis XVI: Proceedings / [ed] Niall Adams, Allan Tucker, David Weston, 2017, pp. 149-161. Conference paper (Refereed)
    Abstract [en]

    We study the problem of mining ordered association rules from event sequences. Ordered association rules differ from regular association rules in that the events occurring in the antecedent (left hand side) of the rule are temporally constrained to occur strictly before the events in the consequent (right hand side). We argue that such constraints can provide more meaningful rules in particular application domains, such as health care. The importance and interestingness of the extracted rules are quantified by adapting existing rule mining metrics. Our experimental evaluation on real data sets demonstrates the descriptive power of ordered association rules against ordinary association rules.
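
    The ordering constraint described in this abstract can be illustrated with a small sketch that checks whether a rule holds in a timestamped event sequence and then counts support and confidence. This is not KAPMiner itself. Two things are assumptions: the rule is taken to hold when some occurrence of every antecedent event precedes some occurrence of every consequent event, and plain support and confidence stand in for the adapted metrics, which the abstract does not specify. All function names are hypothetical, and both sides of the rule must be non-empty.

```python
def first_and_last_times(sequence):
    """Map each event to its earliest and latest timestamp in one sequence."""
    first, last = {}, {}
    for time, event in sequence:
        first[event] = min(time, first.get(event, time))
        last[event] = max(time, last.get(event, time))
    return first, last


def rule_holds(sequence, antecedent, consequent):
    """True if all antecedent events can be placed strictly before all consequent events."""
    first, last = first_and_last_times(sequence)
    if not all(event in first for event in antecedent | consequent):
        return False
    # Earliest possible completion of the antecedent vs. latest possible
    # occurrence of each consequent event.
    return max(first[e] for e in antecedent) < min(last[e] for e in consequent)


def support_and_confidence(sequences, antecedent, consequent):
    """Support and confidence of the ordered rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = sum(
        1 for seq in sequences if antecedent <= {event for _, event in seq}
    )
    with_rule = sum(1 for seq in sequences if rule_holds(seq, antecedent, consequent))
    support = with_rule / len(sequences) if sequences else 0.0
    confidence = with_rule / with_antecedent if with_antecedent else 0.0
    return support, confidence
```

    Each sequence is a list of `(timestamp, event)` pairs, for example the dated diagnosis or medication codes of one patient. An unordered association rule would drop the timestamp comparison in `rule_holds` and only test that all events are present.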

  • 45.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Multi-channel ECG classification using forests of randomized shapelet trees. 2015. In: Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2015, article ID 43. Conference paper (Refereed)
    Abstract [en]

    Data series of multiple channels occur at high rates and in massive quantities in several application domains, such as healthcare. In this paper, we study the problem of multi-channel ECG classification. We map this problem to multivariate data series classification and propose five methods for solving it, using a split-and-combine approach. The proposed framework is evaluated using three base-classifiers on real-world data for detecting Myocardial Infarction. Extensive experiments are performed on real ECG data extracted from the Physiobank data repository. Our findings emphasize the importance of selecting an appropriate base-classifier for multivariate data series classification, while demonstrating the superiority of the Random Shapelet Forest (0.825 accuracy) against competitor methods (0.664 accuracy for 1-NN under cDTW).

  • 46.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Persson, Hans E.
    Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records. 2017. In: Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2017, pp. 394-398. Conference paper (Refereed)
    Abstract [en]

    Heart failure is a serious medical condition involving decreased quality of life and an increased risk of premature death. A recent evaluation by the Swedish National Board of Health and Welfare shows that Swedish heart failure patients are often undertreated and do not receive basic medication as recommended by the national guidelines for treatment of heart failure. The objective of this paper is to use registry data to characterize groups of heart failure patients, with an emphasis on basic treatment. Towards this end, we explore the applicability of frequent itemset mining and disproportionality analysis for finding interesting and distinctive characterizations of a target group of patients, e.g., those who have received basic treatment, against a control group, e.g., those who have not received basic treatment. Our empirical evaluation is performed on data extracted from administrative health records from Stockholm County covering the years 2010–2016. Our findings suggest that frequency is not always the most appropriate measure of importance for frequent itemsets, while itemset disproportionality against a control group provides alternative rankings of the extracted itemsets leading to some medically intuitive characterizations of the target groups.

  • 47.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Early Random Shapelet Forest. 2016. In: Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings / [ed] Toon Calders, Michelangelo Ceci, Donato Malerba, Springer, 2016, pp. 261-276. Conference paper (Refereed)
    Abstract [en]

    Early classification of time series has emerged as an increasingly important and challenging problem within signal processing, especially in domains where timely decisions are critical, such as medical diagnosis in health-care. Shapelets, i.e., discriminative sub-sequences, have been proposed for time series classification as a means to capture local and phase independent information. Recently, forests of randomized shapelet trees have been shown to produce state-of-the-art predictive performance at a low computational cost. In this work, they are extended to allow for early classification of time series. An extensive empirical investigation is presented, showing that the proposed algorithm is superior to alternative state-of-the-art approaches, in case predictive performance is considered to be more important than earliness. The algorithm allows for tuning the trade-off between accuracy and earliness, thereby supporting the generation of early classifiers that can be dynamically adapted to specific needs at low computational cost.

  • 48.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Forests of Randomized Shapelet Trees. 2015. In: Statistical Learning and Data Sciences: Proceedings / [ed] Alexander Gammerman, Vladimir Vovk, Harris Papadopoulos, Springer, 2015, pp. 126-136. Conference paper (Refereed)
    Abstract [en]

    Shapelets have recently been proposed for data series classification, due to their ability to capture phase independent and local information. Decision trees based on shapelets have been shown to provide not only interpretable models, but also, in many cases, state-of-the-art predictive performance. Shapelet discovery is however computationally costly, and although several techniques for speeding up the technique have been proposed, the computational cost is still in many cases prohibitive. In this work, an ensemble based method, referred to as Random Shapelet Forest (RSF), is proposed, which builds on the success of the random forest algorithm, and which is shown to have a lower computational complexity than the original shapelet tree learning algorithm. An extensive empirical investigation shows that the algorithm provides competitive predictive performance and that a proposed way of calculating importance scores can be used to successfully identify influential regions.
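
    The basic operation underlying shapelet-based trees is the distance between a shapelet and a data series, commonly taken as the smallest distance between the shapelet and any equally long window of the series. The following is a minimal sketch of that operation only, assuming plain Euclidean distance without normalization; the function name is hypothetical and the Random Shapelet Forest implementation may differ in both respects.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any window of the series.

    Assumption: no z-normalization of windows is applied.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")

    best = math.inf
    # Slide the shapelet over every window of the same length.
    for start in range(len(series) - m + 1):
        squared = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, squared)
    return math.sqrt(best)
```

    Because the minimum is taken over all window positions, the distance does not depend on where in the series the shape occurs, which is the phase independence the abstract refers to. A tree node can then split series by thresholding this distance for a chosen shapelet.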

  • 49.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Generalized random shapelet forests. 2016. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 30, no. 5, pp. 1053-1085. Journal article (Refereed)
    Abstract [en]

    Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.

  • 50.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rebane, Jonathan
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gionis, Aristides
    Explainable time series tweaking via irreversible and reversible temporal transformations. 2018. In: 2018 IEEE International Conference on Data Mining (ICDM): Proceedings, IEEE, 2018, pp. 207-216. Conference paper (Refereed)
    Abstract [en]

    Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
