Results 51-75 of 75
  • 51. Linusson, Henrik
    et al.
    Johansson, Ulf
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Löfström, Tuve
    Reliable Confidence Predictions Using Conformal Prediction. 2016. In: Advances in Knowledge Discovery and Data Mining: 20th Pacific-Asia Conference, PAKDD 2016, Auckland, New Zealand, April 19-22, 2016, Proceedings, Part I / [ed] James Bailey, Latifur Khan, Takashi Washio, Gill Dobbie, Joshua Zhexue Huang, Ruili Wang, Springer, 2016, pp. 77-88. Conference paper (Peer reviewed)
    Abstract [en]

    Conformal classifiers output confidence prediction regions, i.e., multi-valued predictions that are guaranteed to contain the true output value of each test pattern with some predefined probability. In order to fully utilize the predictions provided by a conformal classifier, it is essential that those predictions are reliable, i.e., that a user is able to assess the quality of the predictions made. Although conformal classifiers are statistically valid by default, the error probabilities of the output prediction regions depend on their size in such a way that smaller, and thus potentially more interesting, predictions are more likely to be incorrect. This paper proposes and evaluates a method for producing refined error probability estimates for prediction regions that takes their size into account. The end result is a binary conformal confidence predictor that is able to provide accurate error probability estimates for those prediction regions containing only a single class label.
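
    A minimal sketch of how an inductive conformal classifier turns nonconformity scores into prediction regions, and how the error rate of singleton regions can be measured on its own, may make the size dependence concrete. It assumes nonconformity scores (for example 1 minus a model's probability for a label) are already computed; all names are hypothetical, and the singleton estimate is a plain empirical rate, not the refined estimation method proposed in the paper.

```python
from typing import Dict, List, Sequence, Set, Tuple


def p_value(cal_scores: Sequence[float], test_score: float) -> float:
    """Conformal p-value: share of calibration scores at least as large
    (as nonconforming) as the test score; the +1 terms count the test example."""
    at_least_as_strange = sum(1 for s in cal_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(cal_scores) + 1)


def prediction_region(cal_scores: Sequence[float],
                      label_scores: Dict[str, float],
                      epsilon: float) -> Set[str]:
    """Keep every candidate label whose p-value exceeds the significance level."""
    return {label for label, score in label_scores.items()
            if p_value(cal_scores, score) > epsilon}


def singleton_error_rate(cal_scores: Sequence[float],
                         examples: List[Tuple[Dict[str, float], str]],
                         epsilon: float) -> float:
    """Empirical error rate computed only over regions with exactly one label."""
    errors = total = 0
    for label_scores, true_label in examples:
        region = prediction_region(cal_scores, label_scores, epsilon)
        if len(region) == 1:
            total += 1
            errors += true_label not in region
    return errors / total if total else float("nan")


# Usage with made-up scores (nonconformity = 1 - model probability of the label).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(prediction_region(calibration, {"a": 0.05, "b": 0.95}, epsilon=0.15))  # {'a'}
```

    Lowering epsilon to 0.05 in the usage line admits both labels, which illustrates the trade-off between region size and error rate that the abstract refers to.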

  • 52.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Linusson, Henrik
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Bias Reduction through Conditional Conformal Prediction. 2015. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no. 6, pp. 1355-1375. Article in journal (Peer reviewed)
    Abstract [en]

    Conformal prediction (CP) is a relatively new framework in which predictive models output sets of predictions with a bound on the error rate, i.e., the probability of making an erroneous prediction is guaranteed to be equal to or less than a predefined significance level. Label-conditional conformal prediction (LCCP) is a specialization of the framework which gives a bound on the error rate for each individual class. For datasets with class imbalance, many learning algorithms have a tendency to predict the majority class more often than the expected relative frequency, i.e., they are biased in favor of the majority class. In this study, the class bias of standard and label-conditional conformal predictors is investigated. An empirical investigation on 32 publicly available datasets with varying degrees of class imbalance is presented. The experimental results show that CP is highly biased towards the majority class on imbalanced datasets, i.e., it can be expected to make a majority of its errors on the minority class. LCCP, on the other hand, is not biased towards the majority class. Instead, the errors are distributed between the classes almost in accordance with the prior class distribution.
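
    The difference between standard and label-conditional conformal prediction lies in which calibration scores a candidate label is compared against. A minimal sketch, assuming precomputed (nonconformity score, true label) calibration pairs; the function names and toy data are hypothetical.

```python
from typing import List, Tuple

Calibration = List[Tuple[float, str]]  # (nonconformity score, true label)


def _p_value(scores: List[float], test_score: float) -> float:
    return (sum(s >= test_score for s in scores) + 1) / (len(scores) + 1)


def standard_p_value(calibration: Calibration, label: str, score: float) -> float:
    """Standard CP: compare against calibration scores of all classes
    (`label` is unused here; kept for a signature symmetric with LCCP)."""
    return _p_value([s for s, _ in calibration], score)


def label_conditional_p_value(calibration: Calibration, label: str, score: float) -> float:
    """LCCP: compare only against calibration examples whose true label is `label`."""
    return _p_value([s for s, y in calibration if y == label], score)


# Toy calibration set: the minority class receives high nonconformity scores.
calibration = [(0.10, "maj"), (0.20, "maj"), (0.15, "maj"), (0.25, "maj"),
               (0.60, "min"), (0.70, "min")]
print(standard_p_value(calibration, "min", 0.65))           # 2/7, about 0.286
print(label_conditional_p_value(calibration, "min", 0.65))  # 2/3, about 0.667
```

    Because the minority-class calibration scores are high, the minority label gets a much smaller p-value under the standard scheme than under the label-conditional one, so at a significance level of 0.3 only LCCP would keep it in the prediction region.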

  • 53.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Comparing methods for generating diverse ensembles of artificial neural networks. 2010. In: International Joint Conference on Neural Networks (IJCNN) 2010, 2010, pp. 1-6. Conference paper (Peer reviewed)
    Abstract [en]

    It is well-known that ensemble performance relies heavily on sufficient diversity among the base classifiers. With this in mind, the strategy used to balance diversity and base classifier accuracy must be considered a key component of any ensemble algorithm. This study evaluates the predictive performance of neural network ensembles, specifically comparing straightforward techniques to more sophisticated ones. In particular, the sophisticated methods GASEN and NegBagg are compared to more straightforward methods, where each ensemble member is trained independently of the others. In the experimentation, using 31 publicly available data sets, the straightforward methods clearly outperformed the sophisticated methods, thus questioning the use of the more complex algorithms.

  • 54.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Effective Utilization of Data in Inductive Conformal Prediction using Ensembles of Neural Networks. 2013. In: The 2013 International Joint Conference on Neural Networks (IJCNN): Proceedings, IEEE conference proceedings, 2013, pp. 1-8. Conference paper (Peer reviewed)
    Abstract [en]

    Conformal prediction is a new framework producing region predictions with a guaranteed error rate. Inductive conformal prediction (ICP) was designed to significantly reduce the computational cost associated with the original transductive online approach. The drawback of inductive conformal prediction is that it is not possible to use all data for training, since it sets aside some data as a separate calibration set. Recently, cross-conformal prediction (CCP) and bootstrap conformal prediction (BCP) were proposed to overcome that drawback of inductive conformal prediction. Unfortunately, CCP and BCP both need to build several models for the calibration, making them less attractive. In this study, focusing on bagged neural network ensembles as conformal predictors, ICP, CCP and BCP are compared to the very straightforward and cost-effective method of using the out-of-bag estimates for the necessary calibration. Experiments on 34 publicly available data sets conclusively show that the use of out-of-bag estimates produced the most efficient conformal predictors, making it the obvious preferred choice for ensembles in the conformal prediction framework.
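
    Out-of-bag calibration can be sketched with a bagged ensemble: each training example's nonconformity score is computed only from the members whose bootstrap sample excluded it, so no separate calibration set has to be set aside. This is a sketch under stated assumptions: a hypothetical nearest-centroid base learner on a single numeric feature stands in for the neural networks, and nonconformity is taken as 1 minus the fraction of member votes for a label.

```python
import random
from statistics import mean
from typing import Dict, List, Set, Tuple

Example = Tuple[float, int]  # (single numeric feature, class label)


def fit_centroids(sample: List[Example]) -> Dict[int, float]:
    """Hypothetical base learner: the mean feature value of each class."""
    by_class: Dict[int, List[float]] = {}
    for x, y in sample:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def predict(centroids: Dict[int, float], x: float) -> int:
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def fit_bagged_with_oob(data: List[Example], n_members: int, rng: random.Random):
    """Train a bagged ensemble and return OOB-based calibration scores."""
    n = len(data)
    members, bags = [], []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        members.append(fit_centroids([data[i] for i in idx]))
        bags.append(set(idx))
    cal_scores = []
    for i, (x, y) in enumerate(data):
        oob = [m for m, bag in zip(members, bags) if i not in bag]
        if not oob:  # drawn into every bootstrap sample: no OOB estimate
            continue
        votes = sum(predict(m, x) == y for m in oob)
        cal_scores.append(1 - votes / len(oob))  # nonconformity from OOB votes
    return members, cal_scores


def prediction_region(members, cal_scores: List[float], x: float,
                      labels: Set[int], epsilon: float) -> Set[int]:
    """Score each label with the full ensemble and compare with OOB scores."""
    region = set()
    for label in labels:
        score = 1 - sum(predict(m, x) == label for m in members) / len(members)
        p = (sum(s >= score for s in cal_scores) + 1) / (len(cal_scores) + 1)
        if p > epsilon:
            region.add(label)
    return region


# Usage on toy, well-separated data.
data = [(0.1 * i, 0) for i in range(10)] + [(5 + 0.1 * i, 1) for i in range(10)]
members, cal_scores = fit_bagged_with_oob(data, n_members=25, rng=random.Random(0))
print(prediction_region(members, cal_scores, 5.2, {0, 1}, epsilon=0.2))
```

    Note that the OOB scores come from sub-ensembles smaller than the full ensemble used at test time, so this calibration step is an approximation rather than a textbook inductive conformal predictor.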

  • 55.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Boström, Henrik
    University of Skövde, Sweden.
    Ensemble member selection using multi-objective optimization. 2009. In: IEEE Symposium on Computational Intelligence and Data Mining, 2009, pp. 245-251. Conference paper (Peer reviewed)
    Abstract [en]

    Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. Unfortunately, the problem of how to maximize ensemble accuracy is, especially for classification, far from solved. In essence, the key problem is to find a suitable criterion, typically based on training or selection set performance, highly correlated with ensemble accuracy on novel data. Several studies have, however, shown that it is difficult to come up with a single measure, such as ensemble or base classifier selection set accuracy, or some measure based on diversity, that is a good general predictor for ensemble test accuracy. This paper presents a novel technique that for each learning task searches for the most effective combination of given atomic measures, by means of a genetic algorithm. Ensembles built from either neural networks or random forests were empirically evaluated on 30 UCI datasets. The experimental results show that when using the generated combined optimization criteria to rank candidate ensembles, a higher test set accuracy for the top ranked ensemble was achieved, compared to using ensemble accuracy on selection data alone. Furthermore, when creating ensembles from a pool of neural networks, the use of the generated combined criteria was shown to generally outperform the use of estimated ensemble accuracy as the single optimization criterion.

  • 56.
    Löfström, Tuve
    et al.
    University of Borås, Sweden.
    Johansson, Ulf
    University of Borås, Sweden.
    Boström, Henrik
    University of Skövde, Sweden.
    On the Use of Accuracy and Diversity Measures for Evaluating and Selecting Ensembles of Classifiers. 2008. In: 2008 Seventh International Conference on Machine Learning and Applications, 2008, pp. 127-132. Conference paper (Peer reviewed)
    Abstract [en]

    The test set accuracy for ensembles of classifiers selected based on single measures of accuracy and diversity, as well as on combinations of such measures, is investigated. It is found that by combining measures, a higher test set accuracy may be obtained than by using any single accuracy or diversity measure. It is further investigated whether a multi-criteria search for an ensemble that maximizes both accuracy and diversity leads to more accurate ensembles than optimizing a single criterion. The results indicate that it might be more beneficial to search for ensembles that are both accurate and diverse. Furthermore, the results show that diversity measures could compete with accuracy measures as selection criteria.
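
    One way to combine an accuracy measure and a diversity measure when ranking candidate ensembles is a weighted sum. The sketch below assumes each member's predictions on a selection set are given as label lists, and uses majority-vote accuracy, mean pairwise disagreement and a hypothetical equal weighting; the measures and combinations actually studied in the paper may differ.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def majority_vote(members: Sequence[Sequence[int]], i: int) -> int:
    return Counter(preds[i] for preds in members).most_common(1)[0][0]


def ensemble_accuracy(members: Sequence[Sequence[int]], y: Sequence[int]) -> float:
    return sum(majority_vote(members, i) == y[i] for i in range(len(y))) / len(y)


def mean_disagreement(members: Sequence[Sequence[int]]) -> float:
    """Average, over member pairs, of the share of examples they label differently."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(sum(a != b for a, b in zip(p, q)) / len(p) for p, q in pairs) / len(pairs)


def rank_ensembles(pool: Sequence[Sequence[int]], y: Sequence[int], size: int,
                   weight: float = 0.5) -> List[Tuple[float, Tuple[int, ...]]]:
    """Score every candidate ensemble of `size` members by a weighted sum of
    majority-vote accuracy and diversity; highest score first."""
    scored = []
    for combo in combinations(range(len(pool)), size):
        members = [pool[j] for j in combo]
        score = (weight * ensemble_accuracy(members, y)
                 + (1 - weight) * mean_disagreement(members))
        scored.append((score, combo))
    return sorted(scored, reverse=True)


# Toy selection-set predictions of four members and the true labels.
pool = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0]]
y = [1, 1, 1, 0]
print(rank_ensembles(pool, y, size=3)[0])  # (0.75, (1, 2, 3))
```

    In this toy pool the top-ranked ensemble leaves out member 0, the only individually perfect classifier: every three-member ensemble reaches full majority-vote accuracy, so the more diverse one wins.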

  • 57. Ng, Amos H. C.
    et al.
    Dudas, Catarina
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Deb, Kalyanmoy
    Interleaving innovization with evolutionary multi-objective optimization in production system simulation for faster convergence. 2013. In: Learning and Intelligent Optimization: 7th International Conference, LION 7, Revised Selected Papers, Springer Berlin/Heidelberg, 2013, pp. 1-18. Conference paper (Peer reviewed)
    Abstract [en]

    This paper introduces a novel methodology for the optimization, analysis and decision support in production systems engineering. The methodology is based on the innovization procedure, originally introduced to unveil new and innovative design principles in engineering design problems. The innovization procedure stretches beyond an optimization task and attempts to discover new design/operational rules/principles relating to decision variables and objectives, so that a deeper understanding of the underlying problem can be obtained. By integrating the concept of innovization with simulation and data mining techniques, a new set of powerful tools can be developed for general systems analysis. The uniqueness of the approach introduced in this paper lies in the fact that decision rules extracted from the multi-objective optimization using data mining are used to modify the original optimization. Hence, faster convergence to the desired solution of the decision-maker can be achieved. In other words, faster convergence and deeper knowledge of the relationships between the key decision variables and objectives can be obtained by interleaving the multi-objective optimization and data mining processes. In this paper, such an interleaved approach is illustrated through a set of experiments carried out on a simulation model developed for a real-world production system analysis problem.

  • 58. Norinder, Ulf
    et al.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Introducing Uncertainty in Predictive Modeling-Friend or Foe? 2012. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 52, no. 11, pp. 2815-2822. Article in journal (Peer reviewed)
    Abstract [en]

    Uncertainty was introduced to the chemical descriptors of 16 publicly available data sets, to various degrees and in various ways, in order to investigate the effect on the predictive performance of a state-of-the-art method, decision tree ensembles. A number of strategies to handle uncertainty in decision tree ensembles were evaluated. The main conclusion of the study is that uncertainty may to a large extent be introduced in chemical descriptors without impairing the predictive performance of ensembles and without the predictive performance being significantly reduced from a practical point of view. The investigation further showed that even when distributions of uncertain values were provided, the ensemble method could generate equally effective models from single-point samples from these distributions. Hence, there seems to be no advantage in using more elaborate methods for handling uncertainty in chemical descriptors when using decision tree ensembles as a modeling method for the considered types of introduced uncertainty.

  • 59. Norinder, Ulf
    et al.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Representing descriptors derived from multiple conformations as uncertain features for machine learning. 2013. In: Journal of Molecular Modeling, ISSN 1610-2940, E-ISSN 0948-5023, Vol. 19, no. 6, pp. 2679-2685. Article in journal (Peer reviewed)
    Abstract [en]

    Uncertainty was introduced into the chemical descriptors of 11 datasets by conformational analysis in order to incorporate three-dimensional information and to investigate the resulting predictive performance of a state-of-the-art machine learning method, random forests, for binary classification tasks. A number of strategies for handling uncertainty in random forests were evaluated. The study showed that when incorporating three-dimensional information as uncertainty into chemical descriptors, the use of uniform probability distributions over the range of possible values, in conjunction with fractional distribution of compounds clearly outperforms the use of normal distributions as well as sampling from both normal and uniform distributions. The main conclusion of this study is that, even when distributions of uncertain values are provided, the random forest method can generate models that are almost as accurate from the expected values of these distributions alone. Hence, there seems to be little advantage to using the more elaborate methods of incorporating uncertainty in chemical descriptors when using random forests rather than replacing the distributions with single-point values. The results also show that random forest models with similar performances can also be generated using three-dimensional descriptor information derived from single (lowest-energy or Corina-derived) conformations.
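
    Fractional distribution of an uncertain descriptor can be illustrated for a single tree split: if a value is only known to lie uniformly in an interval [lo, hi], the compound is sent down the left branch with weight P(X ≤ t) and down the right branch with the remainder. A minimal sketch, assuming uniform intervals and per-compound weights, set against the simpler expected-value alternative; the names are hypothetical and this is not the authors' random forest implementation.

```python
from typing import Dict, List, Tuple

# (lower bound, upper bound, class label, weight) of an uncertain descriptor value
UncertainExample = Tuple[float, float, str, float]


def fraction_left(lo: float, hi: float, threshold: float) -> float:
    """P(X <= threshold) for X uniform on [lo, hi]; a point value if lo == hi."""
    if hi <= lo:
        return 1.0 if lo <= threshold else 0.0
    return min(1.0, max(0.0, (threshold - lo) / (hi - lo)))


def fractional_split(examples: List[UncertainExample], threshold: float):
    """Weighted class counts on each side of the split x <= threshold, where each
    example is divided between the branches according to its uniform interval."""
    left: Dict[str, float] = {}
    right: Dict[str, float] = {}
    for lo, hi, label, weight in examples:
        f = fraction_left(lo, hi, threshold)
        left[label] = left.get(label, 0.0) + weight * f
        right[label] = right.get(label, 0.0) + weight * (1.0 - f)
    return left, right


def expected_value_split(examples: List[UncertainExample], threshold: float):
    """Simpler alternative: replace each interval by its midpoint (expected value)."""
    points = [((lo + hi) / 2, (lo + hi) / 2, label, w) for lo, hi, label, w in examples]
    return fractional_split(points, threshold)


examples = [(0.0, 4.0, "active", 1.0), (3.0, 5.0, "inactive", 1.0)]
print(fractional_split(examples, 3.5))
# ({'active': 0.875, 'inactive': 0.25}, {'active': 0.125, 'inactive': 0.75})
print(expected_value_split(examples, 3.5))
# ({'active': 1.0, 'inactive': 0.0}, {'active': 0.0, 'inactive': 1.0})
```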

  • 60. Norinder, Ulf
    et al.
    Lidén, Per
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Discrimination between modes of toxic action of phenols using rule based methods. 2006. In: Molecular Diversity, ISSN 1381-1991, E-ISSN 1573-501X, Vol. 10, no. 2, pp. 207-212. Article in journal (Peer reviewed)
    Abstract [en]

    Rule-based ensemble modelling has been used to develop a model with high accuracy and predictive capabilities for distinguishing between four different modes of toxic action for a set of 220 phenols. The model not only predicts the majority class (polar narcotics) well but also the other three classes (weak acid respiratory uncouplers, pro-electrophiles and soft electrophiles) of toxic action despite the severely skewed distribution among the four investigated classes. Furthermore, the investigation also highlights the merits of using ensemble (or consensus) modelling as an alternative to the more traditional development of a single model in order to promote robustness and accuracy with respect to the predictive capability for the derived model.

  • 61. Sotomane, Constantino
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Massingue, Venancio
    Factors Affecting the Use of Data Mining in Mozambique. 2013. In: IST-Africa 2013 Conference Proceedings / [ed] Paul Cunningham, Miriam Cunningham, International Information Management Corporation Limited, 2013. Conference paper (Peer reviewed)
    Abstract [en]

    We present a study aimed at finding important factors that affect the acceptance and use of data mining in Mozambique. Input from potential users has been collected and analysed using a mix of qualitative and quantitative methods. The findings indicate that the level of adoption of data mining in Mozambique is primarily affected by poor data quality, limited skills and human resources, limited support from stakeholders, organizational issues, limited financial resources and lack of adequate technology. These factors are similar to those identified in other studies.

  • 62.
    Sotomane, Constantino
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Ministry of Science and Technology, Mozambique.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Massingue, Venancio
    Short-term Forecasting of Electricity Consumption in Maputo. 2013. In: International Conference on Advances in ICT for Emerging Regions (ICTer) - 2013: Conference Proceedings, IEEE Computer Society, 2013, pp. 132-136. Conference paper (Peer reviewed)
    Abstract [en]

    We present a short-term load forecasting model for Maputo. The model is based on the concept of multiple models: a clustering method is combined with expert knowledge to identify sub-models. The resulting model, which combines several sub-models, is evaluated and compared to the model currently used by Electricidade de Moçambique E.P (EDM). The results show that the developed model achieves better accuracy than the one currently used by EDM. When translated into financial figures, the results obtained by applying the model demonstrate significant economic advantages. The social and environmental implications of the model are also analysed.

  • 63.
    Sotomane, Constantino
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gallego-Ayala, Jordi
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Massingue, Venancio
    Extracting Patterns from Socioeconomic Databases to Characterize Small Farmers with High and Low Corn Yields in Mozambique: a Data Mining Approach. 2012. In: Advances in Data Mining: Workshop Proceedings / [ed] Isabelle Bichindaritz, Petra Perner, Georg Ruß, Rainer Schmidt, Ibai Publishing, 2012, pp. 99-108. Conference paper (Other academic)
    Abstract [en]

    Mozambique is mainly a rural country. Agriculture is a pillar of the Mozambique economy and is the main source of income for 80% of the population living in rural areas. One of the major problems in the agricultural sector is low productivity, which for most crops is the lowest in Africa. The main food crop cultivated in Mozambique is maize. This research aims to characterize households with high and low maize yields based on the National Agricultural Survey Data from 2007 and 2008 using a data mining approach. To this end, we used: a) decision trees, b) association rules, and c) classification rules. The results show that households with high maize yields are those with the capacity to generate income through the commercialization of their production and agricultural assets. Households with low maize yields are associated with production loss before harvest which results in food insecurity.

  • 64.
    Wettergren, Gunnar
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Hansen, Preben
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Nenzén, Stefan
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Perjons, Erik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Framework for implementation of learning analytics projects in higher education. 2014. In: DSV writers hut 2014: proceedings, August 21-22, Åkersberga, Sweden, Department of Computer and Systems Sciences, Stockholm University, 2014. Conference paper (Other academic)
  • 65.
    Zacarias, Orlando P.
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Eduardo Mondlane University, Mozambique.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Comparing Support Vector Regression and Random Forests for Predicting Malaria Incidence in Mozambique. 2013. In: 2013 International Conference on Advances in ICT for Emerging Regions (ICTer), IEEE Computer Society, 2013, pp. 217-221. Conference paper (Peer reviewed)
    Abstract [en]

    Accurate prediction of malaria incidence is essential for the management of several activities in the Ministry of Health in Mozambique. This study compares support vector machines (SVMs) and random forests (RFs) for this purpose. A dataset with records of malaria cases covering the period 1999-2008 was used to evaluate predictive models on the last year when developed from one up to nine years of historical data. Mean squared error (MSE) was used as the performance metric. The scheme for estimating variable importance commonly employed for RFs was also adopted for SVMs. SVMs developed from two years of historical data obtained the best prediction accuracy. Hence, if we are interested in predicting the actual number of malaria cases, the support vector machine model should be chosen. In the analysis of variable importance, Indoor Residual Spray (IRS), the districts of Manhiça and Matola, and the month of January turned out to be the most important predictors in both the SVM and RF models.

  • 66.
    Zacarias, Orlando P.
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Eduardo Mondlane University, Mozambique.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Generalization of Malaria Incidence Prediction Models by Correcting Sample Selection Bias. 2013. In: Advanced Data Mining and Applications: Proceedings, Part II / [ed] Hiroshi Motoda et al., Springer Berlin/Heidelberg, 2013, pp. 189-200. Conference paper (Peer reviewed)
    Abstract [en]

    Performance measurements obtained from dividing a single sample into training and test sets, e.g., by employing cross-validation, may not give an accurate picture of the performance of a model developed from the sample on the set of examples to which the model will be applied. Such measurements may be misleading when the examples to which the model is applied are drawn from a different distribution than the sample itself. In this study, two support vector machine models for predicting malaria incidence, developed from certain regions and time periods in Mozambique, are evaluated on data from novel regions and time periods, and the use of selection bias correction is investigated. It is observed that significant reductions in the prediction error can be obtained using the latter approach, strongly suggesting that techniques of this kind should be employed if test data can be expected to be drawn from a different distribution than the training data.
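
    The abstract does not state which correction technique was used. One common approach is importance weighting, where each training example is weighted by the ratio between its probability under the target distribution and under the training distribution. The sketch below assumes that ratio can be estimated from the relative frequencies of a discrete stratum (for example a region or time period) and uses it to reweight squared errors; the names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def importance_weights(train_strata: Sequence[Hashable],
                       target_strata: Sequence[Hashable]) -> Dict[Hashable, float]:
    """w(s) = P_target(s) / P_train(s), estimated from relative frequencies.
    Strata that never occur in training get no weight (they cannot be corrected for)."""
    train_freq, target_freq = Counter(train_strata), Counter(target_strata)
    n_train, n_target = len(train_strata), len(target_strata)
    return {s: (target_freq[s] / n_target) / (train_freq[s] / n_train)
            for s in train_freq}


def weighted_mse(errors: Sequence[float], strata: Sequence[Hashable],
                 weights: Dict[Hashable, float]) -> float:
    """Squared errors reweighted so that strata count as in the target distribution."""
    num = sum(weights[s] * e * e for e, s in zip(errors, strata))
    den = sum(weights[s] for s in strata)
    return num / den


train = ["region_a", "region_a", "region_a", "region_b"]   # origin of training data
target = ["region_a", "region_b", "region_b", "region_b"]  # where the model is applied
errors = [1.0, 1.0, 1.0, 2.0]                              # per-example prediction errors
w = importance_weights(train, target)
print(w)                               # {'region_a': 0.333..., 'region_b': 3.0}
print(weighted_mse(errors, train, w))  # about 3.25, versus an unweighted 1.75
```

    The same weights could also be passed to a learner that accepts per-example weights during training; the sketch only shows the evaluation side.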

  • 67.
    Zacarias, Orlando P.
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Predicting the Incidence of Malaria Cases in Mozambique Using Regression Trees and Forests. 2013. In: International Journal of Computer Science and Electronics Engineering (IJCSEE), ISSN 2320-401X, Vol. 1, no. 1, pp. 50-54. Article in journal (Peer reviewed)
    Abstract [en]

    Malaria remains a significant public health concern in Mozambique, with disease cases reported in almost every province. This study investigates models for predicting the number of malaria cases in districts of Maputo province. The data used include administrative districts, malaria cases, indoor residual spray and the climatic variables temperature, rainfall and humidity. Regression tree and random forest models were developed using the statistical tool R and applied to predict the number of malaria cases during one year, based on observations from preceding years. The models were compared with respect to the mean squared error (MSE) and the correlation coefficient. Indoor Residual Spray (IRS), the month of January, minimal temperature and rainfall were found to be the most important factors when predicting the number of malaria cases, with some districts showing high malaria incidence. Additionally, predictive performance can be increased substantially by reducing the time window of historical data taken into account.

  • 68.
    Zacarias, Orlando P.
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Strengthening the Health Information System in Mozambique through Malaria Incidence Prediction. 2013. In: IST-Africa 2013 Conference Proceedings / [ed] Paul Cunningham, Miriam Cunningham, IEEE Computer Society, 2013, pp. 1-7. Conference paper (Peer reviewed)
    Abstract [en]

    Malaria is one of the principal health problems in Mozambique, mostly affecting children. Accurate prediction of future incidence is crucial for implementing appropriate intervention and disease-control policies that strengthen the health system. We propose a model based on support vector machines (SVM) for predicting yearly malaria incidence among children 0-4 years of age in Maputo province, Mozambique. The predictive model is trained on two years of historical malaria data in combination with climatic and malaria control factors. A grid-search parameter tuning procedure was first employed to find the best parameters and select the kernel. In order to determine the most influential factors, variable importance was calculated by estimating the impact of permuting feature values on the predictive performance. The most important malaria incidence predictors turned out to be temperature variation, followed by Matutuine (district), April (month) and Namaacha (district).
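
    Permutation-based variable importance, as described above, measures how much the error grows when one feature's values are shuffled across examples. A minimal sketch, assuming a hypothetical fixed model in place of the trained SVM and mean squared error as the performance measure:

```python
import random
from typing import Callable, List, Sequence


def mse(model: Callable[[List[float]], float],
        X: Sequence[List[float]], y: Sequence[float]) -> float:
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[List[float]], float],
                           X: Sequence[List[float]], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def model(x: List[float]) -> float:
    """Hypothetical fitted model: uses feature 0 only and ignores feature 1."""
    return 3.0 * x[0]


X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [3.0 * i for i in range(10)]
print(permutation_importance(model, X, y))  # feature 1 gets exactly 0.0
```

    Because the model ignores feature 1, shuffling it never changes the error, while shuffling feature 0 increases it.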

  • 69.
    Zhao, Jing
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Henriksson, Aron
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Detecting Adverse Drug Events with Multiple Representations of Clinical Measurements. 2014. In: 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): Proceedings, IEEE Computer Society, 2014, pp. 536-543. Conference paper (Peer reviewed)
    Abstract [en]

    Adverse drug events (ADEs) are grossly under-reported in electronic health records (EHRs). This could be mitigated by methods that are able to detect ADEs in EHRs, thereby allowing missing ADE-specific diagnosis codes to be identified and added. A crucial aspect of constructing such systems is to find proper representations of the data in order to allow the predictive modeling to be as accurate as possible. One category of EHR data that can be used as indicators of ADEs is clinical measurements. However, using clinical measurements as features is problematic, since they have a high rate of missing values and can be repeated a variable number of times in each patient health record. In this study, five basic representations of clinical measurements are proposed and evaluated to handle these two problems. An empirical investigation using random forests on 27 datasets from a real EHR database with different ADE targets is presented, demonstrating that the predictive performance, in terms of accuracy and area under the ROC curve, is higher when clinical measurements are represented crudely as whether they were taken or how many times they were taken by a patient. Furthermore, a sixth alternative, combining all five basic representations, significantly outperforms using any of the basic representations except for one. A subsequent analysis of variable importance is also conducted with this fused feature set, showing that when clinical measurements have a high missing rate, the number of times they were taken by a patient is ranked as more informative than their actual values. The observation from random forests is also confirmed empirically using other commonly employed classifiers. This study demonstrates that the way in which clinical measurements from EHRs are represented has a high impact on ADE detection, and that using multiple representations outperforms using a single basic representation.
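
    The abstract does not list the five basic representations, so the sketch below only illustrates the general idea with hypothetical choices: for each measurement code, a binary "taken" flag, a count, and simple value summaries (mean, min, max) that stay missing when the measurement was never taken. Concatenating them yields a fused feature set of the kind described above; codes and values are made up.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Measurement = Tuple[str, float]  # (hypothetical measurement code, measured value)


def represent(record: List[Measurement],
              codes: Sequence[str]) -> Dict[str, Optional[float]]:
    """Build several representations per code and concatenate them (a fused set).
    Value-based features stay None when the measurement was never taken."""
    features: Dict[str, Optional[float]] = {}
    for code in codes:
        values = [v for c, v in record if c == code]
        features[f"{code}_taken"] = 1 if values else 0  # binary: measured at all?
        features[f"{code}_count"] = len(values)         # how many times measured
        features[f"{code}_mean"] = mean(values) if values else None
        features[f"{code}_min"] = min(values) if values else None
        features[f"{code}_max"] = max(values) if values else None
    return features


patient = [("Hb", 120.0), ("Hb", 130.0), ("K", 4.1)]
print(represent(patient, ["Hb", "K", "Na"]))
```

    A learner that cannot handle missing values would still need an imputation step for the value-based features, which is exactly where the crude "taken" and "count" representations avoid the problem.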

  • 70.
    Zhao, Jing
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Henriksson, Aron
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Predictive modeling of structured electronic health records for adverse drug event detection. 2015. In: BMC Medical Informatics and Decision Making, ISSN 1472-6947, E-ISSN 1472-6947, Vol. 15, no. SI. Article in journal (Peer reviewed)
    Abstract [en]

    Background: The digitization of healthcare data, resulting from the increasingly widespread adoption of electronic health records, has greatly facilitated its analysis by computational methods and thereby enabled large-scale secondary use thereof. This can be exploited to support public health activities such as pharmacovigilance, wherein the safety of drugs is monitored to inform regulatory decisions about sustained use. To that end, electronic health records have emerged as a potentially valuable data source, providing access to longitudinal observations of patient treatment and drug use. A nascent line of research concerns predictive modeling of healthcare data for the automatic detection of adverse drug events, which presents its own set of challenges: it is not yet clear how to represent the heterogeneous data types in a manner conducive to learning high-performing machine learning models.

    Methods: Datasets from an electronic health record database are used for learning predictive models with the purpose of detecting adverse drug events. The use and representation of two data types, as well as their combination, are studied: clinical codes, describing prescribed drugs and assigned diagnoses, and measurements. Feature selection is conducted on the various types of data to reduce dimensionality and sparsity, while allowing for an in-depth feature analysis of the usefulness of each data type and representation.

    Results: Within each data type, combining multiple representations yields better predictive performance compared to using any single representation. The use of clinical codes for adverse drug event detection significantly outperforms the use of measurements; however, there is no significant difference over datasets between using only clinical codes and their combination with measurements. For certain adverse drug events, the combination does, however, outperform using only clinical codes. Feature selection leads to increased predictive performance for both data types, in isolation and combined.

    Conclusions: We have demonstrated how machine learning can be applied to electronic health records for the purpose of detecting adverse drug events and proposed solutions to some of the challenges this presents, including how to represent the various data types. Overall, clinical codes are more useful than measurements and, in specific cases, it is beneficial to combine the two.
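The abstract does not name the feature selection criterion. As an assumption for illustration, the sketch below ranks binary features (for example, clinical-code indicators concatenated with measurement indicators) by information gain with respect to the ADE label and keeps the top `k`; `select_top_k` and the feature names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def select_top_k(X: List[List[int]], labels: Sequence[int],
                 names: List[str], k: int) -> List[str]:
    """Return the names of the k columns of X with the highest information gain."""
    scored = [
        (information_gain([row[j] for row in X], labels), names[j])
        for j in range(len(names))
    ]
    # Stable sort: ties keep their original column order.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scored[:k]]
```

A filter criterion like this is cheap and model-agnostic; a wrapper or embedded method (for example, random forest importance) would be an equally plausible choice.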

  • 71.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Henriksson, Aron
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Cascading Adverse Drug Event Detection in Electronic Health Records. 2015. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Proceedings, IEEE Computer Society, 2015. Conference paper (peer-reviewed)
    Abstract [en]

    The ability to detect adverse drug events (ADEs) in electronic health records (EHRs) is useful in many medical applications, such as alerting systems that indicate when an ADE-specific diagnosis code should be assigned. Automating the detection of ADEs can be attempted by applying machine learning to existing, labeled EHR data. How to do this effectively is, however, an open question. The issues addressed in this study concern the granularity of the classification task: (1) If we wish to predict the occurrence of an ADE, is it advantageous to conflate the various ADE class labels prior to learning, or should they be merged after prediction? (2) If we wish to predict a family of ADEs or even a specific ADE, can predictive performance be enhanced by dividing the classification task into a cascading scheme: first predicting, at a coarse level, whether there is an ADE or not; if there is, predicting which family the ADE belongs to; and finally predicting the specific ADE within that family? In this study, we conduct a series of experiments using a real clinical dataset comprising healthcare episodes that have been assigned one of eight ADE-related diagnosis codes, together with a set of randomly extracted episodes that have not been assigned any ADE code. It is shown that, when distinguishing between ADEs and non-ADEs, merging the various ADE labels prior to learning leads to significantly higher predictive performance in terms of accuracy and area under the ROC curve. A cascade of random forests is moreover constructed to determine either the family of ADEs or the specific class label; here, performance is indeed enhanced compared to directly employing a one-step prediction. This study concludes that, if predictive performance is of primary importance, the cascading scheme is the recommended approach over one-step prediction for detecting ADEs in EHRs.
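A minimal sketch of the cascading scheme: ADE codes are first merged into one class, then mapped to families, then resolved to a specific code. To keep it dependency-free, a tiny nearest-centroid classifier stands in for the random forests used in the study; the `Cascade` class, the `family_of` mapping and all label names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


class NearestCentroid:
    """Tiny stand-in classifier; the study itself used random forests."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            sums[label] = [s + v for s, v in zip(acc, x)]
            counts[label] += 1
        self.centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
        return self

    def predict_one(self, x: Sequence[float]) -> str:
        def sq_dist(lab: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab]))
        return min(self.centroids, key=sq_dist)


class Cascade:
    """ADE vs non-ADE, then ADE family, then specific ADE code (hypothetical API)."""

    def __init__(self, family_of: Dict[str, str], make=NearestCentroid):
        self.family_of = family_of  # specific ADE code -> family name
        self.make = make

    def fit(self, X, y) -> "Cascade":
        # Level 1: all ADE codes merged into one class before learning.
        coarse = ["ADE" if lab in self.family_of else "none" for lab in y]
        self.level1 = self.make().fit(X, coarse)
        ade = [(x, lab) for x, lab in zip(X, y) if lab in self.family_of]
        # Level 2: which family, trained on ADE episodes only.
        self.level2 = self.make().fit([x for x, _ in ade],
                                      [self.family_of[lab] for _, lab in ade])
        # Level 3: one model per family seen in training.
        self.level3 = {}
        for fam in {self.family_of[lab] for _, lab in ade}:
            members = [(x, lab) for x, lab in ade if self.family_of[lab] == fam]
            self.level3[fam] = self.make().fit([x for x, _ in members],
                                               [lab for _, lab in members])
        return self

    def predict_one(self, x) -> str:
        if self.level1.predict_one(x) == "none":
            return "none"
        family = self.level2.predict_one(x)
        return self.level3[family].predict_one(x)
```

Merging labels before learning corresponds to level 1 alone; merging after prediction would instead train one multi-class model on the specific codes and map its outputs to ADE/non-ADE afterwards.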

  • 72.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Henriksson, Aron
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Detecting Adverse Drug Events Using Concept Hierarchies of Clinical Codes. 2014. In: 2014 IEEE International Conference on Healthcare Informatics: Proceedings, IEEE Computer Society, 2014, pp. 285-293. Conference paper (peer-reviewed)
    Abstract [en]

    Electronic health records (EHRs) provide a potentially valuable source of information for pharmacovigilance. However, adverse drug events (ADEs), which can be encoded in EHRs with specific diagnosis codes, are heavily under-reported. To provide more accurate estimates for drug safety surveillance, machine learning systems that are able to detect ADEs could be used to identify and suggest missing ADE-specific diagnosis codes. A fundamental consideration when building such systems is how to represent the EHR data to allow for accurate predictive modeling. In this study, two types of clinical codes are used to represent drugs and diagnoses: the Anatomical Therapeutic Chemical Classification System (ATC) and the International Statistical Classification of Diseases and Related Health Problems (ICD). More specifically, it is investigated whether their hierarchical structure can be exploited to improve predictive performance. Random forests trained on feature sets that include only the original, low-level codes are compared with random forests trained on feature sets that contain codes from all levels of the hierarchies. An empirical investigation using thirty datasets with different ADE targets is presented, demonstrating that predictive performance, in terms of accuracy and area under the ROC curve, can be significantly improved by exploiting codes on all levels of the hierarchies, compared to using only the low-level encoding. A further analysis is presented in which two strategies are employed for adding features level-wise according to the concept hierarchies: top-down, starting with the highest abstraction levels, and bottom-up, starting with the most specific encoding. The main finding from this subsequent analysis is that predictive performance can be kept at a high level even without employing the more specific levels of the concept hierarchies.
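The sketch below assumes prefix-based hierarchies. ATC codes are expanded to their five levels (prefix lengths 1, 3, 4, 5 and 7); the ICD expansion is a simplification, since ICD-10 chapters are letter ranges rather than single letters. The `depth` and `order` parameters mimic, per code, the top-down and bottom-up level-wise strategies; all function names are hypothetical.

```python
from typing import Iterable, List, Optional, Set

# ATC levels: anatomical main group (1 char), therapeutic subgroup (3),
# pharmacological subgroup (4), chemical subgroup (5), chemical substance (7).
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> List[str]:
    """All ATC ancestors of a code, from most general to most specific."""
    code = code.strip().upper()
    return [code[:n] for n in ATC_PREFIX_LENGTHS if n <= len(code)]


def icd_levels(code: str) -> List[str]:
    """Simplified ICD-10 hierarchy: letter, 3-character category, full code.

    Assumption: real ICD-10 chapters are letter ranges (e.g. C00-D48), so the
    single letter only approximates the chapter level.
    """
    code = code.strip().upper().replace(".", "")
    levels = [code[:1], code[:3]]
    if len(code) > 3:
        levels.append(code)
    return levels


def expand(codes: Iterable[str], kind: str, depth: Optional[int] = None,
           order: str = "top-down") -> Set[str]:
    """Binary feature names for codes plus their ancestors.

    depth=None keeps all levels; otherwise keep `depth` levels per code,
    counted from the top ("top-down") or from the bottom ("bottom-up").
    """
    if depth is not None and depth < 1:
        raise ValueError("depth must be at least 1")
    level_fn = atc_levels if kind == "atc" else icd_levels
    features: Set[str] = set()
    for code in codes:
        levels = level_fn(code)
        if depth is not None:
            levels = levels[:depth] if order == "top-down" else levels[-depth:]
        features.update(f"{kind}:{level}" for level in levels)
    return features
```

Shared ancestors collapse into a single feature, so two drugs in the same chemical subgroup activate the same higher-level features; that sharing is what lets the coarser levels carry signal.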

  • 73.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Henriksson, Aron
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Kvist, Maria
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Karolinska Institute, Sweden.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Handling Temporality of Clinical Events for Drug Safety Surveillance. 2015. In: AMIA Annual Symposium Proceedings, ISSN 1559-4076, Vol. 2015, pp. 1371-1380. Journal article (peer-reviewed)
    Abstract [en]

    Using longitudinal data in electronic health records (EHRs) for post-marketing adverse drug event (ADE) detection allows for monitoring patients throughout their medical history. Machine learning methods have been shown to be efficient and effective in screening health records and detecting ADEs. How best to exploit historical data, as encoded by clinical events in EHRs, is, however, not well understood. In this study, three strategies for handling the temporality of clinical events are proposed and evaluated using an EHR database from Stockholm, Sweden. The random forest learning algorithm is applied to predict fourteen ADEs using clinical events collected from patient histories of different lengths. The results show that, in general, including a longer patient history leads to improved predictive performance, and that assigning weights to events according to their time distance from the ADE yields the largest improvement.
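The abstract does not specify the weighting function. As an illustrative assumption, the sketch below aggregates events per code within an optional history window and, optionally, down-weights each event exponentially with its distance in days from the ADE (index) date; `weighted_features`, `history_days` and `half_life_days` are hypothetical names.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def weighted_features(events: List[Tuple[str, date]], index_date: date,
                      history_days: Optional[int] = None,
                      half_life_days: Optional[float] = None) -> Dict[str, float]:
    """Aggregate (code, date) events preceding index_date into per-code values.

    history_days limits how far back events are used; half_life_days, if set,
    gives each event weight 0.5 ** (days_before_index / half_life_days).
    """
    features: Dict[str, float] = {}
    for code, when in events:
        days_before = (index_date - when).days
        if days_before < 0:
            continue  # after the index date: not part of the patient history
        if history_days is not None and days_before > history_days:
            continue  # outside the chosen history window
        weight = 1.0 if half_life_days is None else 0.5 ** (days_before / half_life_days)
        features[code] = features.get(code, 0.0) + weight
    return features
```

With no weighting the values are plain counts; with a half-life, an event half a half-life closer to the ADE counts roughly 1.4 times as much, so recent history dominates without discarding older events.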

  • 74.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Applying Methods for Signal Detection in Spontaneous Reports to Electronic Patient Records. 2013. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery (ACM), 2013. Conference paper (peer-reviewed)
    Abstract [en]

    Currently, pharmacovigilance relies mainly on disproportionality analysis of spontaneous reports. However, the analysis of spontaneous reports suffers from several problems, such as limited reliability, under-reporting and insufficient patient information. Longitudinal healthcare data, such as Electronic Patient Records (EPRs), which cover comprehensive information on each patient, are a complementary source of information for detecting Adverse Drug Events (ADEs). A wide set of disproportionality methods has been developed for analyzing spontaneous reports to assess the risk of reported events being ADEs. This study investigates the use of such methods for detecting ADEs in EPRs. The data used in this study were extracted from the Stockholm EPR Corpus. Four disproportionality methods (proportional reporting ratio, reporting odds ratio, Bayesian confidence propagation neural network, and Gamma-Poisson shrinker) were applied in two different ways to analyze EPRs: creating pseudo spontaneous reports based on all observed drug-event pairs (event-level analysis), or analyzing distinct patients who experienced a drug-event pair (patient-level analysis). The methods were evaluated in a case study on safety surveillance of Celecoxib. The results showed that, among the top 200 signals, more ADEs were detected by the event-level analysis than by the patient-level analysis. Moreover, the event-level analysis also resulted in a higher mean average precision. The main conclusion of this study is that the way in which the disproportionality analysis is applied (event-level or patient-level) can have a much higher impact on performance than the choice of disproportionality method.
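PRR and ROR are computed from a 2x2 contingency table: a = target drug with target event, b = target drug with other events, c = other drugs with target event, d = other drugs with other events, giving PRR = (a/(a+b)) / (c/(c+d)) and ROR = ad/(bc). The sketch below counts either every observed drug-event pair (event level) or each distinct patient-drug-event triple once, which is one possible reading of the patient-level analysis and is assumed here. BCPNN and GPS are not shown, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def contingency(pairs: Iterable[Tuple[str, str, str]], drug: str, event: str,
                patient_level: bool = False) -> Tuple[int, int, int, int]:
    """2x2 table (a, b, c, d) from (patient_id, drug, event) observations.

    Event level: every observed drug-event pair counts.
    Patient level (assumed reading): each distinct patient-drug-event
    triple counts once.
    """
    if patient_level:
        pairs = set(pairs)
    a = b = c = d = 0
    for _patient, dr, ev in pairs:
        if dr == drug:
            if ev == event:
                a += 1
            else:
                b += 1
        elif ev == event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio; undefined if a+b, c or c+d is zero."""
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio; undefined if b or c is zero."""
    return (a * d) / (b * c)
```

Repeated observations of the same pair in one patient inflate `a` at the event level but not at the patient level, which is one way the two analyses can rank signals differently even with the same disproportionality measure.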

  • 75.
    Zhao, Jing
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning from heterogeneous temporal data from electronic health records. 2017. In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 65, pp. 105-119. Journal article (peer-reviewed)
    Abstract [en]

    Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.
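The abstract does not say which symbolic representation is used. Assuming something like SAX (z-normalization, piecewise aggregate approximation, and equiprobable Gaussian breakpoints), the sketch below converts a series to a word and then describes each word by its minimum Hamming distance to a few randomly drawn subwords, giving a fixed-length vector that standard learners accept. The distance measure and all function names are assumptions.

```python
import random
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence, Tuple


def sax(series: Sequence[float], segments: int, alphabet: str = "abcd") -> str:
    """SAX-style word: z-normalize, average into segments (PAA), discretize."""
    if not 1 <= segments <= len(series):
        raise ValueError("need 1 <= segments <= len(series)")
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    n = len(z)
    paa = [mean(z[i * n // segments:(i + 1) * n // segments]) for i in range(segments)]
    # Breakpoints split N(0, 1) into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[sum(v > bp for bp in breakpoints)] for v in paa)


def min_hamming(word: str, sub: str) -> int:
    """Smallest Hamming distance between sub and any equal-length window of word."""
    m = len(sub)
    if m == 0 or m > len(word):
        raise ValueError("need 0 < len(sub) <= len(word)")
    return min(sum(a != b for a, b in zip(word[i:i + m], sub))
               for i in range(len(word) - m + 1))


def random_subsequence_features(words: List[str], n_subs: int, length: int,
                                seed: int = 0) -> Tuple[List[str], List[List[int]]]:
    """Draw n_subs random subwords; describe each word by its distances to them.

    Every word must be at least `length` symbols long.
    """
    rng = random.Random(seed)
    subs = []
    for _ in range(n_subs):
        w = rng.choice(words)
        start = rng.randrange(len(w) - length + 1)
        subs.append(w[start:start + length])
    features = [[min_hamming(w, s) for s in subs] for w in words]
    return subs, features
```

Because each feature is a minimum over sliding windows, a pattern contributes the same value wherever it occurs in a patient's sequence, which keeps local sequential shape while producing fixed-width input.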
