1 - 13 of 13
  • 1.
    Bendtsen, Marcus
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Regimes in baseball players' career data. 2017. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 31, no 6, 1580-1621 p. Article in journal (Refereed)
    Abstract [en]

    In this paper we investigate how we can use gated Bayesian networks, a type of probabilistic graphical model, to represent regimes in baseball players’ career data. We find that baseball players do indeed go through different regimes throughout their career, where each regime can be associated with a certain level of performance. We show that some of the transitions between regimes happen in conjunction with major events in the players’ career, such as being traded or injured, but that some transitions cannot be explained by such events. The resulting model is a tool for managers and coaches that can be used to identify where transitions have occurred, as well as an online monitoring tool to detect which regime the player currently is in.

  • 2. Corander, Jukka
    et al.
    Ekdahl, Magnus
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Parallell interacting MCMC for learning of topologies of graphical models. 2008. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 17, no 3, 431-456 p. Article in journal (Refereed)
    Abstract [en]

    Automated statistical learning of graphical models from data has attained a considerable degree of interest in the machine learning and related literature. Many authors have discussed and/or demonstrated the need for consistent stochastic search methods that would not be as prone to yield locally optimal model structures as simple greedy methods. However, at the same time most of the stochastic search methods are based on a standard Metropolis-Hastings theory that necessitates the use of relatively simple random proposals and prevents the utilization of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for novel type of creativity in the design of search operators. Also, the parallel aspect of our method illustrates well the advantages of the adaptive nature of search operators to avoid trapping states in the vicinity of locally optimal network topologies.

  • 3.
    Corander, Jukka
    et al.
    Department of Mathematics, Åbo Akademi University, Åbo, Finland.
    Ekdahl, Magnus
    Linköping University, Department of Mathematics, Mathematical Statistics. Linköping University, The Institute of Technology.
    Koski, Timo
    Department of Mathematics, Royal Institute of Technology, Stockholm, Sweden.
    Parallell interacting MCMC for learning of topologies of graphical models. 2008. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 17, no 3, 431-456 p. Article in journal (Refereed)
    Abstract [en]

    Automated statistical learning of graphical models from data has attained a considerable degree of interest in the machine learning and related literature. Many authors have discussed and/or demonstrated the need for consistent stochastic search methods that would not be as prone to yield locally optimal model structures as simple greedy methods. However, at the same time most of the stochastic search methods are based on a standard Metropolis–Hastings theory that necessitates the use of relatively simple random proposals and prevents the utilization of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for novel type of creativity in the design of search operators. Also, the parallel aspect of our method illustrates well the advantages of the adaptive nature of search operators to avoid trapping states in the vicinity of locally optimal network topologies.

  • 4. Henelius, Andreas
    et al.
    Puolamäki, Kai
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Asker, Lars
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    A peek into the black box: exploring classifiers by randomization. 2014. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 28, no 5-6, 1503-1529 p. Article in journal (Refereed)
    Abstract [en]

    Classifiers are often opaque and cannot easily be inspected to gain understanding of which factors are of importance. We propose an efficient iterative algorithm to find the attributes and dependencies used by any classifier when making predictions. The performance and utility of the algorithm is demonstrated on two synthetic and 26 real-world datasets, using 15 commonly used learning algorithms to generate the classifiers. The empirical investigation shows that the novel algorithm is indeed able to find groupings of interacting attributes exploited by the different classifiers. These groupings allow for finding similarities among classifiers for a single dataset as well as for determining the extent to which different classifiers exploit such interactions in general.

  • 5.
    Karlsson, Isak
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Generalized random shapelet forests. 2016. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 30, no 5, 1053-1085 p. Article in journal (Refereed)
    Abstract [en]

    Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.

  • 6. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Finding the longest common sub-pattern in sequences of temporal intervals. 2015. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, no 5, 1178-1210 p. Article in journal (Refereed)
    Abstract [en]

    We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals. In particular we are interested in finding the LCSP of the corresponding arrangements. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful for a wide range of scientific fields and applications, as it can be seen by the number and diversity of the datasets we use in our experiments. In this paper, we define the problem of LCSP and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm to solve the LCSP problem, and also propose and experiment with three polynomial time and space under-approximation techniques. Finally, we introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. Lastly, we describe several application cases that demonstrate the need and suitability of LCSP.

  • 7. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    On Searching and Indexing Sequences of Temporal Intervals. 2017. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 31, no 3, 809-850 p. Article in journal (Refereed)
    Abstract [en]

    In several application domains, including sign language, sensor networks, and medicine, events are not necessarily instantaneous but they may have a time duration. Such events build sequences of temporal intervals, which may convey useful domain knowledge; thus, searching and indexing these sequences is crucial. We formulate the problem of comparing sequences of labeled temporal intervals and present a distance measure that can be computed in polynomial time. We prove that the distance measure is metric and satisfies the triangle inequality. For speeding up search in large databases of sequences of temporal intervals, we propose an approximate indexing method that is based on embeddings. The proposed indexing framework is shown to be contractive and can guarantee no false dismissal. The distance measure is tested and benchmarked through rigorous experimentation on real data taken from several application domains, including: American Sign Language annotated video recordings, robot sensor data, and Hepatitis patient data. In addition, the indexing scheme is tested on a large synthetic dataset. Our experiments show that speedups of over an order of magnitude can be achieved while maintaining high levels of accuracy. As a result of our work, it becomes possible to implement recommender systems, search engines and assistive applications for the fields that employ sequences of temporal intervals.

  • 8. Kotsifakos, Alexios
    et al.
    Stefan, Alexandra
    Athitsos, Vassilis
    Das, Gautam
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    DRESS: dimensionality reduction for efficient sequence search. 2015. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, no 5, 1280-1311 p. Article in journal (Refereed)
    Abstract [en]

    Similarity search in large sequence databases is a problem ubiquitous in a wide range of application domains, including searching biological sequences. In this paper we focus on protein and DNA data, and we propose a novel approximate method for speeding up range queries under the edit distance. Our method works in a filter-and-refine manner, and its key novelty is a query-sensitive mapping that transforms the original string space to a new string space of reduced dimensionality. Specifically, it first identifies the most frequent codewords in the query, and then uses these codewords to convert both the query and the database to a more compact representation. This is achieved by replacing every occurrence of each codeword with a new letter and by removing the remaining parts of the strings. Using this new representation, our method identifies a set of candidate matches that are likely to satisfy the range query, and finally refines these candidates in the original space. The main advantage of our method, compared to alternative methods for whole sequence matching under the edit distance, is that it does not require any training to create the mapping, and it can handle large query lengths with negligible losses in accuracy. Our experimental evaluation demonstrates that, for higher range values and large query sizes, our method produces significantly lower costs and runtimes compared to two state-of-the-art competitor methods.

  • 9. Lijffijt, Jefrey
    et al.
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Puolamäki, Kai
    Size matters: choosing the most informative set of window lengths for mining patterns in event sequences. 2015. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, no 6, 1838-1864 p. Article in journal (Refereed)
    Abstract [en]

    In order to find patterns in data, it is often necessary to aggregate or summarise data at a higher level of granularity. Selecting the appropriate granularity is a challenging task and often no principled solutions exist. This problem is particularly relevant in analysis of data with sequential structure. We consider this problem for a specific type of data, namely event sequences. We introduce the problem of finding the best set of window lengths for analysis of event sequences for algorithms with real-valued output. We present suitable criteria for choosing one or multiple window lengths and show that these naturally translate into a computational optimisation problem. We show that the problem is NP-hard in general, but that it can be approximated efficiently and even analytically in certain cases. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from several domains. We find that the method works well in practice, and that the optimal sets of window lengths themselves can provide new insight into the data.

  • 10. Lijffijt, Jefrey
    et al.
    Papapetrou, Panagiotis
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Puolamäki, Kai
    A statistical significance testing approach to mining the most informative set of patterns. 2014. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 28, no 1, 238-263 p. Article in journal (Refereed)
  • 11.
    Norén, G. Niklas
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Hopstadius, Johan
    Bate, Andrew
    Star, Kristina
    Edwards, I. Ralph
    Temporal pattern discovery in longitudinal electronic patient records. 2010. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 20, no 3, 361-387 p. Article in journal (Refereed)
    Abstract [en]

    Large collections of electronic patient records provide a vast but still underutilised source of information on the real world use of medicines. They are maintained primarily for the purpose of patient administration, but contain a broad range of clinical information highly relevant for data analysis. While they are a standard resource for epidemiological confirmatory studies, their use in the context of exploratory data analysis is still limited. In this paper, we present a framework for open-ended pattern discovery in large patient records repositories. At the core is a graphical statistical approach to summarising and visualising the temporal association between the prescription of a drug and the occurrence of a medical event. The graphical overview contrasts the observed and expected number of occurrences of the medical event in different time periods both before and after the prescription of interest. In order to effectively screen for important temporal relationships, we introduce a new measure of temporal association, which contrasts the observed-to-expected ratio in a time period immediately after the prescription to the observed-to-expected ratio in a control period 2 years earlier. An important feature of both the observed-to-expected graph and the measure of temporal association is a statistical shrinkage towards the null hypothesis of no association, which provides protection against highlighting spurious associations. We demonstrate the usefulness of the proposed pattern discovery methodology by a set of examples from a collection of over two million patient records in the United Kingdom. The identified patterns include temporal relationships between drug prescriptions and medical events suggestive of persistent and transient risks of adverse events, possible beneficial effects of drugs, periodic co-occurrence, and systematic tendencies of patients to switch from one medication to another.

  • 12. Pensar, Johan
    et al.
    Nyman, Henrik
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Corander, Jukka
    Labeled directed acyclic graphs: a generalization of context-specific independence in directed graphical models. 2015. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, no 2, 503-533 p. Article in journal (Refereed)
    Abstract [en]

    We introduce a novel class of labeled directed acyclic graph (LDAG) models for finite sets of discrete variables. LDAGs generalize earlier proposals for allowing local structures in the conditional probability distribution of a node, such that unrestricted label sets determine which edges can be deleted from the underlying directed acyclic graph (DAG) for a given context. Several properties of these models are derived, including a generalization of the concept of Markov equivalence classes. Efficient Bayesian learning of LDAGs is enabled by introducing an LDAG-based factorization of the Dirichlet prior for the model parameters, such that the marginal likelihood can be calculated analytically. In addition, we develop a novel prior distribution for the model structures that can appropriately penalize a model for its labeling complexity. A non-reversible Markov chain Monte Carlo algorithm combined with a greedy hill climbing approach is used for illustrating the useful properties of LDAG models for both real and synthetic data sets.

  • 13.
    Rögnvaldsson, Thorsteinn
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research.
    Nowaczyk, Sławomir
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research.
    Byttner, Stefan
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research.
    Prytz, Rune
    Volvo Group Trucks Technology, Göteborg, Sweden.
    Svensson, Magnus
    Volvo Group Trucks Technology, Göteborg, Sweden.
    Self-monitoring for maintenance of vehicle fleets. 2017. In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X. Article in journal (Refereed)
    Abstract [en]

    An approach for intelligent monitoring of mobile cyberphysical systems is described, based on consensus among distributed self-organised agents. Its usefulness is experimentally demonstrated over a long-time case study in an example domain: a fleet of city buses. The proposed solution combines several techniques, allowing for life-long learning under computational and communication constraints. The presented work is a step towards autonomous knowledge discovery in a domain where data volumes are increasing, the complexity of systems is growing, and dedicating human experts to build fault detection and diagnostic models for all possible faults is not economically viable. The embedded, self-organised agents operate on-board the cyberphysical systems, modelling their states and communicating them wirelessly to a back-office application. Those models are subsequently compared against each other to find systems which deviate from the consensus. In this way the group (e.g. a fleet of vehicles) is used to provide a standard, or to describe normal behaviour, together with its expected variability under particular operating conditions. The intention is to detect faults without the need for human experts to anticipate them beforehand. This can be used to build up a knowledge base that accumulates over the life-time of the systems. The approach is demonstrated using data collected during regular operation of a city bus fleet over the period of almost four years. © 2017 The Author(s)
