Digitala Vetenskapliga Arkivet

Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Generalized random shapelet forests
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.ORCID-id: 0000-0002-4632-4815
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
Rekke forfattare: 32016 (engelsk)Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 30, nr 5, s. 1053-1085Artikkel i tidsskrift (Fagfellevurdert) Published
Abstract [en]

Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.

sted, utgiver, år, opplag, sider
2016. Vol. 30, nr 5, s. 1053-1085
Emneord [en]
Multivariate time series, Time series classification, Time series shapelets, Decision trees, Ensemble methods
HSV kategori
Forskningsprogram
data- och systemvetenskap
Identifikatorer
URN: urn:nbn:se:su:diva-135052DOI: 10.1007/s10618-016-0473-yISI: 000382010500004OAI: oai:DiVA.org:su-135052DiVA, id: diva2:1043749
Konferanse
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery, Riva del Garda, Italy, September 19-23, 2016
Tilgjengelig fra: 2016-10-31 Laget: 2016-10-31 Sist oppdatert: 2022-02-28bibliografisk kontrollert
Inngår i avhandling
1. Order in the random forest
Åpne denne publikasjonen i ny fane eller vindu >>Order in the random forest
2017 (engelsk)Doktoravhandling, med artikler (Annet vitenskapelig)
Abstract [en]

In many domains, repeated measurements are systematically collected to obtain the characteristics of objects or situations that evolve over time or other logical orderings. Although the classification of such data series shares many similarities with traditional multidimensional classification, inducing accurate machine learning models using traditional algorithms are typically infeasible since the order of the values must be considered.

In this thesis, the challenges related to inducing predictive models from data series using a class of algorithms known as random forests are studied for the purpose of efficiently and effectively classifying (i) univariate, (ii) multivariate and (iii) heterogeneous data series either directly in their sequential form or indirectly as transformed to sparse and high-dimensional representations. In the thesis, methods are developed to address the challenges of (a) handling sparse and high-dimensional data, (b) data series classification and (c) early time series classification using random forests. The proposed algorithms are empirically evaluated in large-scale experiments and practically evaluated in the context of detecting adverse drug events.

In the first part of the thesis, it is demonstrated that minor modifications to the random forest algorithm and the use of a random projection technique can improve the effectiveness of random forests when faced with discrete data series projected to sparse and high-dimensional representations. In the second part of the thesis, an algorithm for inducing random forests directly from univariate, multivariate and heterogeneous data series using phase-independent patterns is introduced and shown to be highly effective in terms of both computational and predictive performance. Then, leveraging the notion of phase-independent patterns, the random forest is extended to allow for early classification of time series and is shown to perform favorably when compared to alternatives. The conclusions of the thesis not only reaffirm the empirical effectiveness of random forests for traditional multidimensional data but also indicate that the random forest framework can, with success, be extended to sequential data representations.

sted, utgiver, år, opplag, sider
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2017. s. 76
Serie
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 17-004
Emneord
Machine learning, random forest, ensemble, time series, data series, sequential data, sparse data, high-dimensional data
HSV kategori
Forskningsprogram
data- och systemvetenskap
Identifikatorer
urn:nbn:se:su:diva-142052 (URN)978-91-7649-827-9 (ISBN)978-91-7649-828-6 (ISBN)
Disputas
2017-06-08, L30, NOD-huset, Borgarfjordsgatan 12, Stockholm, 13:00 (engelsk)
Opponent
Veileder
Forskningsfinansiär
Swedish Foundation for Strategic Research , IIS11-0053
Tilgjengelig fra: 2017-05-16 Laget: 2017-04-24 Sist oppdatert: 2022-02-28bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekst

Søk i DiVA

Av forfatter/redaktør
Karlsson, IsakPapapetrou, PanagiotisBoström, Henrik
Av organisasjonen
I samme tidsskrift
Data mining and knowledge discovery

Søk utenfor DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric

doi
urn-nbn
Totalt: 1733 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf