Endre søk
Begrens søket
20 - 37 of 37
RefereraExporteraLink til resultatlisten
Permanent link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Treff pr side
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Forfatter A-Ø
  • Forfatter Ø-A
  • Tittel A-Ø
  • Tittel Ø-A
  • Type publikasjon A-Ø
  • Type publikasjon Ø-A
  • Eldste først
  • Nyeste først
  • Skapad (Eldste først)
  • Skapad (Nyeste først)
  • Senast uppdaterad (Eldste først)
  • Senast uppdaterad (Nyeste først)
  • Standard (Relevans)
  • Forfatter A-Ø
  • Forfatter Ø-A
  • Tittel A-Ø
  • Tittel Ø-A
  • Type publikasjon A-Ø
  • Type publikasjon Ø-A
  • Eldste først
  • Nyeste først
  • Skapad (Eldste først)
  • Skapad (Nyeste først)
  • Senast uppdaterad (Eldste først)
  • Senast uppdaterad (Nyeste først)
Merk
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 20.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Early Random Shapelet Forest2016Inngår i: Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings / [ed] Toon Calders, Michelangelo Ceci, Donato Malerba, Springer, 2016, 261-276 s.Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Early classification of time series has emerged as an increasingly important and challenging problem within signal processing, especially in domains where timely decisions are critical, such as medical diagnosis in health-care. Shapelets, i.e., discriminative sub-sequences, have been proposed for time series classification as a means to capture local and phase independent information. Recently, forests of randomized shapelet trees have been shown to produce state-of-the-art predictive performance at a low computational cost. In this work, they are extended to allow for early classification of time series. An extensive empirical investigation is presented, showing that the proposed algorithm is superior to alternative state-of-the-art approaches, in case predictive performance is considered to be more important than earliness. The algorithm allows for tuning the trade-off between accuracy and earliness, thereby supporting the generation of early classifiers that can be dynamically adapted to specific needs at low computational cost.

  • 21.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Forests of Randomized Shapelet Trees2015Inngår i: Statistical Learning and Data Sciences: Proceedings / [ed] Alexander Gammerman, Vladimir Vovk, Harris Papadopoulos, Springer, 2015, 126-136 s.Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Shapelets have recently been proposed for data series classification, due to their ability to capture phase independent and local information. Decision trees based on shapelets have been shown to provide not only interpretable models, but also, in many cases, state-of-the-art predictive performance. Shapelet discovery is however computationally costly, and although several techniques for speeding up the technique have been proposed, the computational cost is still in many cases prohibitive. In this work, an ensemble based method, referred to as Random Shapelet Forest (RSF), is proposed, which builds on the success of the random forest algorithm, and which is shown to have a lower computational complexity than the original shapelet tree learning algorithm. An extensive empirical investigation shows that the algorithm provides competitive predictive performance and that a proposed way of calculating importance scores can be used to successfully identify influential regions.

  • 22.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Generalized random shapelet forests2016Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 30, nr 5, 1053-1085 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.

  • 23. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Finding the longest common sub-pattern in sequences of temporal intervals2015Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, nr 5, 1178-1210 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals. In particular we are interested in finding the LCSP of the corresponding arrangements. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful for a wide range of scientific fields and applications, as it can be seen by the number and diversity of the datasets we use in our experiments. In this paper, we define the problem of LCSP and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm to solve the LCSP problem, and also propose and experiment with three polynomial time and space under-approximation techniques. Finally, we introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. Lastly, we describe several application cases that demonstrate the need and suitability of LCSP.

  • 24. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    On Searching and Indexing Sequences of Temporal Intervals2017Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 31, nr 3, 809-850 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    In several application domains, including sign language, sensor networks, and medicine, events are not necessarily instantaneous but they may have a time duration. Such events build sequences of temporal intervals, which may convey useful domain knowledge; thus, searching and indexing these sequences is crucial. We formulate the problem of comparing sequences of labeled temporal intervals and present a distance measure that can be computed in polynomial time. We prove that the distance measure is metric and satisfies the triangle inequality. For speeding up search in large databases of sequences of temporal intervals, we propose an approximate indexing method that is based on embeddings. The proposed indexing framework is shown to be contractive and can guarantee no false dismissal. The distance measure is tested and benchmarked through rigorous experimentation on real data taken from several application domains, including: American Sign Language annotated video recordings, robot sensor data, and Hepatitis patient data. In addition, the indexing scheme is tested on a large synthetic dataset. Our experiments show that speedups of over an order of magnitude can be achieved while maintaining high levels of accuracy. As a result of our work, it becomes possible to implement recommender systems, search engines and assistive applications for the fields that employ sequences of temporal intervals.

  • 25. Kotsifakos, Alexios
    et al.
    Athitsos, Vassilis
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Query-sensitive Distance Measure Selection for Time Series Nearest Neighbor Classification2016Inngår i: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 20, nr 1, 5-27 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Many distance or similarity measures have been proposed for time series similarity search. However, none of these measures is guaranteed to be optimal when used for 1-Nearest Neighbor (NN) classification. In this paper we study the problem of selecting the most appropriate distance measure, given a pool of time series distance measures and a query, so as to perform NN classification of the query. We propose a framework for solving this problem, by identifying, given the query, the distance measure most likely to produce the correct classification result for that query. From this proposed framework, we derive three specific methods, that differ from each other in the way they estimate the probability that a distance measure correctly classifies a query object. In our experiments, our pool of measures consists of Dynamic TimeWarping (DTW), Move-Split-Merge (MSM), and Edit distance with Real Penalty (ERP). Based on experimental evaluation with 45 datasets, the best-performing of the three proposed methods provides the best results in terms of classification error rate, compared to the competitors, which include using the Cross Validation method for selecting the distance measure in each dataset, as well as using a single specific distance measure (DTW, MSM, or ERP) across all datasets.

  • 26. Kotsifakos, Alexios
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Athitsos, Vassilis
    Gunopulos, Dimitrios
    Embedding-based subsequence matching with gaps-range-tolerances: a Query-By-Humming application2015Inngår i: The VLDB journal, ISSN 1066-8888, E-ISSN 0949-877X, Vol. 24, nr 4, 519-536 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We present a subsequence matching framework that allows for gaps in both query and target sequences, employs variable matching tolerance efficiently tuned for each query and target sequence, and constrains the maximum matching range. Using this framework, a dynamic programming method is proposed, called SMBGT, that, given a short query sequence Q and a large database, identifies in quadratic time the subsequence of the database that best matches Q. SMBGT is highly applicable to music retrieval. However, in Query-By-Humming applications, runtime is critical. Hence, we propose a novel embedding-based approach, called ISMBGT, for speeding up search under SMBGT. Using a set of reference sequences, ISMBGT maps both Q and each position of each database sequence into vectors. The database vectors closest to the query vector are identified, and SMBGT is then applied between Q and the subsequences that correspond to those database vectors. The key novelties of ISMBGT are that it does not require training, it is query sensitive, and it exploits the flexibility of SMBGT. We present an extensive experimental evaluation using synthetic and hummed queries on a large music database. Our findings show that ISMBGT can achieve speedups of up to an order of magnitude against brute-force search and over an order of magnitude against cDTW, while maintaining a retrieval accuracy very close to that of brute-force search.

  • 27. Kotsifakos, Alexios
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Model-Based Time Series Classification2014Inngår i: Advances in Intelligent Data Analysis XIII / [ed] Blockeel, H; VanLeeuwen, M; Vinciotti, V, Springer, 2014, 179-191 s.Konferansepaper (Fagfellevurdert)
    Abstract [en]

    We propose MTSC, a filter-and-refine framework for time series Nearest Neighbor (NN) classification. Training time series belonging to certain classes are first modeled through Hidden Markov Models (HMMs). Given an unlabeled query, and at the filter step, we identify the top K models that have most likely produced the query. At the refine step, a distance measure is applied between the query and all training time series of the top K models. The query is then assigned with the class of the NN. In our experiments, we first evaluated the NN classification error rate of HMMs compared to three state-of-the-art distance measures on 45 time series datasets of the UCR archive, and showed that modeling time series with HMMs achieves lower error rates in 30 datasets and equal error rates in 4. Secondly, we compared MTSC with Cross Validation defined over the three measures on 33 datasets, and we observed that MTSC is at least as good as the competitor method in 23 datasets, while achieving competitive speedups, showing its effectiveness and efficiency.

  • 28. Kotsifakos, Alexios
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Hollmén, Jaakko
    Gunopulos, Dimitrios
    Athitsos, Vassilis
    Kollios, George
    Hum-a-song: a subsequence matching with gaps-range-tolerances query-by-humming system2012Inngår i: Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 5, nr 12, 1930-1933 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We present "Hum-a-song", a system built for music retrieval, and particularly for the Query-By-Humming (QBH) application. According to QBH, the user is able to hum a part of a song that she recalls and would like to learn what this song is, or find other songs similar to it in a large music repository. We present a simple yet efficient approach that maps the problem to time series subsequence matching. The query and the database songs are represented as 2-dimensional time series conveying information about the pitch and the duration of the notes. Then, since the query is a short sequence and we want to find its best match that may start and end anywhere in the database, subsequence matching methods are suitable for this task. In this demo, we present a system that employs and exposes to the user a variety of state-of-the-art dynamic programming methods, including a newly proposed efficient method named SMBGT that is robust to noise and considers all intrinsic problems in QBH; it allows variable tolerance levels when matching elements, where tolerances are defined as functions of the compared sequences, gaps in both the query and target sequences, and bounds the matching length and (optionally) the minimum number of matched elements. Our system is intended to become open source, which is to the best of our knowledge the first non-commercial effort trying to solve QBH with a variety of methods, and that also approaches the problem from the time series perspective.

  • 29. Kotsifakos, Alexios
    et al.
    Stefan, Alexandra
    Athitsos, Vassilis
    Das, Gautam
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    DRESS: dimensionality reduction for efficient sequence search2015Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, nr 5, 1280-1311 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Similarity search in large sequence databases is a problem ubiquitous in a wide range of application domains, including searching biological sequences. In this paper we focus on protein and DNA data, and we propose a novel approximate method method for speeding up range queries under the edit distance. Our method works in a filter-and-refine manner, and its key novelty is a query-sensitive mapping that transforms the original string space to a new string space of reduced dimensionality. Specifically, it first identifies the most frequent codewords in the query, and then uses these codewords to convert both the query and the database to a more compact representation. This is achieved by replacing every occurrence of each codeword with a new letter and by removing the remaining parts of the strings. Using this new representation, our method identifies a set of candidate matches that are likely to satisfy the range query, and finally refines these candidates in the original space. The main advantage of our method, compared to alternative methods for whole sequence matching under the edit distance, is that it does not require any training to create the mapping, and it can handle large query lengths with negligible losses in accuracy. Our experimental evaluation demonstrates that, for higher range values and large query sizes, our method produces significantly lower costs and runtimes compared to two state-of-the-art competitor methods.

  • 30. Lijffijt, Jefrey
    et al.
    Nevalainen, Terttu
    Saily, Tanja
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Aalto University, Finland.
    Puolamaki, Kai
    Mannila, Heikki
    Significance testing of word frequencies in corpora2016Inngår i: Digital Scholarship in the Humanities, ISSN 2055-768X, Vol. 31, nr 2, 374-397 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (Language is never, ever, ever, random, Corpus Linguistics and Linguistic Theory, 2005; 1(2): 263–76.), the use of the χ2 and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not independent. As pointed out in Kilgarriff (Comparing corpora, International Journal of Corpus Linguistics, 2001; 6(1): 1–37) and Paquot and Bestgen (Distinctive words in academic writing: a comparison of three statistical tests for keyword extraction. In Jucker, A., Schreier, D., and Hundt, M. (eds), Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, 2009, pp. 247–69), it is possible to represent the data differently and employ other tests, such that we assume independence at the level of texts rather than individual words. This allows us to account for the distribution of words within a corpus. In this article we compare the significance estimates of various statistical tests in a controlled resampling experiment and in a practical setting, studying differences between texts produced by male and female fiction writers in the British National Corpus. We find that the choice of the test, and hence data representation, matters. We conclude that significance testing can be used to find consequential differences between corpora, but that assuming independence between all words may lead to overestimating the significance of the observed differences, especially for poorly dispersed words. We recommend the use of the t-test, Wilcoxon rank-sum test, or bootstrap test for comparing word frequencies across corpora.

  • 31. Lijffijt, Jefrey
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Puolamaki, Kai
    Size matters: choosing the most informative set of window lengths for mining patterns in event sequences2015Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, nr 6, 1838-1864 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    In order to find patterns in data, it is often necessary to aggregate or summarise data at a higher level of granularity. Selecting the appropriate granularity is a challenging task and often no principled solutions exist. This problem is particularly relevant in analysis of data with sequential structure. We consider this problem for a specific type of data, namely event sequences. We introduce the problem of finding the best set of window lengths for analysis of event sequences for algorithms with real-valued output. We present suitable criteria for choosing one or multiple window lengths and show that these naturally translate into a computational optimisation problem. We show that the problem is NP-hard in general, but that it can be approximated efficiently and even analytically in certain cases. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from several domains. We find that the method works well in practice, and that the optimal sets of window lengths themselves can provide new insight into the data.

  • 32. Lijffijt, jefrey
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Puolamäki, Kai
    A statistical significance testing approach to mining the most informative set of patterns2014Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 28, nr 1, 238-263 s.Artikkel i tidsskrift (Fagfellevurdert)
  • 33. Lundgren, Erik
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Extracting news text from web pages: an application for the visually impaired2015Inngår i: Proceeding PETRA '15 Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York: ACM Press, 2015, Vol. Art. 68Konferansepaper (Fagfellevurdert)
  • 34.
    Papapetrou, Panagiotis
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Benson, Gary
    Kollios, George
    Mining poly-regions in DNA2012Inngår i: International journal on data mining and bioinformatics, ISSN 1748-5673, Vol. 6, nr 4, 406-428 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We study the problem of mining poly-regions in DNA. A poly-region is defined as a bursty DNA area, i.e., area of elevated frequency of a DNA pattern. We introduce a general formulation that covers a range of meaningful types of poly-regions and develop three efficient detection methods. The first applies recursive segmentation and is entropy-based. The second uses a set of sliding windows that summarize each sequence segment using several statistics. Finally, the third employs a technique based on majority vote. The proposed algorithms are tested on DNA sequences of four different organisms in terms of recall and runtime.

  • 35.
    Papapetrou, Panagiotis
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Roussos, George
    Social Context Discovery from Temporal App Use Patterns2014Inngår i: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: adjunct Publication, New York: ACM Press, 2014, 397-402 s.Konferansepaper (Fagfellevurdert)
  • 36. Tselas, Nikolaos
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Benchmarking Dynamic Time Warping on Nearest Neighbor Classification of Electrocardiograms2014Inngår i: Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments, ACM Press, 2014Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The human cardiovascular system is a complicated structure that has been the focus of research in many different domains, such as medicine, biology, as well as computer science. Due to the complexity of the heart, even nowadays some of the most common disorders are still hard to identify. In this paper, we map each ECG to a time series or set of time series and explore the applicability of two common time series similarity matching methods, namely, DTW and cDTW, to the problem of ECG classification. We benchmark the two methods on four different datasets in terms of accuracy. In addition, we explore their predictive performance when various ECG channels are taken into account. The latter is performed using a dataset taken from Physiobank. Our findings suggest that different ECG channels are more appropriate for different cardiovascular malfunctions.

  • 37.
    Zhao, Jing
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning from heterogeneous temporal data from electronic health records2017Inngår i: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 65, 105-119 s.Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.

20 - 37 of 37
RefereraExporteraLink til resultatlisten
Permanent link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf