Endre søk
Begrens søket
20 - 47 of 47
RefereraExporteraLink til resultatlisten
Permanent link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Treff pr side
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Forfatter A-Ø
  • Forfatter Ø-A
  • Tittel A-Ø
  • Tittel Ø-A
  • Type publikasjon A-Ø
  • Type publikasjon Ø-A
  • Eldste først
  • Nyeste først
  • Skapad (Eldste først)
  • Skapad (Nyeste først)
  • Senast uppdaterad (Eldste først)
  • Senast uppdaterad (Nyeste først)
  • Disputationsdatum (tidligste først)
  • Disputationsdatum (siste først)
  • Standard (Relevans)
  • Forfatter A-Ø
  • Forfatter Ø-A
  • Tittel A-Ø
  • Tittel Ø-A
  • Type publikasjon A-Ø
  • Type publikasjon Ø-A
  • Eldste først
  • Nyeste først
  • Skapad (Eldste først)
  • Skapad (Nyeste først)
  • Senast uppdaterad (Eldste først)
  • Senast uppdaterad (Nyeste først)
  • Disputationsdatum (tidligste først)
  • Disputationsdatum (siste først)
Merk
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 20. Jangyodsuk, Pat
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Athitsos, Vassilis
    Optimizing Hashing Functions for Similarity Indexing in Arbitrary Metric and Nonmetric Spaces2015Inngår i: Proceedings of the SIAM International Conference on Data Mining / [ed] Suresh Venkatasubramanian and Jieping Ye, Society for Industrial and Applied Mathematics, 2015, s. 828-836Konferansepaper (Fagfellevurdert)
  • 21.
    Kareem, Hend
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Detecting Hierarchical Ties Using Link-Analysis Ranking at Different Levels of Time Granularity2017Inngår i: Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Social networks contain implicit knowledge that can be used to infer hierarchical relations that are not explicitly present in the available data. Interaction patterns are typically affected by users' social relations. We present an approach to inferring such information that applies a link-analysis ranking algorithm at different levels of time granularity. In addition, a voting scheme is employed for obtaining the hierarchical relations. The approach is evaluated on two datasets: the Enron email data set, where the goal is to infer manager-subordinate relationships, and the Co-author data set, where the goal is to infer PhD advisor-advisee relations. The experimental results indicate that the proposed approach outperforms more traditional approaches to inferring hierarchical relations from social networks.

  • 22.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    KAPMiner: Mining Ordered Association Rules with Constraints2017Inngår i: Advances in Intelligent Data Analysis XVI: Proceedings / [ed] Niall Adams, Allan Tucker, David Weston, 2017, s. 149-161Konferansepaper (Fagfellevurdert)
    Abstract [en]

    We study the problem of mining ordered association rules from event sequences. Ordered association rules differ from regular association rules in that the events occurring in the antecedent (left hand side) of the rule are temporally constrained to occur strictly before the events in the consequent (right hand side). We argue that such constraints can provide more meaningful rules in particular application domains, such as health care. The importance and interestingness of the extracted rules are quantified by adapting existing rule mining metrics. Our experimental evaluation on real data sets demonstrates the descriptive power of ordered association rules against ordinary association rules.

  • 23.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Multi-channel ECG classification using forests of randomized shapelet trees2015Inngår i: Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2015, artikkel-id 43Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Data series of multiple channels occur at high rates and in massive quantities in several application domains, such as healthcare. In this paper, we study the problem of multi-channel ECG classification. We map this problem to multivariate data series classification and propose five methods for solving it, using a split-and-combine approach. The proposed framework is evaluated using three base-classifiers on real-world data for detecting Myocardial Infarction. Extensive experiments are performed on real ECG data extracted from the Physiobank data repository. Our findings emphasize the importance of selecting an appropriate base-classifier for multivariate data series classification, while demonstrating the superiority of the Random Shapelet Forest (0.825 accuracy) against competitor methods (0.664 accuracy for 1-NN under cDTW).

  • 24.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Persson, Hans E.
    Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records2017Inngår i: Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2017, s. 394-398Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Heart failure is a serious medical conditions involving decreased quality of life and an increased risk of premature death. A recent evaluation by the Swedish National Board of Health and Welfare shows that Swedish heart failure patients are often undertreated and do not receive basic medication as recommended by the national guidelines for treatment of heart failure. The objective of this paper is to use registry data to characterize groups of heart failure patients, with an emphasis on basic treatment. Towards this end, we explore the applicability of frequent itemset mining and disproportionality analysis for finding interesting and distinctive characterizations of a target group of patients, e.g., those who have received basic treatment, against a control group, e.g., those who have not received basic treatment. Our empirical evaluation is performed on data extracted from administrative health records from the Stockholm County covering the years 2010--2016. Our findings suggest that frequency is not always the most appropriate measure of importance for frequent itemsets, while itemset disproportionality against a control group provides alternative rankings of the extracted itemsets leading to some medically intuitive characterizations of the target groups.

  • 25.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Early Random Shapelet Forest2016Inngår i: Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings / [ed] Toon Calders, Michelangelo Ceci, Donato Malerba, Springer, 2016, s. 261-276Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Early classification of time series has emerged as an increasingly important and challenging problem within signal processing, especially in domains where timely decisions are critical, such as medical diagnosis in health-care. Shapelets, i.e., discriminative sub-sequences, have been proposed for time series classification as a means to capture local and phase independent information. Recently, forests of randomized shapelet trees have been shown to produce state-of-the-art predictive performance at a low computational cost. In this work, they are extended to allow for early classification of time series. An extensive empirical investigation is presented, showing that the proposed algorithm is superior to alternative state-of-the-art approaches, in case predictive performance is considered to be more important than earliness. The algorithm allows for tuning the trade-off between accuracy and earliness, thereby supporting the generation of early classifiers that can be dynamically adapted to specific needs at low computational cost.

  • 26.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Forests of Randomized Shapelet Trees2015Inngår i: Statistical Learning and Data Sciences: Proceedings / [ed] Alexander Gammerman, Vladimir Vovk, Harris Papadopoulos, Springer, 2015, s. 126-136Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Shapelets have recently been proposed for data series classification, due to their ability to capture phase independent and local information. Decision trees based on shapelets have been shown to provide not only interpretable models, but also, in many cases, state-of-the-art predictive performance. Shapelet discovery is however computationally costly, and although several techniques for speeding up the technique have been proposed, the computational cost is still in many cases prohibitive. In this work, an ensemble based method, referred to as Random Shapelet Forest (RSF), is proposed, which builds on the success of the random forest algorithm, and which is shown to have a lower computational complexity than the original shapelet tree learning algorithm. An extensive empirical investigation shows that the algorithm provides competitive predictive performance and that a proposed way of calculating importance scores can be used to successfully identify influential regions.

  • 27.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Generalized random shapelet forests2016Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 30, nr 5, s. 1053-1085Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.

  • 28.
    Karlsson, Isak
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Rebane, Jonathan Toomas
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Gionis, Aristides
    Explainable time series tweaking via irreversibleand reversible temporal transformations2018Konferansepaper (Fagfellevurdert)
  • 29. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    ABIDE: Querying Time-Evolving Sequences of Temporal Intervals2017Inngår i: Advances in Intelligent Data Analysis XVI: Proceedings / [ed] Niall Adams, Allan Tucker, David Weston, Springer, 2017, s. 173-185Konferansepaper (Fagfellevurdert)
    Abstract [en]

    We study the problem of online similarity search in sequences of temporal intervals; given a standing query and a time-evolving sequence of event-intervals, we want to assess the existence of the query in the sequence over time. Since indexing is inapplicable to our problem, the goal is to reduce runtime without sacrificing retrieval accuracy. We present three lower-bounding and two early-abandon methods for speeding up search, while guaranteeing no false dismissals. We present a framework for combining lower bounds with early abandoning, called ABIDE. Empirical evaluation on eight real datasets and two synthetic datasets suggests that ABIDE provides speedups of at least an order of magnitude and up to 6977 times on average, compared to existing approaches and a baseline. We conclude that ABIDE is more powerful than existing methods, while we can attain the same pruning power with less CPU computations.

  • 30. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Finding the longest common sub-pattern in sequences of temporal intervals2015Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, nr 5, s. 1178-1210Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals. In particular we are interested in finding the LCSP of the corresponding arrangements. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful for a wide range of scientific fields and applications, as it can be seen by the number and diversity of the datasets we use in our experiments. In this paper, we define the problem of LCSP and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm to solve the LCSP problem, and also propose and experiment with three polynomial time and space under-approximation techniques. Finally, we introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. Lastly, we describe several application cases that demonstrate the need and suitability of LCSP.

  • 31. Kostakis, Orestis
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    On Searching and Indexing Sequences of Temporal Intervals2017Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 31, nr 3, s. 809-850Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    In several application domains, including sign language, sensor networks, and medicine, events are not necessarily instantaneous but they may have a time duration. Such events build sequences of temporal intervals, which may convey useful domain knowledge; thus, searching and indexing these sequences is crucial. We formulate the problem of comparing sequences of labeled temporal intervals and present a distance measure that can be computed in polynomial time. We prove that the distance measure is metric and satisfies the triangle inequality. For speeding up search in large databases of sequences of temporal intervals, we propose an approximate indexing method that is based on embeddings. The proposed indexing framework is shown to be contractive and can guarantee no false dismissal. The distance measure is tested and benchmarked through rigorous experimentation on real data taken from several application domains, including: American Sign Language annotated video recordings, robot sensor data, and Hepatitis patient data. In addition, the indexing scheme is tested on a large synthetic dataset. Our experiments show that speedups of over an order of magnitude can be achieved while maintaining high levels of accuracy. As a result of our work, it becomes possible to implement recommender systems, search engines and assistive applications for the fields that employ sequences of temporal intervals.

  • 32. Kotsifakos, Alexios
    et al.
    Athitsos, Vassilis
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Query-sensitive Distance Measure Selection for Time Series Nearest Neighbor Classification2016Inngår i: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 20, nr 1, s. 5-27Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Many distance or similarity measures have been proposed for time series similarity search. However, none of these measures is guaranteed to be optimal when used for 1-Nearest Neighbor (NN) classification. In this paper we study the problem of selecting the most appropriate distance measure, given a pool of time series distance measures and a query, so as to perform NN classification of the query. We propose a framework for solving this problem, by identifying, given the query, the distance measure most likely to produce the correct classification result for that query. From this proposed framework, we derive three specific methods, that differ from each other in the way they estimate the probability that a distance measure correctly classifies a query object. In our experiments, our pool of measures consists of Dynamic TimeWarping (DTW), Move-Split-Merge (MSM), and Edit distance with Real Penalty (ERP). Based on experimental evaluation with 45 datasets, the best-performing of the three proposed methods provides the best results in terms of classification error rate, compared to the competitors, which include using the Cross Validation method for selecting the distance measure in each dataset, as well as using a single specific distance measure (DTW, MSM, or ERP) across all datasets.

  • 33. Kotsifakos, Alexios
    et al.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Athitsos, Vassilis
    Gunopulos, Dimitrios
    Embedding-based subsequence matching with gaps-range-tolerances: a Query-By-Humming application2015Inngår i: The VLDB journal, ISSN 1066-8888, E-ISSN 0949-877X, Vol. 24, nr 4, s. 519-536Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We present a subsequence matching framework that allows for gaps in both query and target sequences, employs variable matching tolerance efficiently tuned for each query and target sequence, and constrains the maximum matching range. Using this framework, a dynamic programming method is proposed, called SMBGT, that, given a short query sequence Q and a large database, identifies in quadratic time the subsequence of the database that best matches Q. SMBGT is highly applicable to music retrieval. However, in Query-By-Humming applications, runtime is critical. Hence, we propose a novel embedding-based approach, called ISMBGT, for speeding up search under SMBGT. Using a set of reference sequences, ISMBGT maps both Q and each position of each database sequence into vectors. The database vectors closest to the query vector are identified, and SMBGT is then applied between Q and the subsequences that correspond to those database vectors. The key novelties of ISMBGT are that it does not require training, it is query sensitive, and it exploits the flexibility of SMBGT. We present an extensive experimental evaluation using synthetic and hummed queries on a large music database. Our findings show that ISMBGT can achieve speedups of up to an order of magnitude against brute-force search and over an order of magnitude against cDTW, while maintaining a retrieval accuracy very close to that of brute-force search.

  • 34. Kotsifakos, Alexios
    et al.
    Kotsifakos, Evangelos E.
    Papapetrou, Panagiotis
    University of London, United Kingdom.
    Athitsos, Vassilis
    Genre classification of symbolic music with SMBGT2013Inngår i: Proceedings of the 6th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA), Association for Computing Machinery (ACM), 2013, artikkel-id 44Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Automatic music genre classification is a task that has attracted the interest of the music community for more than two decades. Music can be of high importance within the area of assistive technologies as it can be seen as an assistive technology with high therapeutic and educational functionality for children and adults with disabilities. Several similarity methods and machine learning techniques have been applied in the literature to deal with music genre classification, and as a result data mining and Music Information Retrieval (MIR) are strongly interconnected. In this paper, we deal with music genre classification for symbolic music, and specifically MIDI, by combining the recently proposed novel similarity measure for sequences, SMBGT, with the k-Nearest Neighbor (k-NN) classifier. For all MIDI songs we first extract all of their channels and then transform each channel into a sequence of 2D points, providing information for pitch and duration of their music notes. The similarity between two songs is found by computing the SMBGT for all pairs of the songs' channels and getting the maximum pairwise channel score as their similarity. Each song is treated as a query to which k-NN is applied, and the returned genre of the classifier is the one with the majority of votes in the k neighbors. Classification accuracy results indicate that there is room for improvement, especially due to the ambiguous definitions of music genres that make it hard to clearly discriminate them. Using this framework can also help us analyze and understand potential disadvantages of SMBGT, and thus identify how it can be improved when used for classification of real-time sequences.

  • 35. Kotsifakos, Alexios
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Model-Based Time Series Classification2014Inngår i: Advances in Intelligent Data Analysis XIII / [ed] Blockeel, H; VanLeeuwen, M; Vinciotti, V, Springer, 2014, s. 179-191Konferansepaper (Fagfellevurdert)
    Abstract [en]

    We propose MTSC, a filter-and-refine framework for time series Nearest Neighbor (NN) classification. Training time series belonging to certain classes are first modeled through Hidden Markov Models (HMMs). Given an unlabeled query, and at the filter step, we identify the top K models that have most likely produced the query. At the refine step, a distance measure is applied between the query and all training time series of the top K models. The query is then assigned with the class of the NN. In our experiments, we first evaluated the NN classification error rate of HMMs compared to three state-of-the-art distance measures on 45 time series datasets of the UCR archive, and showed that modeling time series with HMMs achieves lower error rates in 30 datasets and equal error rates in 4. Secondly, we compared MTSC with Cross Validation defined over the three measures on 33 datasets, and we observed that MTSC is at least as good as the competitor method in 23 datasets, while achieving competitive speedups, showing its effectiveness and efficiency.

  • 36. Kotsifakos, Alexios
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Hollmén, Jaakko
    Gunopulos, Dimitrios
    Athitsos, Vassilis
    Kollios, George
    Hum-a-song: a subsequence matching with gaps-range-tolerances query-by-humming system2012Inngår i: Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 5, nr 12, s. 1930-1933Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We present "Hum-a-song", a system built for music retrieval, and particularly for the Query-By-Humming (QBH) application. According to QBH, the user is able to hum a part of a song that she recalls and would like to learn what this song is, or find other songs similar to it in a large music repository. We present a simple yet efficient approach that maps the problem to time series subsequence matching. The query and the database songs are represented as 2-dimensional time series conveying information about the pitch and the duration of the notes. Then, since the query is a short sequence and we want to find its best match that may start and end anywhere in the database, subsequence matching methods are suitable for this task. In this demo, we present a system that employs and exposes to the user a variety of state-of-the-art dynamic programming methods, including a newly proposed efficient method named SMBGT that is robust to noise and considers all intrinsic problems in QBH; it allows variable tolerance levels when matching elements, where tolerances are defined as functions of the compared sequences, gaps in both the query and target sequences, and bounds the matching length and (optionally) the minimum number of matched elements. Our system is intended to become open source, which is to the best of our knowledge the first non-commercial effort trying to solve QBH with a variety of methods, and that also approaches the problem from the time series perspective.

  • 37. Kotsifakos, Alexios
    et al.
    Stefan, Alexandra
    Athitsos, Vassilis
    Das, Gautam
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    DRESS: dimensionality reduction for efficient sequence search2015Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, nr 5, s. 1280-1311Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Similarity search in large sequence databases is a problem ubiquitous in a wide range of application domains, including searching biological sequences. In this paper we focus on protein and DNA data, and we propose a novel approximate method method for speeding up range queries under the edit distance. Our method works in a filter-and-refine manner, and its key novelty is a query-sensitive mapping that transforms the original string space to a new string space of reduced dimensionality. Specifically, it first identifies the most frequent codewords in the query, and then uses these codewords to convert both the query and the database to a more compact representation. This is achieved by replacing every occurrence of each codeword with a new letter and by removing the remaining parts of the strings. Using this new representation, our method identifies a set of candidate matches that are likely to satisfy the range query, and finally refines these candidates in the original space. The main advantage of our method, compared to alternative methods for whole sequence matching under the edit distance, is that it does not require any training to create the mapping, and it can handle large query lengths with negligible losses in accuracy. Our experimental evaluation demonstrates that, for higher range values and large query sizes, our method produces significantly lower costs and runtimes compared to two state-of-the-art competitor methods.

  • 38. Lijffijt, Jefrey
    et al.
    Nevalainen, Terttu
    Saily, Tanja
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Aalto University, Finland.
    Puolamaki, Kai
    Mannila, Heikki
    Significance testing of word frequencies in corpora2016Inngår i: Digital Scholarship in the Humanities, ISSN 2055-768X, Vol. 31, nr 2, s. 374-397Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (Language is never, ever, ever, random, Corpus Linguistics and Linguistic Theory, 2005; 1(2): 263–76.), the use of the χ2 and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not independent. As pointed out in Kilgarriff (Comparing corpora, International Journal of Corpus Linguistics, 2001; 6(1): 1–37) and Paquot and Bestgen (Distinctive words in academic writing: a comparison of three statistical tests for keyword extraction. In Jucker, A., Schreier, D., and Hundt, M. (eds), Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, 2009, pp. 247–69), it is possible to represent the data differently and employ other tests, such that we assume independence at the level of texts rather than individual words. This allows us to account for the distribution of words within a corpus. In this article we compare the significance estimates of various statistical tests in a controlled resampling experiment and in a practical setting, studying differences between texts produced by male and female fiction writers in the British National Corpus. We find that the choice of the test, and hence data representation, matters. We conclude that significance testing can be used to find consequential differences between corpora, but that assuming independence between all words may lead to overestimating the significance of the observed differences, especially for poorly dispersed words. We recommend the use of the t-test, Wilcoxon rank-sum test, or bootstrap test for comparing word frequencies across corpora.

  • 39. Lijffijt, Jefrey
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Puolamaki, Kai
    Size matters: choosing the most informative set of window lengths for mining patterns in event sequences2015Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, nr 6, s. 1838-1864Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    In order to find patterns in data, it is often necessary to aggregate or summarise data at a higher level of granularity. Selecting the appropriate granularity is a challenging task and often no principled solutions exist. This problem is particularly relevant in analysis of data with sequential structure. We consider this problem for a specific type of data, namely event sequences. We introduce the problem of finding the best set of window lengths for analysis of event sequences for algorithms with real-valued output. We present suitable criteria for choosing one or multiple window lengths and show that these naturally translate into a computational optimisation problem. We show that the problem is NP-hard in general, but that it can be approximated efficiently and even analytically in certain cases. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from several domains. We find that the method works well in practice, and that the optimal sets of window lengths themselves can provide new insight into the data.

  • 40. Lijffijt, jefrey
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Puolamäki, Kai
    A statistical significance testing approach to mining the most informative set of patterns2014Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 28, nr 1, s. 238-263Artikkel i tidsskrift (Fagfellevurdert)
  • 41. Lundgren, Erik
    et al.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Extracting news text from web pages: an application for the visually impaired2015Inngår i: Proceeding PETRA '15 Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York: ACM Press, 2015, Vol. Art. 68Konferansepaper (Fagfellevurdert)
  • 42.
    Papapetrou, Panagiotis
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Benson, Gary
    Kollios, George
    Mining poly-regions in DNA2012Inngår i: International journal on data mining and bioinformatics, ISSN 1748-5673, Vol. 6, nr 4, s. 406-428Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We study the problem of mining poly-regions in DNA. A poly-region is defined as a bursty DNA area, i.e., area of elevated frequency of a DNA pattern. We introduce a general formulation that covers a range of meaningful types of poly-regions and develop three efficient detection methods. The first applies recursive segmentation and is entropy-based. The second uses a set of sliding windows that summarize each sequence segment using several statistics. Finally, the third employs a technique based on majority vote. The proposed algorithms are tested on DNA sequences of four different organisms in terms of recall and runtime.

  • 43.
    Papapetrou, Panagiotis
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Roussos, George
    Social Context Discovery from Temporal App Use Patterns2014Inngår i: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: adjunct Publication, New York: ACM Press, 2014, s. 397-402Konferansepaper (Fagfellevurdert)
  • 44.
    Rebane, Jonathan
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning from Administrative Health Registries2017Inngår i: SoGood 2017: Data Science for Social Good: Proceedings / [ed] Ricard Gavaldà, Irena Koprinska, Stefan Kramer, CEUR-WS.org , 2017Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Over the last decades the healthcare domain has seen a tremendous increase and interest in methods for making inference about patient care using large quantities of medical data. Such data is often stored in electronic health records and administrative health registries. As these data sources have grown increasingly complex, with millions of patients represented by thousands of attributes, static or time evolving, finding relevant and accurate patterns that can be used for predictive or descriptive modelling is impractical for human experts. In this paper, we concentrate our review on Swedish Administrative Health Registries (AHRs) and Electronic Health Records (EHRs) and provide an overview of recent and ongoing work in the area with focus on adverse drug events (ADEs) and heart failure.

  • 45.
    Rebane, Jonathan Toomas
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Karlsson, Isak
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Denic, Stojan
    Seq2Seq RNNs and ARIMA models for Cryptocurrency Prediction: A Comparative Study2018Inngår i: Proceedings of SIGKDD Workshop on Fintech (SIGKDD Fintech’18), Association for Computing Machinery (ACM), 2018, artikkel-id 4Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Cyrptocurrency price prediction has recently become an alluring topic, attracting massive media and investor interest. Traditional models, such as Autoregressive Integrated Moving Average models (ARIMA) and models with more modern popularity, such as Recurrent Neural Networks (RNN’s) can be considered candidates for such financial prediction problems, with RNN’s being capable of utilizing various endogenous and exogenous input sources. This study compares the model performance of ARIMA to that of a seq2seq recurrent deep multi-layer neural network (seq2seq) utilizing a varied selection of inputs types. The results demonstrate superior performance of seq2seq over ARIMA, for models generated throughout most of bitcoin price history, with additional data sources leading to better performance during less volatile price periods.

  • 46. Tselas, Nikolaos
    et al.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Benchmarking Dynamic Time Warping on Nearest Neighbor Classification of Electrocardiograms2014Inngår i: Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments, ACM Press, 2014Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The human cardiovascular system is a complicated structure that has been the focus of research in many different domains, such as medicine, biology, as well as computer science. Due to the complexity of the heart, even nowadays some of the most common disorders are still hard to identify. In this paper, we map each ECG to a time series or set of time series and explore the applicability of two common time series similarity matching methods, namely, DTW and cDTW, to the problem of ECG classification. We benchmark the two methods on four different datasets in terms of accuracy. In addition, we explore their predictive performance when various ECG channels are taken into account. The latter is performed using a dataset taken from Physiobank. Our findings suggest that different ECG channels are more appropriate for different cardiovascular malfunctions.

  • 47.
    Zhao, Jing
    et al.
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Papapetrou, Panagiotis
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Asker, Lars
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Learning from heterogeneous temporal data from electronic health records2017Inngår i: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 65, s. 105-119Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.

20 - 47 of 47
RefereraExporteraLink til resultatlisten
Permanent link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf