Dimensionality Reduction with Random Indexing: An Application on Adverse Drug Event Detection using Electronic Health Records
Karlsson, Isak; Zhao, Jing
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2014 (English). In: IEEE 27th International Symposium on Computer-Based Medical Systems, New York: IEEE Computer Society, 2014, pp. 304-307. Conference paper, Published paper (Refereed)
Abstract [en]

Although electronic health records (EHRs) have recently become an important data source for drug safety signal detection, which is usually carried out in clinical trials, the use of such data is often limited by their dimensionality and by the available computing resources. Several methods for reducing dimensionality have been developed, used and evaluated within the medical domain. While these methods perform well, their computational cost tends to increase with growing dimensionality. An alternative solution is random indexing, a technique commonly employed in text classification to reduce the dimensionality of large and sparse documents. This study explores how the predictive performance of random forests is affected by dimensionality reduction through random indexing when predicting adverse drug events (ADEs). Data are extracted from EHRs, and the task is to predict whether or not a patient should be assigned an ADE-related diagnosis code. Four different dimensionality settings are investigated, and their sensitivity, specificity and area under the ROC curve are reported for 14 data sets. The results show that, for the investigated data sets, predictive performance is not negatively affected by dimensionality reduction, while the computational cost is significantly reduced. Therefore, this study concludes that applying random indexing to EHR data reduces the computational cost while retaining predictive performance.
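
The abstract does not specify how random indexing was configured, so the following is only a minimal sketch of the general technique, not the authors' implementation. Each original feature is assigned a sparse ternary index vector, and a record is reduced by summing the index vectors of its non-zero features, weighted by their values. The function names, the reduced dimensionality `k=100`, the four non-zero entries per index vector, and the toy patient record are all assumptions made for illustration.

```python
import random


def make_index_vectors(n_features, k, nonzeros=4, seed=0):
    """Assign each original feature a sparse ternary index vector of length k.

    Each index vector is stored as {position: sign}, with half of the
    non-zero entries set to +1 and the rest to -1 (an assumed convention).
    """
    rng = random.Random(seed)
    index_vectors = []
    for _ in range(n_features):
        positions = rng.sample(range(k), nonzeros)
        signs = [1] * (nonzeros // 2) + [-1] * (nonzeros - nonzeros // 2)
        rng.shuffle(signs)
        index_vectors.append(dict(zip(positions, signs)))
    return index_vectors


def reduce_record(sparse_record, index_vectors, k):
    """Project one sparse record {feature_id: value} to a dense length-k vector."""
    reduced = [0.0] * k
    for feature_id, value in sparse_record.items():
        for position, sign in index_vectors[feature_id].items():
            reduced[position] += sign * value
    return reduced


# Hypothetical toy record: three non-zero features out of 10,000.
index_vectors = make_index_vectors(n_features=10_000, k=100)
patient = {12: 1.0, 873: 2.0, 9041: 1.0}
reduced = reduce_record(patient, index_vectors, k=100)
```

Under these assumptions, the reduced vectors (length 100 instead of 10,000) would then be passed to a classifier such as a random forest. The cost of reducing a record depends on its number of non-zero features rather than on the original dimensionality.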

Place, publisher, year, edition, pages
New York: IEEE Computer Society, 2014. 304-307 p.
Series
IEEE International Symposium on Computer-Based Medical Systems, ISSN 1063-7125
Keyword [en]
dimensionality reduction, random forest, random indexing, electronic health records, adverse drug events
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-110975
DOI: 10.1109/CBMS.2014.22
ISI: 000345222200060
ISBN: 978-1-4799-4435-4 (print)
OAI: oai:DiVA.org:su-110975
DiVA: diva2:773749
Conference
27th IEEE International Symposium on Computer-Based Medical Systems (CBMS), New York, USA, May 27-29, 2014
Available from: 2014-12-19. Created: 2014-12-19. Last updated: 2017-04-24. Bibliographically approved
In thesis
1. Order in the random forest
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In many domains, repeated measurements are systematically collected to obtain the characteristics of objects or situations that evolve over time or along other logical orderings. Although the classification of such data series shares many similarities with traditional multidimensional classification, inducing accurate machine learning models using traditional algorithms is typically infeasible, since the order of the values must be considered.

In this thesis, the challenges related to inducing predictive models from data series using a class of algorithms known as random forests are studied for the purpose of efficiently and effectively classifying (i) univariate, (ii) multivariate and (iii) heterogeneous data series either directly in their sequential form or indirectly as transformed to sparse and high-dimensional representations. In the thesis, methods are developed to address the challenges of (a) handling sparse and high-dimensional data, (b) data series classification and (c) early time series classification using random forests. The proposed algorithms are empirically evaluated in large-scale experiments and practically evaluated in the context of detecting adverse drug events.

In the first part of the thesis, it is demonstrated that minor modifications to the random forest algorithm and the use of a random projection technique can improve the effectiveness of random forests when faced with discrete data series projected to sparse and high-dimensional representations. In the second part of the thesis, an algorithm for inducing random forests directly from univariate, multivariate and heterogeneous data series using phase-independent patterns is introduced and shown to be highly effective in terms of both computational and predictive performance. Then, leveraging the notion of phase-independent patterns, the random forest is extended to allow for early classification of time series and is shown to perform favorably when compared to alternatives. The conclusions of the thesis not only reaffirm the empirical effectiveness of random forests for traditional multidimensional data but also indicate that the random forest framework can, with success, be extended to sequential data representations.
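
The abstract does not define "phase-independent patterns". One common reading, assumed here, is a short subsequence that is matched wherever it occurs in a series, so that its position (phase) does not matter. The sketch below illustrates only that idea. The function name, the use of Euclidean distance, and the toy series are assumptions and are not taken from the thesis.

```python
import math


def pattern_distance(pattern, series):
    """Smallest Euclidean distance between `pattern` and any equally long window of `series`."""
    m = len(pattern)
    if m > len(series):
        raise ValueError("pattern must not be longer than the series")
    best = math.inf
    # Slide the pattern over every possible starting position.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((p - w) ** 2 for p, w in zip(pattern, window)))
        best = min(best, dist)
    return best


# Hypothetical toy series: the same spike occurs early, late, or not at all.
pattern = [0.0, 1.0, 0.0]
early = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
flat = [0.0] * 6
```

Here `pattern_distance(pattern, early)` and `pattern_distance(pattern, late)` are both zero because the spike is found regardless of where it occurs, whereas the flat series is at distance 1. In a tree-based classifier, such a distance could serve as a split feature (threshold on the distance to a chosen pattern).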

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2017. 76 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 17-004
Keyword
Machine learning, random forest, ensemble, time series, data series, sequential data, sparse data, high-dimensional data
National Category
Computer and Information Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-142052 (URN)
978-91-7649-827-9 (ISBN)
978-91-7649-828-6 (ISBN)
Public defence
2017-06-08, L30, NOD-huset, Borgarfjordsgatan 12, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
Swedish Foundation for Strategic Research, IIS11-0053
Available from: 2017-05-16. Created: 2017-04-24. Last updated: 2017-05-15. Bibliographically approved

By author/editor
Karlsson, Isak; Zhao, Jing
By organisation
Department of Computer and Systems Sciences
Information Systems
