Modeling Electronic Health Records in Ensembles of Semantic Spaces for Adverse Drug Event Detection
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
2015 (English). In: 2015 IEEE International Conference on Bioinformatics and Biomedicine: Proceedings / [ed] Jun (Luke) Huan et al., IEEE Computer Society, 2015, pp. 343-350. Conference paper, published paper (peer-reviewed)
Abstract [en]

Electronic health records (EHRs) are emerging as a potentially valuable source for pharmacovigilance; however, adverse drug events (ADEs), which can be encoded in EHRs by a set of diagnosis codes, are heavily underreported. Alerting systems, able to detect potential ADEs on the basis of patient-specific EHR data, would help to mitigate this problem. To that end, the use of machine learning has proven to be both efficient and effective; however, challenges remain in representing the heterogeneous EHR data, which moreover tends to be high-dimensional and exceedingly sparse, in a manner conducive to learning high-performing predictive models. Prior work has shown that distributional semantics – that is, natural language processing methods that, traditionally, model the meaning of words in semantic (vector) space on the basis of co-occurrence information – can be exploited to create effective representations of sequential EHR data, not only free-text in clinical notes but also various clinical events such as diagnoses, drugs and measurements. When modeling data in semantic space, an important design decision concerns the size of the context window around an object of interest, which governs the scope of co-occurrence information that is taken into account and affects the composition of the resulting semantic space. Here, we report on experiments conducted on 27 clinical datasets, demonstrating that performance can be significantly improved by modeling EHR data in ensembles of semantic spaces, consisting of multiple semantic spaces built with different context window sizes. A follow-up investigation is conducted to study the impact on predictive performance as increasingly more semantic spaces are included in the ensemble, demonstrating that accuracy tends to improve with the number of semantic spaces, albeit not monotonically so. Finally, a number of different strategies for combining the semantic spaces are explored, demonstrating the advantage of early (feature) fusion over late (classifier) fusion. Ensembles of semantic spaces allow multiple views of (sparse) data to be captured (densely) and thereby enable improved performance to be obtained on the task of detecting ADEs in EHRs.
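
Illustration (not from the paper): the abstract's two central ideas, context window size and early (feature) fusion, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: plain co-occurrence counts stand in for whatever semantic space model the authors actually used, and the function names (`build_space`, `to_vector`, `early_fusion`) and the toy event sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_space(sequences, window):
    """Count co-occurrences within +/- `window` positions of each item.

    Assumption: raw counts are used as a stand-in for a real
    distributional semantics model.
    """
    space = defaultdict(Counter)
    for seq in sequences:
        for i, target in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[target][seq[j]] += 1
    return space


def to_vector(space, item, vocab):
    """Return the count vector of `item` over a fixed vocabulary order."""
    return [space[item][ctx] for ctx in vocab]


def early_fusion(spaces, item, vocab):
    """Early (feature) fusion: concatenate the item's vector from each space."""
    fused = []
    for space in spaces:
        fused.extend(to_vector(space, item, vocab))
    return fused


# Hypothetical toy sequences of clinical events (drugs, diagnoses, lab tests).
sequences = [
    ["drugA", "dx1", "lab1", "drugB"],
    ["drugA", "lab1", "dx2"],
]
vocab = sorted({event for seq in sequences for event in seq})

# One semantic space per context window size forms the ensemble.
spaces = [build_space(sequences, w) for w in (1, 2)]
fused = early_fusion(spaces, "drugA", vocab)
print(fused)
```

With early fusion, a single classifier would be trained on the concatenated vector; late (classifier) fusion would instead train one classifier per space and combine their predictions.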

Place, publisher, year, edition, pages
IEEE Computer Society, 2015. pp. 343-350.
Keywords [en]
distributional semantics, semantic space ensembles, ensemble models, electronic health records, adverse drug events, predictive modeling, information fusion
National category
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-122463
DOI: 10.1109/BIBM.2015.7359705
OAI: oai:DiVA.org:su-122463
DiVA: diva2:866461
Conference
IEEE BIBM, International Conference on Bioinformatics and Biomedicine, Washington, D.C., U.S.A., 09-12 November 2015
Projects
High-Performance Data Mining for Drug Effect Detection
Funder
Swedish Foundation for Strategic Research, IIS11-0053
Available from: 2015-11-02. Created: 2015-11-02. Last updated: 2017-01-16. Bibliographically approved.
Included in thesis
1. Ensembles of Semantic Spaces: On Combining Models of Distributional Semantics with Applications in Healthcare
2015 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Distributional semantics allows models of linguistic meaning to be derived from observations of language use in large amounts of text. By modeling the meaning of words in semantic (vector) space on the basis of co-occurrence information, distributional semantics permits a quantitative interpretation of (relative) word meaning in an unsupervised setting, i.e., human annotations are not required. The ability to obtain inexpensive word representations in this manner helps to alleviate the bottleneck of fully supervised approaches to natural language processing, especially since models of distributional semantics are data-driven and hence agnostic to both language and domain.

All that is required to obtain distributed word representations is a sizeable corpus; however, the composition of the semantic space is not only affected by the underlying data but also by certain model hyperparameters. While these can be optimized for a specific downstream task, there are currently limitations to the extent to which the many aspects of semantics can be captured in a single model. This dissertation investigates the possibility of capturing multiple aspects of lexical semantics by adopting the ensemble methodology within a distributional semantic framework to create ensembles of semantic spaces. To that end, various strategies for creating the constituent semantic spaces, as well as for combining them, are explored in a number of studies.

The notion of semantic space ensembles is generalizable across languages and domains; however, the use of unsupervised methods is particularly valuable in low-resource settings, in particular when annotated corpora are scarce, as in the domain of Swedish healthcare. The semantic space ensembles are here empirically evaluated for tasks that have promising applications in healthcare. It is shown that semantic space ensembles – created by exploiting various corpora and data types, as well as by adjusting model hyperparameters such as the size of the context window and the strategy for handling word order within the context window – are able to outperform the use of any single constituent model on a range of tasks. The semantic space ensembles are used both directly for k-nearest neighbors retrieval and for semi-supervised machine learning. Applying semantic space ensembles to important medical problems facilitates the secondary use of healthcare data, which, despite its abundance and transformative potential, is grossly underutilized.
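
Illustration (not from the thesis): one way to use an ensemble of semantic spaces directly for k-nearest neighbors retrieval is to sum each candidate's cosine similarity to the query across all spaces and rank by the total. The thesis abstract does not state which combination strategy is used, so the summation rule, the function names (`cosine`, `ensemble_knn`) and the toy vectors below are assumptions made only for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ensemble_knn(spaces, query, k):
    """Rank candidate terms by cosine similarity summed over all spaces.

    Assumption: summing similarities is just one possible way to
    combine the spaces; ties are broken alphabetically.
    """
    scores = {}
    for space in spaces:
        if query not in space:
            continue
        for term, vec in space.items():
            if term != query:
                scores[term] = scores.get(term, 0.0) + cosine(space[query], vec)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical two-dimensional spaces, e.g. built with different window sizes.
space_a = {"fever": [1.0, 0.0], "pyrexia": [0.9, 0.1], "rash": [0.0, 1.0]}
space_b = {"fever": [0.5, 0.5], "pyrexia": [0.6, 0.4], "rash": [0.9, 0.1]}

neighbors = ensemble_knn([space_a, space_b], "fever", k=1)
print(neighbors)
```

Each constituent space may rank neighbors differently; combining their scores lets a term that is consistently close to the query across spaces rise to the top.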

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2015. 95 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 15-021
Keywords
natural language processing, machine learning, distributional semantics, ensemble learning, semantic space ensembles, medical informatics, electronic health records
National category
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-122465
ISBN: 978-91-7649-302-1
Public defence
2015-12-17, Lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Opponent
Supervisor
Projects
High-Performance Data Mining for Drug Effect Detection
Funder
Swedish Foundation for Strategic Research, IIS11-0053
Note

At the time of the doctoral defense, the following papers were unpublished and had the following status: Papers 4 and 5: unpublished conference papers.

Available from: 2015-11-25. Created: 2015-11-02. Last updated: 2015-11-13. Bibliographically approved.

Open Access in DiVA

No full text available

By author/editor
Henriksson, Aron; Zhao, Jing; Boström, Henrik; Dalianis, Hercules