Digitala Vetenskapliga Arkivet

Deep Neural Networks for Inverse De-Identification of Medical Case Narratives in Reports of Suspected Adverse Drug Reactions
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Djupa neuronnät för omvänd avidentifiering av medicinska fallbeskrivningar i biverkningsrapporter (Swedish)
Abstract [en]

Medical research requires detailed and accurate information on individual patients. This is especially so in pharmacovigilance, which among other things seeks to identify previously unknown adverse drug reactions. Here, the clinical stories are often the starting point for assessing whether there is a causal relationship between the drug and the suspected adverse reaction. Reliable automatic de-identification of medical case narratives could allow this patient data to be shared without compromising the patient's privacy. Previous research on de-identification has focused on labelling the tokens in a narrative with the class of sensitive information they belong to. In this Master's thesis project, we explore an inverse approach to the task of de-identification: de-identification of medical case narratives is instead understood as identifying the tokens which do not need to be removed from the text in order to ensure patient confidentiality. Our results show that this approach can lead to a more reliable method in terms of higher recall. We achieve a recall of sensitive information of 99.1% while precision is kept above 51% on the 2014-i2b2 benchmark data set. The model was also fine-tuned on case narratives from reports of suspected adverse drug reactions, where a recall of sensitive information of more than 99% was achieved. Although the precision was only 55%, which is lower than in comparable systems, an expert could still identify information useful for causality assessment in pharmacovigilance in most of the case narratives de-identified with our method. In more than 50% of the case narratives, no information useful for causality assessment was missing at all.
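The inverse approach and the two reported metrics can be illustrated with a minimal sketch. It is not the thesis's model: the learned network is replaced here by a hypothetical whitelist (`SAFE_VOCAB`), and the tokens, labels, and function names are invented for illustration only. The point it shows is that every token not positively recognised as safe is masked, which favours recall of sensitive information at the cost of precision.

```python
from typing import Callable, List, Tuple

MASK = "[REDACTED]"  # placeholder string, chosen for this example only


def inverse_deidentify(tokens: List[str], is_safe: Callable[[str], bool]) -> List[str]:
    """Keep only tokens positively judged safe; mask everything else."""
    return [tok if is_safe(tok) else MASK for tok in tokens]


def recall_precision(masked: List[bool], sensitive: List[bool]) -> Tuple[float, float]:
    """Recall and precision of masking with respect to sensitive tokens."""
    tp = sum(m and s for m, s in zip(masked, sensitive))        # sensitive and masked
    fn = sum((not m) and s for m, s in zip(masked, sensitive))  # sensitive but kept
    fp = sum(m and (not s) for m, s in zip(masked, sensitive))  # harmless but masked
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return recall, precision


# Hypothetical stand-in for a trained classifier: a tiny whitelist.
SAFE_VOCAB = {"patient", "developed", "rash", "after", "taking", "the", "drug", "in"}


def toy_is_safe(token: str) -> bool:
    return token.lower() in SAFE_VOCAB


# Invented example narrative with invented gold labels (True = sensitive).
tokens = ["Patient", "Anna", "developed", "severe", "rash", "in", "Uppsala"]
sensitive = [False, True, False, False, False, False, True]

output = inverse_deidentify(tokens, toy_is_safe)
masked = [tok == MASK for tok in output]
recall, precision = recall_precision(masked, sensitive)
print(output)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy run both sensitive tokens are masked, while the harmless but unrecognised word "severe" is masked as well. That lost word is the kind of information loss the abstract weighs against usefulness for causality assessment.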

Abstract [sv] (English translation)

Access to detailed clinical data is a prerequisite for conducting medical research and, by extension, for helping patients. Reliable de-identification of medical case narratives can make it possible to share such information without jeopardising the protection of patients' personal data. Previous research in the field has approached the problem by labelling the words in a text with the type of sensitive information they convey. In this thesis, we explore the possibility of approaching the problem inversely, by identifying the words that do not need to be removed in order to protect sensitive patient information. Our results show that this can de-identify a larger share of the sensitive information: 99.1% of all sensitive information is de-identified with our method, while 51% of all excluded words actually convey sensitive information, as evaluated on the 2014-i2b2 benchmark data set. The algorithm was also adapted to case narratives from adverse drug reaction reports, and in this case 99.1% of all sensitive information was de-identified, while 55% of all excluded words convey sensitive information. Although this latter share is lower than for comparable systems, an expert could find information useful for causality assessment in most of the de-identified reports; in more than half of the de-identified case narratives, no information of value for causality assessment was missing.

Place, publisher, year, edition, pages
2018.
Series
TRITA-EECS-EX ; 2018:53
Keywords [en]
De-Identification, Deep Learning, Recurrent Neural Networks, Natural Language Processing, Pharmacovigilance, Medical Language Processing, Privacy Protection, Adverse Drug Reactions
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-223604
OAI: oai:DiVA.org:kth-223604
DiVA, id: diva2:1185934
External cooperation
Uppsala Monitoring Centre
Supervisors
Examiners
Available from: 2018-03-05
Created: 2018-02-26
Last updated: 2022-06-26
Bibliographically approved

Open Access in DiVA

fulltext (766 kB), 1430 downloads
File information
File name: FULLTEXT01.pdf
File size: 766 kB
Checksum (SHA-512): 150862a4c5c59b4c386af1ed241435d1ad3234c94261fc07c321976d10442dead99fa8ceb2132305666c6569bed9214022f8264444d08e691bb843765d445408
Type: fulltext
Mimetype: application/pdf


Total: 1433 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 1866 hits