Digitala Vetenskapliga Arkivet

Privacy and explainability in Healthcare AI: Synthetic data generation from Swedish patient records
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

The digitalization of the health-care sector has led to an exponential increase in electronic health record (EHR) data. However, leveraging EHR data for training artificial intelligence models while preserving patient privacy poses a significant challenge. De-identification and synthetic data generation are some of the strategies employed to mitigate the privacy risk. Despite advancements in synthetic data generation, a significant gap exists in understanding the explainability of AI models trained on synthetic data. Ensuring patient privacy, and the transparency and interpretability of AI models, is paramount in highly critical medical decision-making processes.

This thesis addresses the research question: "How can synthetic tabular data be generated from Swedish patients' electronic health records, preserving privacy and ensuring transparency and explainability of the AI model?"

Applying the design science research framework, the research generates synthetic Adverse Drug Event (ADE) datasets from Swedish patients' EHR data using CTGAN. The synthetic data is evaluated for privacy preservation and utility using SynthEval. During privacy evaluation, one of the synthetic datasets exhibited an epsilon identifiability risk of 0.16. Random forest classifiers were trained on the synthetic and original datasets, and performance estimates were generated for comparative analysis. The classifiers trained on the synthetic data exhibited strong performance. SHAP explanations were generated for the models trained on synthetic and original data. A comparative analysis of these SHAP explanations demonstrated consistent similarity, which was quantified using Gower distance.
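
The abstract does not say how Gower distance was applied to the SHAP explanations, so the following is only a minimal sketch under stated assumptions. It assumes that SHAP values are compared as purely numeric vectors, that row i of one matrix is paired with row i of the other, and that each feature's range is taken over both matrices pooled together. The function names (`feature_ranges`, `gower_distance`, `mean_paired_gower`) are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence


def feature_ranges(rows: Sequence[Sequence[float]]) -> List[float]:
    """Return max - min for every column of a row-major matrix."""
    return [max(col) - min(col) for col in zip(*rows)]


def gower_distance(x: Sequence[float], y: Sequence[float],
                   ranges: Sequence[float]) -> float:
    """Gower distance for numeric features only.

    Each feature contributes |x_i - y_i| / range_i; the result is the
    mean over features. A feature with zero range contributes 0.
    """
    if not (len(x) == len(y) == len(ranges)):
        raise ValueError("x, y and ranges must have the same length")
    total = 0.0
    for xi, yi, r in zip(x, y, ranges):
        total += abs(xi - yi) / r if r > 0 else 0.0
    return total / len(x)


def mean_paired_gower(shap_a: Sequence[Sequence[float]],
                      shap_b: Sequence[Sequence[float]]) -> float:
    """Mean Gower distance between row i of shap_a and row i of shap_b.

    Assumption: ranges are computed on the pooled rows of both matrices.
    """
    ranges = feature_ranges(list(shap_a) + list(shap_b))
    dists = [gower_distance(a, b, ranges) for a, b in zip(shap_a, shap_b)]
    return sum(dists) / len(dists)
```

Under these assumptions, a value of 0 means the two sets of explanations are identical, and values near 1 mean they differ by close to the full observed range on every feature.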

The research highlights the efficacy of CTGAN in generating tabular synthetic data from Swedish patients' EHR data while preserving patient privacy and ensuring transparency and explainability of AI models. Although this research focused on the ADE dataset for L270 due to time and resource constraints, future investigations could extend to other adverse drug events. Another approach for further studies involves generating synthetic data for the positive and negative classes separately and utilizing other XAI methods for generating explanations.

Place, publisher, year, edition, pages
2024.
Keywords [en]
Synthetic data, EHR, CTGAN, ADE, XAI, SHAP
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:su:diva-242795
OAI: oai:DiVA.org:su-242795
DiVA, id: diva2:1955728
Available from: 2025-04-30 Created: 2025-04-30

Open Access in DiVA

fulltext (1072 kB), 15 downloads
File information
File name: FULLTEXT01.pdf
File size: 1072 kB
Checksum SHA-512:
a856f3cab0782cd962a4d3e65000e78ea44744595cf8dd503c5344874c6dfd086a2f37927221212b6059202ed4e688d4c37d31e0062746005498416dce853ddd
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Munduparambil Rajan, Praveen
